FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
27 May 2022 · VLM
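For orientation only (not taken from this listing or the paper itself), here is a minimal sketch of how a fused, FlashAttention-style exact-attention call is commonly invoked today through PyTorch's torch.nn.functional.scaled_dot_product_attention; on supported GPUs this call may dispatch to a FlashAttention backend, and all tensor sizes below are illustrative assumptions.

import torch
import torch.nn.functional as F

# Illustrative sizes only (assumed for this sketch).
batch, heads, seq_len, head_dim = 2, 8, 1024, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Exact attention: softmax(Q K^T / sqrt(head_dim)) V. On CUDA with fp16/bf16
# inputs PyTorch may route this to a FlashAttention-style fused kernel that
# avoids materializing the full seq_len x seq_len score matrix; elsewhere it
# falls back to a standard implementation.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 1024, 64])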

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 of 1,418 citing papers shown:
• RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
  Di Liu, Meng Chen, Baotong Lu, Huiqiang Jiang, Zhenhua Han, ..., K. Zhang, C. L. P. Chen, Fan Yang, Y. Yang, Lili Qiu
  03 Jan 2025
• Text2midi: Generating Symbolic Music from Captions
  Keshav Bhandari, Abhinaba Roy, Kyra Wang, Geeta Puri, Simon Colton, Dorien Herremans
  03 Jan 2025
• Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
  Jiajun Zhu, Peihao Wang, Ruisi Cai, Jason D. Lee, Pan Li, Z. Wang
  03 Jan 2025 · KELM
• DiC: Rethinking Conv3x3 Designs in Diffusion Models
  Yuchuan Tian, Jing Han, Chengcheng Wang, Yuchen Liang, Chao Xu, Hanting Chen
  03 Jan 2025 · DiffM
• FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
  Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, ..., Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze
  02 Jan 2025
• VMamba: Visual State Space Model
  Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, Jianbin Jiao, Yunfan Liu
  31 Dec 2024 · Mamba
• TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication
  Zongwu Wang, Fangxin Liu, Mingshuai Li, Li Jiang
  29 Dec 2024 · LRM
• AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures
  Situo Zhang, Hankun Wang, Da Ma, Zichen Zhu, Lu Chen, Kunyao Lan, Kai Yu
  25 Dec 2024
• Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels
  Mingcong Song, Xinru Tang, Fengfan Hou, Jing Li, Wei Wei, ..., Hongjie Si, D. Jiang, Shouyi Yin, Yang Hu, Guoping Long
  24 Dec 2024
• PointVoxelFormer -- Reviving point cloud networks for 3D medical imaging
  Mattias Paul Heinrich
  23 Dec 2024 · 3DPC
• Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models
  Zhisong Zhang, Yan Wang, Xinting Huang, Tianqing Fang, H. Zhang, Chenlong Deng, Shuaiyi Li, Dong Yu
  21 Dec 2024
• ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition
  Seungdong Yoa, Seungjun Lee, Hyeseung Cho, Bumsoo Kim, Woohyung Lim
  21 Dec 2024 · ViT
• WebLLM: A High-Performance In-Browser LLM Inference Engine
  Charlie F. Ruan, Yucheng Qin, Xun Zhou, Ruihang Lai, Hongyi Jin, ..., Yiyan Zhai, Sudeep Agarwal, Hangrui Cao, Siyuan Feng, Tianqi Chen
  20 Dec 2024 · LRM
• Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
  Lifeng Qiao, Peng Ye, Yuchen Ren, Weiqiang Bai, Chaoqi Liang, Xinzhu Ma, Nanqing Dong, W. Ouyang
  18 Dec 2024
• Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, ..., Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, Iacopo Poli
  18 Dec 2024
• Deploying Foundation Model Powered Agent Services: A Survey
  Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, ..., Quan Wan, Haozhao Wang, Yunfeng Fan, Qinliang Su, Xuemin Shen
  18 Dec 2024 · AI4CE
• FlexCache: Flexible Approximate Cache System for Video Diffusion
  Desen Sun, Henry Tian, Tim Lu, Sihang Liu
  18 Dec 2024 · DiffM
• Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
  Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang, Krishnateja Killamsetty, Shivchander Sudalairaj, ..., Guangxuan Xu, Kai Xu, Ligong Han, Luke Inglis, Akash Srivastava
  17 Dec 2024
• Echo: Simulating Distributed Training At Scale
  Yicheng Feng, Yuetao Chen, Kaiwen Chen, Jingzong Li, Tianyuan Wu, Peng Cheng, Chuan Wu, Wei Wang, Tsung-Yi Ho, Hong Xu
  17 Dec 2024
• ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models
  Xiechi Zhang, Shunfan Zheng, Linlin Wang, Gerard de Melo, Zhu Cao, Xiaoling Wang, Liang He
  16 Dec 2024 · ELM
• Attention with Dependency Parsing Augmentation for Fine-Grained Attribution
  Qiang Ding, Lvzhou Luo, Yixuan Cao, Ping Luo
  16 Dec 2024
• AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
  Wenhao Sun, Rong-Cheng Tu, Jingyi Liao, Zhao Jin, Dacheng Tao
  16 Dec 2024 · VGen
• Advances in Transformers for Robotic Applications: A Review
  Nikunj Sanghai, Nik Bear Brown
  13 Dec 2024 · AI4CE
• LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
  Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu, Ji Hou, Tao Xu, ..., Peizhao Zhang, Tingbo Hou, Peter Vajda, N. Jha, Xiaoliang Dai
  13 Dec 2024 · LMTD, DiffM, VGen, VLM
• jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
  Andreas Koukounas, Georgios Mastrapas, Bo Wang, Mohammad Kalim Akram, Sedigheh Eslami, Michael Gunther, Isabelle Mohr, Saba Sturua, Scott Martens, Nan Wang
  11 Dec 2024 · VLM
• From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
  Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, F. Durand, Eli Shechtman, Xun Huang
  10 Dec 2024 · VGen, DiffM
• Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models
  Haoran Lian, Junmin Chen, Wei Huang, Yizhe Xiong, Wenping Hu, ..., Hui Chen, Jianwei Niu, Zijia Lin, Fuzheng Zhang, Di Zhang
  10 Dec 2024
• You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
  Baorui Ma, Huachen Gao, Haoge Deng, Zhengxiong Luo, Tiejun Huang, Lulu Tang, Xinlong Wang
  09 Dec 2024 · DiffM, VGen
• Flex Attention: A Programming Model for Generating Optimized Attention Kernels
  Juechu Dong, Boyuan Feng, Driss Guessous, Yanbo Liang, Horace He
  07 Dec 2024
• MIND: Effective Incorrect Assignment Detection through a Multi-Modal Structure-Enhanced Language Model
  Yunhe Pang, Bo Chen, Fanjin Zhang, Yanghui Rao, Jie Tang
  05 Dec 2024
• Unifying KV Cache Compression for Large Language Models with LeanKV
  Yanqi Zhang, Yuwei Hu, Runyuan Zhao, John C. S. Lui, Haibo Chen
  04 Dec 2024 · MQ
• Does Few-Shot Learning Help LLM Performance in Code Synthesis?
  Derek Xu, Tong Xie, Botao Xia, Haoyu Li, Yunsheng Bai, Yizhou Sun, Wei Wang
  03 Dec 2024
• Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
  Qizhe Zhang, Aosong Cheng, Ming Lu, Zhiyong Zhuo, Minqi Wang, Jiajun Cao, Shaobo Guo, Qi She, Shanghang Zhang
  02 Dec 2024 · VLM
• FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism
  Y. Wang, Shiju Wang, Shenhan Zhu, Fangcheng Fu, Xinyi Liu, Xuefeng Xiao, Huixia Li, Jiashi Li, Faming Wu, Bin Cui
  02 Dec 2024
• SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
  Chunlin Yu, Hanqing Wang, Ye Shi, Haoyang Luo, Sibei Yang, Jingyi Yu, Jingya Wang
  02 Dec 2024 · LRM, LM&Ro
• The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024
  Shuoyi Zhou, Yixuan Zhou, Weiqing Li, Jun Chen, Runchuan Ye, Weihao Wu, Zijian Lin, Shun Lei, Zhiyong Wu
  02 Dec 2024
• Token Cropr: Faster ViTs for Quite a Few Tasks
  Benjamin Bergner, C. Lippert, Aravindh Mahendran
  01 Dec 2024 · ViT, VLM
• ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance & Efficiency on a Specific Domain
  Ali Shiraee Kasmaee, Mohammad Khodadad, Mohammad Arshi Saloot, Nick Sherck, Stephen Dokas, H. Mahyar, Soheila Samiee
  30 Nov 2024 · ELM
• Scaling Transformers for Low-Bitrate High-Quality Speech Coding
  Julian Parker, Anton Smirnov, Jordi Pons, CJ Carr, Zack Zukowski, Zach Evans, Xubo Liu
  29 Nov 2024
• Structured Object Language Modeling (SoLM): Native Structured Objects Generation Conforming to Complex Schemas with Self-Supervised Denoising
  A. Tavanaei, Kee Kiat Koo, Hayreddin Ceker, Shaobai Jiang, Qi Li, Julien Han, Karim Bouyarmane
  28 Nov 2024
• Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
  Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan
  28 Nov 2024 · VGen, AI4TS
• Marconi: Prefix Caching for the Era of Hybrid LLMs
  Rui Pan, Zhuang Wang, Zhen Jia, Can Karakus, Luca Zancato, Tri Dao, Ravi Netravali, Yida Wang
  28 Nov 2024
• Any-Resolution AI-Generated Image Detection by Spectral Learning
  Dimitrios Karageorgiou, Symeon Papadopoulos, I. Kompatsiaris, Efstratios Gavves
  28 Nov 2024
• MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
  Akshat Sharma, Hangliang Ding, Jianping Li, Neel Dani, Minjia Zhang
  27 Nov 2024
• Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
  Zigeng Chen, Xinyin Ma, Gongfan Fang, Xinchao Wang
  26 Nov 2024 · VLM
• DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs
  Jiahui Liu, Zhenkun Cai, Zhiyong Chen, Minjie Wang
  25 Nov 2024 · GNN
• Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments
  Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas
  24 Nov 2024
• Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution
  Haiquan Wang, Chaoyi Ruan, Jia He, Jiaqi Ruan, Chengjie Tang, Xiaosong Ma, Cheng-rong Li
  24 Nov 2024
• Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation
  Fahao Chen, Peng Li, Zicong Hong, Zhou Su, Song Guo
  23 Nov 2024 · MoMe, MoE
• FLARE: FP-Less PTQ and Low-ENOB ADC Based AMS-PiM for Error-Resilient, Fast, and Efficient Transformer Acceleration
  Donghyeon Yi, Seoyoung Lee, Jongho Kim, Junyoung Kim, Sohmyung Ha, Ik Joon Chang, Minkyu Je
  22 Nov 2024