FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 of 1,418 citing papers shown
Token Merging for Fast Stable Diffusion
Daniel Bolya
Judy Hoffman
30 Mar 2023
An Over-parameterized Exponential Regression
Yeqi Gao
Sridhar Mahadevan
Zhao-quan Song
29 Mar 2023
Your Diffusion Model is Secretly a Zero-Shot Classifier
Alexander C. Li
Mihir Prabhudesai
Shivam Duggal
Ellis L Brown
Deepak Pathak
DiffM
VLM
28 Mar 2023
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
CLIP
VLM
27 Mar 2023
Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
Cong Wei
Brendan Duke
R. Jiang
P. Aarabi
Graham W. Taylor
Florian Shkurti
ViT
24 Mar 2023
Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
Shihao Wang
Yingfei Liu
Tiancai Wang
Ying Li
Xiangyu Zhang
3DPC
21 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
20 Mar 2023
SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
Vithursan Thangarasa
Abhay Gupta
William Marshall
Tianda Li
Kevin Leong
D. DeCoste
Sean Lie
Shreyas Saxena
MoE
AI4CE
18 Mar 2023
Meet in the Middle: A New Pre-training Paradigm
A. Nguyen
Nikos Karampatziakis
Weizhu Chen
13 Mar 2023
Resurrecting Recurrent Neural Networks for Long Sequences
Antonio Orvieto
Samuel L. Smith
Albert Gu
Anushan Fernando
Çağlar Gülçehre
Razvan Pascanu
Soham De
11 Mar 2023
The style transformer with common knowledge optimization for image-text retrieval
Wenrui Li
Zhengyu Ma
Jinqiao Shi
Xiaopeng Fan
ViT
01 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
27 Feb 2023
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Zhuohan Li
Lianmin Zheng
Yinmin Zhong
Vincent Liu
Ying Sheng
...
Yanping Huang
Zhifeng Chen
Hao Zhang
Joseph E. Gonzalez
Ion Stoica
MoE
22 Feb 2023
Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli
Stefano Massaroli
Eric Q. Nguyen
Daniel Y. Fu
Tri Dao
S. Baccus
Yoshua Bengio
Stefano Ermon
Christopher Ré
VLM
21 Feb 2023
Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training
Hongzheng Chen
Cody Hao Yu
Shuai Zheng
Zhen Zhang
Zhiru Zhang
Yida Wang
16 Feb 2023
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Daniel Y. Fu
Elliot L. Epstein
Eric N. D. Nguyen
A. Thomas
Michael Zhang
Tri Dao
Atri Rudra
Christopher Ré
13 Feb 2023
A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies
Hongyu Hè
Marko Kabić
13 Feb 2023
In-Context Learning with Many Demonstration Examples
Mukai Li
Shansan Gong
Jiangtao Feng
Yiheng Xu
Jinchao Zhang
Zhiyong Wu
Lingpeng Kong
09 Feb 2023
Efficient Attention via Control Variates
Lin Zheng
Jianbo Yuan
Chong-Jun Wang
Lingpeng Kong
09 Feb 2023
Q-Diffusion: Quantizing Diffusion Models
Xiuyu Li
Yijia Liu
Long Lian
Hua Yang
Zhen Dong
Daniel Kang
Shanghang Zhang
Kurt Keutzer
DiffM
MQ
08 Feb 2023
Regulating ChatGPT and other Large Generative AI Models
P. Hacker
A. Engel
M. Mauer
AILaw
05 Feb 2023
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
02 Feb 2023
Alternating Updates for Efficient Transformers
Cenk Baykal
D. Cutler
Nishanth Dikkala
Nikhil Ghosh
Rina Panigrahy
Xin Wang
MoE
30 Jan 2023
Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
Xiaoxia Wu
Cheng-rong Li
Reza Yazdani Aminabadi
Z. Yao
Yuxiong He
MQ
27 Jan 2023
AttMEMO: Accelerating Transformers with Memoization on Big Memory Systems
Yuan Feng
Hyeran Jeon
F. Blagojevic
Cyril Guyot
Qing Li
Dong Li
GNN
23 Jan 2023
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
Zhijian Liu
Xinyu Yang
Haotian Tang
Shang Yang
Song Han
20 Jan 2023
Does compressing activations help model parallel training?
S. Bian
Dacheng Li
Hongyi Wang
Eric P. Xing
Shivaram Venkataraman
06 Jan 2023
Cross Modal Transformer: Towards Fast and Robust 3D Object Detection
Junjie Yan
Yingfei Liu
Jian‐Yuan Sun
Fan Jia
Shuailin Li
Tiancai Wang
Xiangyu Zhang
ViT
3DPC
03 Jan 2023
MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding
Steven H. Wang
Antoine Scardigli
Leonard Tang
Wei Chen
D.M. Levkin
Anya Chen
Spencer Ball
Thomas Woodside
Oliver Zhang
Dan Hendrycks
AILaw
ELM
02 Jan 2023
Cramming: Training a Language Model on a Single GPU in One Day
Jonas Geiping
Tom Goldstein
MoE
28 Dec 2022
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu
Tri Dao
Khaled Kamal Saab
A. Thomas
Atri Rudra
Christopher Ré
28 Dec 2022
Pretraining Without Attention
Junxiong Wang
J. Yan
Albert Gu
Alexander M. Rush
20 Dec 2022
FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference
Michiel de Jong
Yury Zemlyanskiy
Joshua Ainslie
Nicholas FitzGerald
Sumit Sanghai
Fei Sha
William W. Cohen
VLM
15 Dec 2022
Elixir: Train a Large Language Model on a Small GPU Cluster
Haichen Huang
Jiarui Fang
Hongxin Liu
Shenggui Li
Yang You
VLM
10 Dec 2022
Simplifying and Understanding State Space Models with Diagonal Linear RNNs
Ankit Gupta
Harsh Mehta
Jonathan Berant
01 Dec 2022
A Self-Attention Ansatz for Ab-initio Quantum Chemistry
Ingrid von Glehn
J. Spencer
David Pfau
24 Nov 2022
Modeling Multivariate Biosignals With Graph Neural Networks and Structured State Space Models
Siyi Tang
Jared A. Dunnmon
Liangqiong Qu
Khaled Kamal Saab
T. Baykaner
Christopher Lee-Messer
D. Rubin
21 Nov 2022
Breadth-First Pipeline Parallelism
J. Lamy-Poirier
GNN
MoE
AI4CE
11 Nov 2022
Efficiently Scaling Transformer Inference
Reiner Pope
Sholto Douglas
Aakanksha Chowdhery
Jacob Devlin
James Bradbury
Anselm Levskaya
Jonathan Heek
Kefan Xiao
Shivani Agrawal
J. Dean
09 Nov 2022
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Elias Frantar
Saleh Ashkboos
Torsten Hoefler
Dan Alistarh
MQ
31 Oct 2022
Inference from Real-World Sparse Measurements
Arnaud Pannatier
Kyle Matoba
F. Fleuret
AI4TS
20 Oct 2022
FIMP: Foundation Model-Informed Message Passing for Graph Neural Networks
S. Rizvi
Nazreen Pallikkavaliyaveetil
David Zhang
Zhuoyang Lyu
Nhi Nguyen
...
Amin Karbasi
Rex Ying
Maria Brbić
Rahul M. Dhodapkar
David van Dijk
GNN
AI4CE
17 Oct 2022
Token Merging: Your ViT But Faster
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
17 Oct 2022
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jinchao Zhang
Shuyang Jiang
Jiangtao Feng
Lin Zheng
Lingpeng Kong
3DV
14 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson
B. Kailkhura
Davis W. Blalock
13 Oct 2022
S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces
Eric N. D. Nguyen
Karan Goel
Albert Gu
Gordon W. Downs
Preey Shah
Tri Dao
S. Baccus
Christopher Ré
VLM
12 Oct 2022
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Yujia Zhai
Chengquan Jiang
Leyuan Wang
Xiaoying Jia
Shang Zhang
Zizhong Chen
Xin Liu
Yibo Zhu
06 Oct 2022
Dilated Neighborhood Attention Transformer
Ali Hassani
Humphrey Shi
ViT
MedIm
29 Sep 2022
DPNet: Dual-Path Network for Real-time Object Detection with Lightweight Attention
Quan Zhou
Huiming Shi
Wei Xiang
Bin Kang
Xiaofu Wu
Longin Jan Latecki
ObjD
28 Sep 2022
Hydra Attention: Efficient Attention with Many Heads
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Judy Hoffman
15 Sep 2022