ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.14135
  4. Cited By
FlashAttention: Fast and Memory-Efficient Exact Attention with
  IO-Awareness

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM
ArXivPDFHTML

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,418 papers shown
Title
BitNet: Scaling 1-bit Transformers for Large Language Models
BitNet: Scaling 1-bit Transformers for Large Language Models
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Huaijie Wang
Lingxiao Ma
Fan Yang
Ruiping Wang
Yi Wu
Furu Wei
MQ
12
95
0
17 Oct 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
MoE
8
18
0
16 Oct 2023
In-context Pretraining: Language Modeling Beyond Document Boundaries
In-context Pretraining: Language Modeling Beyond Document Boundaries
Weijia Shi
Sewon Min
Maria Lomeli
Chunting Zhou
Margaret Li
...
Victoria Lin
Noah A. Smith
Luke Zettlemoyer
Scott Yih
Mike Lewis
LRM
RALM
SyDa
19
48
0
16 Oct 2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method
  for Aligning Large Language Models
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Ruoyu Sun
Zhimin Luo
19
45
0
16 Oct 2023
AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
Jake Grigsby
Linxi Fan
Yuke Zhu
OffRL
LM&Ro
22
10
0
15 Oct 2023
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language
  Models
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models
Saleh Ashkboos
Ilia Markov
Elias Frantar
Tingxuan Zhong
Xincheng Wang
Jie Ren
Torsten Hoefler
Dan Alistarh
MQ
SyDa
117
21
0
13 Oct 2023
Pit One Against Many: Leveraging Attention-head Embeddings for
  Parameter-efficient Multi-head Attention
Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue
Nikolaos Aletras
23
0
0
11 Oct 2023
Found in the Middle: Permutation Self-Consistency Improves Listwise
  Ranking in Large Language Models
Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models
Raphael Tang
Xinyu Crystina Zhang
Xueguang Ma
Jimmy Lin
Ferhan Ture
LRM
29
15
0
11 Oct 2023
MatFormer: Nested Transformer for Elastic Inference
MatFormer: Nested Transformer for Elastic Inference
Devvrit
Sneha Kudugunta
Aditya Kusupati
Tim Dettmers
Kaifeng Chen
...
Yulia Tsvetkov
Hannaneh Hajishirzi
Sham Kakade
Ali Farhadi
Prateek Jain
24
22
0
11 Oct 2023
CacheGen: KV Cache Compression and Streaming for Fast Language Model
  Serving
CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving
Yuhan Liu
Hanchen Li
Yihua Cheng
Siddhant Ray
Yuyang Huang
...
Ganesh Ananthanarayanan
Michael Maire
Henry Hoffmann
Ari Holtzman
Junchen Jiang
50
41
0
11 Oct 2023
Sparse Fine-tuning for Inference Acceleration of Large Language Models
Sparse Fine-tuning for Inference Acceleration of Large Language Models
Eldar Kurtic
Denis Kuznedelev
Elias Frantar
Michael Goin
Dan Alistarh
19
8
0
10 Oct 2023
Mistral 7B
Mistral 7B
Albert Q. Jiang
Alexandre Sablayrolles
A. Mensch
Chris Bamford
Devendra Singh Chaplot
...
Teven Le Scao
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LRM
23
1,966
0
10 Oct 2023
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez
John Yang
Alexander Wettig
Shunyu Yao
Kexin Pei
Ofir Press
Karthik Narasimhan
ELM
24
454
0
10 Oct 2023
Sheared LLaMA: Accelerating Language Model Pre-training via Structured
  Pruning
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Mengzhou Xia
Tianyu Gao
Zhiyuan Zeng
Danqi Chen
24
262
0
10 Oct 2023
Latent Diffusion Counterfactual Explanations
Latent Diffusion Counterfactual Explanations
Karim Farid
Simon Schrodi
Max Argus
Thomas Brox
DiffM
33
11
0
10 Oct 2023
iTransformer: Inverted Transformers Are Effective for Time Series
  Forecasting
iTransformer: Inverted Transformers Are Effective for Time Series Forecasting
Yong Liu
Tengge Hu
Haoran Zhang
Haixu Wu
Shiyu Wang
Lintao Ma
Mingsheng Long
AI4TS
21
424
0
10 Oct 2023
Humans and language models diverge when predicting repeating text
Humans and language models diverge when predicting repeating text
Aditya R. Vaidya
Javier S. Turek
Alexander G. Huth
17
5
0
10 Oct 2023
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model
Peng Di
Jianguo Li
Hang Yu
Wei Jiang
Wenting Cai
...
Zelin Zhao
Xunjin Zheng
Hailian Zhou
Lifu Zhu
Xianying Zhu
ELM
ALM
AI4CE
35
12
0
10 Oct 2023
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Lijun Yu
José Lezama
N. B. Gundavarapu
Luca Versari
Kihyuk Sohn
...
Boqing Gong
Ming-Hsuan Yang
Irfan Essa
David A. Ross
Lu Jiang
10
275
0
09 Oct 2023
Generative Judge for Evaluating Alignment
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM
ALM
12
76
0
09 Oct 2023
Scaling Laws of RoPE-based Extrapolation
Scaling Laws of RoPE-based Extrapolation
Xiaoran Liu
Hang Yan
Shuo Zhang
Chen An
Xipeng Qiu
Dahua Lin
23
80
0
08 Oct 2023
Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as
  You May Think -- Introducing AI Detectability Index
Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index
Megha Chakraborty
S.M. Towhidul Islam Tonmoy
S. M. Mehedi
Krish Sharma
Niyar R. Barman
...
Tanay Kumar
Vinija Jain
Aman Chadha
Amit P. Sheth
Amitava Das
DeLMO
4
21
0
08 Oct 2023
Walking Down the Memory Maze: Beyond Context Limit through Interactive
  Reading
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
Howard Chen
Ramakanth Pasunuru
Jason Weston
Asli Celikyilmaz
RALM
68
72
0
08 Oct 2023
The Troubling Emergence of Hallucination in Large Language Models -- An
  Extensive Definition, Quantification, and Prescriptive Remediations
The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations
Vipula Rawte
Swagata Chakraborty
Agnibh Pathak
Anubhav Sarkar
S.M. Towhidul Islam Tonmoy
Aman Chadha
Mikel Artetxe
Punit Daniel Simig
HILM
13
114
0
08 Oct 2023
Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
Luoming Zhang
Wen Fei
Weijia Wu
Yefei He
Zhenyu Lou
Hong Zhou
MQ
17
5
0
07 Oct 2023
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery
  through Sophisticated AI System Technologies
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies
S. Song
Bonnie Kruft
Minjia Zhang
Conglong Li
Shiyang Chen
...
Arash Vahdat
Chaowei Xiao
Thomas Gibbs
Anima Anandkumar
R. Stevens
41
13
0
06 Oct 2023
A Comprehensive Performance Study of Large Language Models on Novel AI
  Accelerators
A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators
M. Emani
Sam Foreman
Varuni K. Sastry
Zhen Xie
Siddhisanket Raskar
William Arnold
R. Thakur
V. Vishwanath
M. Papka
ELM
24
9
0
06 Oct 2023
How to Capture Higher-order Correlations? Generalizing Matrix Softmax
  Attention to Kronecker Computation
How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation
Josh Alman
Zhao-quan Song
21
31
0
06 Oct 2023
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical
  Reasoning
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
Ke Wang
Houxing Ren
Aojun Zhou
Zimu Lu
Sichun Luo
Weikang Shi
Renrui Zhang
Linqi Song
Mingjie Zhan
Hongsheng Li
ReLM
LRM
SyDa
22
92
0
05 Oct 2023
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context
  LLMs Training
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training
Dacheng Li
Rulin Shao
Anze Xie
Eric P. Xing
Xuezhe Ma
Ion Stoica
Joseph E. Gonzalez
Hao Zhang
24
16
0
05 Oct 2023
Retrieval meets Long Context Large Language Models
Retrieval meets Long Context Large Language Models
Peng-Tao Xu
Wei Ping
Xianchao Wu
Lawrence C. McAfee
Chen Zhu
Zihan Liu
Sandeep Subramanian
Evelina Bakhturina
M. Shoeybi
Bryan Catanzaro
RALM
LRM
9
79
0
04 Oct 2023
Never Train from Scratch: Fair Comparison of Long-Sequence Models
  Requires Data-Driven Priors
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
Ido Amos
Jonathan Berant
Ankit Gupta
20
24
0
04 Oct 2023
RoFormer for Position Aware Multiple Instance Learning in Whole Slide
  Image Classification
RoFormer for Position Aware Multiple Instance Learning in Whole Slide Image Classification
Etienne Pochet
Rami Maroun
Roger Trullo
MedIm
20
2
0
03 Oct 2023
Ring Attention with Blockwise Transformers for Near-Infinite Context
Ring Attention with Blockwise Transformers for Near-Infinite Context
Hao Liu
Matei A. Zaharia
Pieter Abbeel
23
216
0
03 Oct 2023
SEA: Sparse Linear Attention with Estimated Attention Mask
SEA: Sparse Linear Attention with Estimated Attention Mask
Heejun Lee
Jina Kim
Jeffrey Willette
Sung Ju Hwang
16
6
0
03 Oct 2023
PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
Praneeth Kacham
Vahab Mirrokni
Peilin Zhong
23
7
0
02 Oct 2023
CAT-LM: Training Language Models on Aligned Code And Tests
CAT-LM: Training Language Models on Aligned Code And Tests
Nikitha Rao
Kush Jain
Uri Alon
Claire Le Goues
Vincent J. Hellendoorn
ALM
25
42
0
02 Oct 2023
GRID: A Platform for General Robot Intelligence Development
GRID: A Platform for General Robot Intelligence Development
Sai H. Vemprala
Shuhang Chen
Abhinav Shukla
Dinesh Narayanan
Ashish Kapoor
17
10
0
02 Oct 2023
Learning Type Inference for Enhanced Dataflow Analysis
Learning Type Inference for Enhanced Dataflow Analysis
Lukas Seidel
Sedick Baker Effendi
Xavier Pinho
Konrad Rieck
Brink van der Merwe
Fabian Yamaguchi
17
2
0
01 Oct 2023
GrowLength: Accelerating LLMs Pretraining by Progressively Growing
  Training Length
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
Hongye Jin
Xiaotian Han
Jingfeng Yang
Zhimeng Jiang
Chia-Yuan Chang
Xia Hu
33
11
0
01 Oct 2023
Efficient Streaming Language Models with Attention Sinks
Efficient Streaming Language Models with Attention Sinks
Michel Lang
Yuandong Tian
Beidi Chen
Song Han
Mike Lewis
AI4TS
RALM
16
629
0
29 Sep 2023
GAIA-1: A Generative World Model for Autonomous Driving
GAIA-1: A Generative World Model for Autonomous Driving
Masane Fuchi
Lloyd Russell
Hudson Yeo
Zak Murez
Hiroto Minami
Alex Kendall
Tomohiro Takagi
Gianluca Corrado
VGen
11
215
0
29 Sep 2023
Training a Large Video Model on a Single Machine in a Day
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
25
15
0
28 Sep 2023
Qwen Technical Report
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
29
1,559
0
28 Sep 2023
AtomSurf : Surface Representation for Learning on Protein Structures
AtomSurf : Surface Representation for Learning on Protein Structures
Vincent Mallet
Souhaib Attaiki
M. Ovsjanikov
32
3
0
28 Sep 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald
24
15
0
28 Sep 2023
Predicting performance difficulty from piano sheet music images
Predicting performance difficulty from piano sheet music images
Yingwei Ma
J. J. Valero-Mas
Yu Jiang
Changjian Wang
6
2
0
28 Sep 2023
Attention Sorting Combats Recency Bias In Long Context Language Models
Attention Sorting Combats Recency Bias In Long Context Language Models
A. Peysakhovich
Adam Lerer
LRM
RALM
34
40
0
28 Sep 2023
Masked Autoencoders are Scalable Learners of Cellular Morphology
Masked Autoencoders are Scalable Learners of Cellular Morphology
Oren Z. Kraus
Kian Kenyon-Dean
Saber Saberian
Maryam Fallah
Peter McLean
...
Chi Vicky Cheng
Kristen Morse
Maureen Makes
Ben Mabey
Berton A. Earnshaw
11
14
0
27 Sep 2023
Effective Long-Context Scaling of Foundation Models
Effective Long-Context Scaling of Foundation Models
Wenhan Xiong
Jingyu Liu
Igor Molybog
Hejia Zhang
Prajjwal Bhargava
...
Dániel Baráth
Sergey Edunov
Mike Lewis
Sinong Wang
Hao Ma
26
202
0
27 Sep 2023
Previous
123...232425...272829
Next