ResearchTrend.AI
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,418 papers shown
JIANG: Chinese Open Foundation Language Model
Qinhua Duan
Wenchao Gu
Yujia Chen
Wenxin Mao
Zewen Tian
Huibing Cao
ALM
10
0
0
01 Aug 2023
TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer
Zhen Qin
Dong Li
Weigao Sun
Weixuan Sun
Xuyang Shen
...
Yunshen Wei
Baohong Lv
Xiao Luo
Yu Qiao
Yiran Zhong
30
15
0
27 Jul 2023
Multilingual Code Co-Evolution Using Large Language Models
Jiyang Zhang
Pengyu Nie
Junyi Jessy Li
Miloš Gligorić
19
20
0
27 Jul 2023
Global k-Space Interpolation for Dynamic MRI Reconstruction using Masked Image Modeling
Jia-Yu Pan
Suprosanna Shit
O. Turgut
Wenqi Huang
Hongwei Bran Li
Nil Stolt Ansó
Thomas Küstner
Kerstin Hammernik
Daniel Rueckert
17
9
0
24 Jul 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Chen An
Shansan Gong
Ming Zhong
Xingjian Zhao
Mukai Li
Jun Zhang
Lingpeng Kong
Xipeng Qiu
ELM
ALM
30
132
0
20 Jul 2023
FABRIC: Personalizing Diffusion Models with Iterative Feedback
Dimitri von Rütte
Elisabetta Fedele
Jonathan Thomm
Lukas Wolf
8
10
0
19 Jul 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
LRM
13
1,103
0
17 Jul 2023
Retentive Network: A Successor to Transformer for Large Language Models
Yutao Sun
Li Dong
Shaohan Huang
Shuming Ma
Yuqing Xia
Jilong Xue
Jianyong Wang
Furu Wei
LRM
34
300
0
17 Jul 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Yi Wang
Yinan He
Yizhuo Li
Kunchang Li
Jiashuo Yu
...
Ping Luo
Ziwei Liu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
16
241
0
13 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
13
41
0
12 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Saeed Mian
OffRL
46
499
0
12 Jul 2023
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Mostafa Dehghani
Basil Mustafa
Josip Djolonga
Jonathan Heek
Matthias Minderer
...
Avital Oliver
Piotr Padlewski
A. Gritsenko
Mario Lučić
N. Houlsby
ViT
13
102
0
12 Jul 2023
ReLoRA: High-Rank Training Through Low-Rank Updates
Vladislav Lialin
Namrata Shivagunde
Sherin Muckatira
Anna Rumshisky
BDL
21
36
0
11 Jul 2023
Large Language Models as General Pattern Machines
Suvir Mirchandani
F. Xia
Peter R. Florence
Brian Ichter
Danny Driess
Montse Gonzalez Arenas
Kanishka Rao
Dorsa Sadigh
Andy Zeng
LLMAG
37
183
0
10 Jul 2023
Large Language Models as Batteries-Included Zero-Shot ESCO Skills Matchers
Benjamin Clavié
Guillaume Soulié
10
10
0
07 Jul 2023
Lost in the Middle: How Language Models Use Long Contexts
Nelson F. Liu
Kevin Lin
John Hewitt
Ashwin Paranjape
Michele Bevilacqua
Fabio Petroni
Percy Liang
RALM
15
1,380
0
06 Jul 2023
Focused Transformer: Contrastive Training for Context Scaling
Szymon Tworkowski
Konrad Staniszewski
Mikolaj Pacek
Yuhuai Wu
Henryk Michalewski
Piotr Miłoś
21
133
0
06 Jul 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
CLL
16
149
0
05 Jul 2023
SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding
Vasilisa Bashlovkina
Riley Matthews
Zhaobin Kuang
Simon Baumgartner
Michael Bendersky
25
4
0
30 Jun 2023
RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark
Federico Berto
Chuanbo Hua
Junyoung Park
Laurin Luttmann
Yining Ma
...
Guojie Song
Changhyun Kwon
Kevin Tierney
Lin Xie
Jinkyoo Park
OffRL
14
27
0
29 Jun 2023
FLuRKA: Fast and accurate unified Low-Rank & Kernel Attention
Ahan Gupta
Hao Guo
Yueming Yuan
Yan-Quan Zhou
Charith Mendis
11
2
0
27 Jun 2023
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
Eric N. D. Nguyen
Michael Poli
Marjan Faizi
A. Thomas
Callum Birch-Sykes
...
Stefano Massaroli
Yoshua Bengio
Stefano Ermon
S. Baccus
Christopher Ré
MedIm
4
212
0
27 Jun 2023
Extending Context Window of Large Language Models via Positional Interpolation
Shouyuan Chen
Sherman Wong
Liangjian Chen
Yuandong Tian
10
490
0
27 Jun 2023
When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions
Weiming Zhuang
Chen Chen
Lingjuan Lyu
C. L. P. Chen
Yaochu Jin
Lingjuan Lyu
AIFin
AI4CE
83
84
0
27 Jun 2023
DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome
Zhihan Zhou
Yanrong Ji
Weijian Li
Pratik Dutta
R. Davuluri
Han Liu
4
166
0
26 Jun 2023
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu (Allen) Zhang
Ying Sheng
Tianyi Zhou
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
35
246
0
24 Jun 2023
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
Neel Jain
Khalid Saifullah
Yuxin Wen
John Kirchenbauer
Manli Shu
Aniruddha Saha
Micah Goldblum
Jonas Geiping
Tom Goldstein
ALM
ELM
14
19
0
23 Jun 2023
LightGlue: Local Feature Matching at Light Speed
Philipp Lindenberger
Paul-Edouard Sarlin
Marc Pollefeys
3DV
VLM
12
386
0
23 Jun 2023
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
Shizhe Diao
Rui Pan
Hanze Dong
Kashun Shum
Jipeng Zhang
Wei Xiong
Tong Zhang
ALM
12
63
0
21 Jun 2023
Textbooks Are All You Need
Suriya Gunasekar
Yi Zhang
J. Aneja
C. C. T. Mendes
Allison Del Giorno
...
Sébastien Bubeck
Ronen Eldan
Adam Tauman Kalai
Y. Lee
Yuan-Fang Li
AI4CE
ALM
SyDa
10
380
0
20 Jun 2023
Sparse Modular Activation for Efficient Sequence Modeling
Liliang Ren
Yang Liu
Shuohang Wang
Yichong Xu
Chenguang Zhu
Chengxiang Zhai
43
13
0
19 Jun 2023
Anticipatory Music Transformer
John Thickstun
David Leo Wright Hall
Chris Donahue
Percy Liang
10
14
0
14 Jun 2023
INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation
Yuji Chai
John Gkountouras
Glenn G. Ko
David Brooks
Gu-Yeon Wei
MQ
22
18
0
13 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric P. Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
11
3,748
0
09 Jun 2023
Simple and Controllable Music Generation
Jade Copet
Felix Kreuk
Itai Gat
Tal Remez
David Kant
Gabriel Synnaeve
Yossi Adi
Alexandre Défossez
MGen
19
337
0
08 Jun 2023
VideoComposer: Compositional Video Synthesis with Motion Controllability
Xiang Wang
Hangjie Yuan
Shiwei Zhang
Dayou Chen
Jiuniu Wang
Yingya Zhang
Yujun Shen
Deli Zhao
Jingren Zhou
VGen
DiffM
14
315
0
03 Jun 2023
Faster Causal Attention Over Large Sequences Through Sparse Flash Attention
Matteo Pagliardini
Daniele Paliotta
Martin Jaggi
François Fleuret
LRM
10
22
0
01 Jun 2023
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Guilherme Penedo
Quentin Malartic
Daniel Hesslow
Ruxandra-Aimée Cojocaru
Alessandro Cappelli
Hamza Alobeidli
B. Pannier
Ebtesam Almazrouei
Julien Launay
10
741
0
01 Jun 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
35
156
0
01 Jun 2023
SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds
Yanyu Li
Huan Wang
Qing Jin
Ju Hu
Pavlo Chemerys
Yun Fu
Yanzhi Wang
Sergey Tulyakov
Jian Ren
VLM
11
149
0
01 Jun 2023
Coneheads: Hierarchy Aware Attention
Albert Tseng
Tao Yu
Toni J.B. Liu
Chris De Sa
3DPC
4
5
0
01 Jun 2023
Protein Design with Guided Discrete Diffusion
Nate Gruver
Samuel Stanton
Nathan C. Frey
Tim G. J. Rudner
I. Hotzel
J. Lafrance-Vanasse
A. Rajpal
Kyunghyun Cho
A. Wilson
DiffM
24
100
0
31 May 2023
Self-Verification Improves Few-Shot Clinical Information Extraction
Zelalem Gero
Chandan Singh
Hao Cheng
Tristan Naumann
Michel Galley
Jianfeng Gao
Hoifung Poon
32
51
0
30 May 2023
Blockwise Parallel Transformer for Large Context Models
Hao Liu
Pieter Abbeel
26
11
0
30 May 2023
Likelihood-Based Diffusion Language Models
Ishaan Gulrajani
Tatsunori B. Hashimoto
DiffM
13
50
0
30 May 2023
From Zero to Turbulence: Generative Modeling for 3D Flow Simulation
Marten Lienen
David Lüdke
Jan Hansen-Palmus
Stephan Günnemann
DiffM
AI4CE
9
23
0
29 May 2023
SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics
A. Ardakani
Altan Haan
Shangyin Tan
Doru-Thom Popovici
Alvin Cheung
Costin Iancu
Koushik Sen
14
3
0
29 May 2023
BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages
Wen Yang
Chong Li
Jiajun Zhang
Chengqing Zong
LRM
12
46
0
29 May 2023
Geometric Algebra Transformer
Johann Brehmer
P. D. Haan
S. Behrends
Taco S. Cohen
26
11
0
28 May 2023
Exploring the Practicality of Generative Retrieval on Dynamic Corpora
Soyoung Yoon
Chaeeun Kim
Hyunji Lee
Joel Jang
Sohee Yang
Minjoon Seo
11
3
0
27 May 2023