FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,418 papers shown
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
William Chen
Jiatong Shi
Brian Yan
Dan Berrebbi
Wangyou Zhang
Yifan Peng
Xuankai Chang
Soumi Maiti
Shinji Watanabe
24
8
0
26 Sep 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
S. A. Jacobs
Masahiro Tanaka
Chengming Zhang
Minjia Zhang
L. Song
Samyam Rajbhandari
Yuxiong He
17
93
0
25 Sep 2023
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression
Ayush Kaushal
Tejas Vaidhya
Irina Rish
44
14
0
25 Sep 2023
MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models
Kailai Yang
Tianlin Zhang
Zi-Zhou Kuang
Qianqian Xie
Jimin Huang
Sophia Ananiadou
AI4MH
8
47
0
24 Sep 2023
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models
Zican Dong
Tianyi Tang
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
RALM
ALM
15
34
0
23 Sep 2023
AntiBARTy Diffusion for Property Guided Antibody Design
Jordan Venderley
DiffM
11
1
0
22 Sep 2023
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Yukang Chen
Shengju Qian
Haotian Tang
Xin Lai
Zhijian Liu
Song Han
Jiaya Jia
26
150
0
21 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
28
168
0
20 Sep 2023
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Aleksandar Stanić
Dylan R. Ashley
Oleg Serikov
Louis Kirsch
Francesco Faccio
Jürgen Schmidhuber
Thomas Hofmann
Imanol Schlag
MoE
38
9
0
20 Sep 2023
SlimPajama-DC: Understanding Data Combinations for LLM Training
Zhiqiang Shen
Tianhua Tao
Liqun Ma
W. Neiswanger
Zhengzhong Liu
...
Bowen Tan
Joel Hestness
Natalia Vassilieva
Daria Soboleva
Eric P. Xing
19
44
0
19 Sep 2023
FoleyGen: Visually-Guided Audio Generation
Xinhao Mei
Varun K. Nagaraja
Gaël Le Lan
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
VGen
14
20
0
19 Sep 2023
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Dawei Zhu
Nan Yang
Liang Wang
Yifan Song
Wenhao Wu
Furu Wei
Sujian Li
55
77
0
19 Sep 2023
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Zenan Zhou
Zhiying Wu
ELM
LRM
34
678
0
19 Sep 2023
Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF
Simeng Sun
Dhawal Gupta
Mohit Iyyer
6
17
0
16 Sep 2023
Enhance audio generation controllability through representation similarity regularization
Yangyang Shi
Gaël Le Lan
Varun K. Nagaraja
Zhaoheng Ni
Xinhao Mei
Ernie Chang
Forrest N. Iandola
Yang Liu
Vikas Chandra
23
1
0
15 Sep 2023
Replacing softmax with ReLU in Vision Transformers
Mitchell Wortsman
Jaehoon Lee
Justin Gilmer
Simon Kornblith
ViT
14
29
0
15 Sep 2023
CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending
Shiyi Zhu
Jingting Ye
Wei Jiang
Siqiao Xue
Qi Zhang
Yifan Wu
Jianguo Li
27
4
0
15 Sep 2023
Less is More for Long Document Summary Evaluation by LLMs
Yunshu Wu
Hayate Iso
Pouya Pezeshkpour
Nikita Bhutani
Estevam R. Hruschka
8
34
0
14 Sep 2023
Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors
J. Pata
Eric Wulff
Farouk Mokhtar
D. Southwick
Mengke Zhang
M. Girone
Javier Duarte
17
1
0
13 Sep 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
26
1,744
0
12 Sep 2023
CaloClouds II: Ultra-Fast Geometry-Independent Highly-Granular Calorimeter Simulation
E. Buhmann
F. Gaede
Gregor Kasieczka
A. Korol
W. Korcari
K. Krüger
Peter McKeown
DiffM
17
23
0
11 Sep 2023
Textbooks Are All You Need II: phi-1.5 technical report
Yuanzhi Li
Sébastien Bubeck
Ronen Eldan
Allison Del Giorno
Suriya Gunasekar
Yin Tat Lee
ALM
LRM
13
430
0
11 Sep 2023
Evaluating the Deductive Competence of Large Language Models
S. M. Seals
V. Shalin
ELM
ReLM
LRM
11
8
0
11 Sep 2023
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
Liang Li
Qingyuan Li
Bo-Wen Zhang
Xiangxiang Chu
MQ
22
28
0
06 Sep 2023
Music Source Separation with Band-Split RoPE Transformer
Wei-Tsung Lu
Ju-Chiang Wang
Qiuqiang Kong
Yun-Ning Hung
11
19
0
05 Sep 2023
nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources
Piotr Nawrot
AI4CE
17
5
0
05 Sep 2023
Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes
Sunjun Kweon
Junu Kim
Jiyoun Kim
Sujeong Im
Eunbyeol Cho
...
Seungjin Baek
Chang Hoon Han
Yoon Bin Jung
Yohan Jo
E. Choi
LM&MA
ELM
15
35
0
01 Sep 2023
PointLLM: Empowering Large Language Models to Understand Point Clouds
Runsen Xu
Xiaolong Wang
Tai Wang
Yilun Chen
Jiangmiao Pang
Dahua Lin
MLLM
48
146
0
31 Aug 2023
SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills
Amey Agrawal
Ashish Panwar
Jayashree Mohan
Nipun Kwatra
Bhargav S. Gulavani
R. Ramjee
AI4TS
LRM
20
85
0
31 Aug 2023
A General-Purpose Self-Supervised Model for Computational Pathology
Richard J. Chen
Tong Ding
Ming Y. Lu
Drew F. K. Williamson
Guillaume Jaume
...
Judy J. Wang
Walt Williams
L. Le
Georg Gerber
Faisal Mahmood
MedIm
20
42
0
29 Aug 2023
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
Qingyue Wang
Y. Fu
Yanan Cao
Zhiliang Tian
Shi Wang
Dacheng Tao
LLMAG
KELM
RALM
47
22
0
29 Aug 2023
Multiscale Contextual Learning for Speech Emotion Recognition in Emergency Call Center Conversations
Théo Deschamps-Berger
L. Lamel
Laurence Devillers
13
2
0
28 Aug 2023
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Scott L. Fleming
Alejandro Lozano
W. Haberkorn
Jenelle A. Jindal
E. Reis
...
Jonathan H. Chen
Keith Morse
Emma Brunskill
Jason Alan Fries
N. Shah
LM&MA
15
52
0
27 Aug 2023
Aligning Language Models with Offline Learning from Human Feedback
Jian Hu
Li Tao
J. Yang
Chandler Zhou
ALM
OffRL
11
6
0
23 Aug 2023
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
Ming Li
Yong Zhang
Zhitao Li
Jiuhai Chen
Lichang Chen
Ning Cheng
Jianzong Wang
Tianyi Zhou
Jing Xiao
25
168
0
23 Aug 2023
How Much Temporal Long-Term Context is Needed for Action Segmentation?
Emad Bahrami Rad
Gianpiero Francesca
Juergen Gall
ViT
8
24
0
22 Aug 2023
Instruction Tuning for Large Language Models: A Survey
Shengyu Zhang
Linfeng Dong
Xiaoya Li
Sen Zhang
Xiaofei Sun
...
Jiwei Li
Runyi Hu
Tianwei Zhang
Fei Wu
Guoyin Wang
LM&MA
16
524
0
21 Aug 2023
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
Neel Guha
Julian Nyarko
Daniel E. Ho
Christopher Ré
Adam Chilton
...
Spencer Williams
Sunny G. Gandhi
Tomer Zur
Varun J. Iyer
Zehua Li
AILaw
LRM
ELM
9
142
0
20 Aug 2023
LMTuner: An user-friendly and highly-integrable Training Framework for fine-tuning Large Language Models
Yixuan Weng
Zhiqi Wang
Huanxuan Liao
Shizhu He
Shengping Liu
Kang Liu
Jun Zhao
18
3
0
20 Aug 2023
MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation
Junru Lu
Siyu An
Mingbao Lin
Gabriele Pergola
Yulan He
Di Yin
Xing Sun
Yunsheng Wu
42
31
0
16 Aug 2023
OctoPack: Instruction Tuning Code Large Language Models
Niklas Muennighoff
Qian Liu
A. Zebaze
Qinkai Zheng
Binyuan Hui
Terry Yue Zhuo
Swayam Singh
Xiangru Tang
Leandro von Werra
Shayne Longpre
VLM
ALM
52
116
0
14 Aug 2023
Pairing interacting protein sequences using masked language modeling
Umberto Lupo
Damiano Sgarbossa
Anne-Florence Bitbol
8
9
0
14 Aug 2023
Large Language Models for Telecom: Forthcoming Impact on the Industry
Ali Maatouk
Nicola Piovesan
Fadhel Ayed
Antonio De Domenico
Merouane Debbah
11
49
0
11 Aug 2023
Encode-Store-Retrieve: Enhancing Memory Augmentation through Language-Encoded Egocentric Perception
Junxiao Shen
John J. Dudley
Per Ola Kristensson
RALM
12
0
0
10 Aug 2023
Accelerating LLM Inference with Staged Speculative Decoding
Benjamin Spector
Christopher Ré
15
98
0
08 Aug 2023
Continual Pre-Training of Large Language Models: How to (re)warm your model?
Kshitij Gupta
Benjamin Thérien
Adam Ibrahim
Mats L. Richter
Quentin G. Anthony
Eugene Belilovsky
Irina Rish
Timothée Lesort
KELM
22
98
0
08 Aug 2023
DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis
Zhongjie Duan
Lizhou You
Chengyu Wang
Cen Chen
Ziheng Wu
Weining Qian
Jun Huang
DiffM
21
8
0
07 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
18
3
0
07 Aug 2023
LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
Longteng Zhang
Lin Zhang
S. Shi
X. Chu
Bo-wen Li
AI4CE
11
88
0
07 Aug 2023
Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for Super-Resolution
Yong Liu
Hang Dong
Boyang Liang
Song Liu
Qingji Dong
Kai Chen
Fangmin Chen
Lean Fu
Fei Wang
ViT
6
12
0
05 Aug 2023