2205.14135
Cited By
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"
50 / 1,418 papers shown
Title
Understanding and Optimizing Multi-Stage AI Inference Pipelines
A. Bambhaniya
Hanjiang Wu
Suvinay Subramanian
S. Srinivasan
Souvik Kundu
Amir Yazdanbakhsh
Midhilesh Elavazhagan
Madhu Kumar
Tushar Krishna
37
0
0
14 Apr 2025
Towards Quantifying Commonsense Reasoning with Mechanistic Insights
Abhinav Joshi
A. Ahmad
Divyaksh Shukla
Ashutosh Modi
ReLM
LRM
29
0
0
14 Apr 2025
Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure
Théo Gigant
Camille Guinaudeau
Frédéric Dufaux
21
0
0
14 Apr 2025
Efficient LLM Serving on Hybrid Real-time and Best-effort Requests
Wan Borui
Zhao Juntao
Jiang Chenyu
Guo Chuanxiong
Wu Chuan
VLM
41
1
0
13 Apr 2025
FVQ: A Large-Scale Dataset and A LMM-based Method for Face Video Quality Assessment
Sijing Wu
Yunhao Li
Ziwen Xu
Yixuan Gao
Huiyu Duan
Wei Sun
Guangtao Zhai
33
1
0
12 Apr 2025
MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints
Yichao Yuan
Lin Ma
Nishil Talati
MoE
60
0
0
12 Apr 2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
Wissam Antoun
B. Sagot
Djamé Seddah
MQ
28
0
0
11 Apr 2025
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
M. Dhouib
Davide Buscaldi
Sonia Vanier
A. Shabou
VLM
36
0
0
11 Apr 2025
Particle Hit Clustering and Identification Using Point Set Transformers in Liquid Argon Time Projection Chambers
Edgar E. Robles
A. Yankelevich
Wenjie Wu
J. Bian
Pierre Baldi
28
0
0
11 Apr 2025
Token Level Routing Inference System for Edge Devices
Jianshu She
Wenhao Zheng
Zhengzhong Liu
Hongyi Wang
Eric P. Xing
Huaxiu Yao
Qirong Ho
36
0
0
10 Apr 2025
Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving
Shihong Gao
X. Zhang
Yanyan Shen
Lei Chen
19
1
0
10 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Zhiqi Huang
Zihao Huang
Zijia Zhao
Z. Chen
Zongyu Lin
MLLM
VLM
MoE
103
0
0
10 Apr 2025
Crafting Query-Aware Selective Attention for Single Image Super-Resolution
Junyoung Kim
Youngrok Kim
Siyeol Jung
Donghyun Min
30
0
0
09 Apr 2025
Distilling Textual Priors from LLM to Efficient Image Fusion
Ran Zhang
Xuanhua He
Ke Cao
L. Liu
Li Zhang
Man Zhou
Jie Zhang
18
0
0
09 Apr 2025
CHIME: A Compressive Framework for Holistic Interest Modeling
Yong Bai
Rui Xiang
Kaiyuan Li
Yongxiang Tang
Yanhua Cheng
Xialong Liu
Peng Jiang
Kun Gai
22
0
0
09 Apr 2025
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
Yoshihiro Yamada
ViT
21
0
0
09 Apr 2025
High-Resource Translation: Turning Abundance into Accessibility
Abhiram Reddy Yanampally
14
0
0
08 Apr 2025
PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning
Xinpeng Ding
K. Zhang
Jianhua Han
Lanqing Hong
Hang Xu
X. Li
MLLM
VLM
71
0
0
08 Apr 2025
TAGC: Optimizing Gradient Communication in Distributed Transformer Training
Igor Polyakov
Alexey Dukhanov
Egor Spirin
30
0
0
08 Apr 2025
SPIRe: Boosting LLM Inference Throughput with Speculative Decoding
Sanjit Neelam
Daniel Heinlein
Vaclav Cvicek
Akshay Mishra
Reiner Pope
LRM
33
0
0
08 Apr 2025
Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching
Yanhao Dong
Yubo Miao
Weinan Li
Xiao Zheng
Chao Wang
Feng Lyu
21
0
0
08 Apr 2025
AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design
Yanbiao Liang
Huihong Shi
Haikuo Shao
Zhongfeng Wang
10
0
0
07 Apr 2025
One-Minute Video Generation with Test-Time Training
Karan Dalal
Daniel Koceja
Gashon Hussein
Jiarui Xu
Yue Zhao
...
Tatsunori Hashimoto
Sanmi Koyejo
Yejin Choi
Yu Sun
Xiaolong Wang
ViT
80
3
0
07 Apr 2025
PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models
Haofei Yin
Mengbai Xiao
Rouzhou Lu
Xiao Zhang
Dongxiao Yu
Guanghui Zhang
AI4CE
19
0
0
05 Apr 2025
Window Token Concatenation for Efficient Visual Large Language Models
Yifan Li
Wentao Bao
Botao Ye
Zhen Tan
Tianlong Chen
Huan Liu
Yu Kong
VLM
39
0
0
05 Apr 2025
Reasoning on Multiple Needles In A Haystack
Yidong Wang
LRM
28
0
0
05 Apr 2025
HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs
Yongji Wu
Xueshen Liu
Shuowei Jin
Ceyu Xu
Feng Qian
Ziming Mao
Matthew Lentz
Danyang Zhuo
Ion Stoica
MoMe
MoE
54
0
0
04 Apr 2025
Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable
Xin Jin
Simon Niklaus
Zhoutong Zhang
Zhihao Xia
Chunle Guo
Yuting Yang
J. Chen
Chongyi Li
VGen
38
0
0
04 Apr 2025
Clinical ModernBERT: An efficient and long context encoder for biomedical text
Simon A. Lee
Anthony Wu
Jeffrey N. Chiang
MedIm
39
3
0
04 Apr 2025
A Framework for Robust Cognitive Evaluation of LLMs
Karin de Langis
J. Park
Bin Hu
Khanh Chi Le
Andreas Schramm
Michael C. Mensink
Andrew Elfenbein
Dongyeop Kang
26
0
0
03 Apr 2025
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky
Andrei Atanov
Yulun Jiang
Zhitong Gao
Ghazal Hosseini Mighan
Amir Zamir
Maria Brbić
VLM
MLLM
LRM
64
0
0
03 Apr 2025
FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention
Huangliang Dai
Shixun Wu
Hairui Zhao
Jiajun Huang
Zizhe Jian
Yue Zhu
Haiyang Hu
Zizhong Chen
41
0
0
03 Apr 2025
Urban Computing in the Era of Large Language Models
Zhonghang Li
Lianghao Xia
Xubin Ren
J. Tang
Tianyi Chen
Yong-mei Xu
C. Huang
70
0
0
02 Apr 2025
CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models
Runlong Zhou
Yi Zhang
RALM
48
0
0
02 Apr 2025
Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding
Sakhinana Sagar Srinivas
Venkataramana Runkana
OffRL
43
1
0
02 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
50
0
0
02 Apr 2025
ParallelFlow: Parallelizing Linear Transformers via Flow Discretization
Nicola Muca Cirone
C. Salvi
38
1
0
01 Apr 2025
TransMamba: Flexibly Switching between Transformer and Mamba
Yixing Li
Ruobing Xie
Zhen Yang
X. Sun
Shuaipeng Li
...
Zhanhui Kang
Yu Cheng
C. Xu
Di Wang
Jie Jiang
Mamba
57
1
0
31 Mar 2025
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo
Yawei Li
Taolin Zhang
J. Wang
Tao Dai
Shu-Tao Xia
Luca Benini
58
1
0
30 Mar 2025
Benchmarking Systematic Relational Reasoning with Large Language and Reasoning Models
Irtaza Khalid
Amir Masoud Nourollah
Steven Schockaert
LRM
34
0
0
30 Mar 2025
Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference
Wei Tao
Bin Zhang
Xiaoyang Qu
Jiguang Wan
Jianzong Wang
29
1
0
30 Mar 2025
Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
X. Wang
Linrui Ma
Jerry Huang
Peng Lu
Prasanna Parthasarathi
Xiao-Wen Chang
Boxing Chen
Yufei Cui
KELM
39
1
0
28 Mar 2025
InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
Dongchen Lu
Yuyao Sun
Zilu Zhang
Leping Huang
Jianliang Zeng
Mao Shu
Huo Cao
39
0
0
27 Mar 2025
A Multi-Modal Knowledge-Enhanced Framework for Vessel Trajectory Prediction
Haomin Yu
Tianyi Li
Kristian Torp
Christian S. Jensen
36
0
0
27 Mar 2025
Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap
Tong Nie
Jian-jun Sun
Wei Ma
58
1
0
27 Mar 2025
UniEDU: A Unified Language and Vision Assistant for Education Applications
Zhendong Chu
Jian Xie
Shen Wang
Z. Wang
Qingsong Wen
AI4Ed
105
0
0
26 Mar 2025
Named Entity Recognition in Context
Colin Brisson
Ayoub Kahfy
Marc Bui
Frédéric Constant
44
0
0
26 Mar 2025
Inductive Link Prediction on N-ary Relational Facts via Semantic Hypergraph Reasoning
Gongzhu Yin
H. Zhang
Yuchen Yang
Y. Luo
LRM
78
0
0
26 Mar 2025
GIViC: Generative Implicit Video Compression
Ge Gao
Siyue Teng
Tianhao Peng
Fan Zhang
David Bull
DiffM
VGen
33
0
0
25 Mar 2025
Bigger But Not Better: Small Neural Language Models Outperform Large Language Models in Detection of Thought Disorder
Changye Li
Weizhe Xu
Serguei V. S. Pakhomov
Ellen Bradley
Dror Ben-Zeev
T. Cohen
39
0
0
25 Mar 2025