2408.10189
Cited By
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
19 August 2024
Aviv Bick
Kevin Y. Li
Eric P. Xing
J. Zico Kolter
Albert Gu
Mamba
Papers citing
"Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models"
15 / 15 papers shown
Title
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
Zhenyu (Allen) Zhang
Zechun Liu
Yuandong Tian
Harshit Khaitan
Z. Wang
Steven Li
49
0
0
28 Apr 2025
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Aviv Bick
Eric P. Xing
Albert Gu
RALM
81
0
0
22 Apr 2025
Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models
Patrick Haller
Jonas Golde
Alan Akbik
14
0
0
19 Apr 2025
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Junxiong Wang
Wen-Ding Li
Daniele Paliotta
Daniel Ritter
Alexander M. Rush
Tri Dao
LRM
21
0
0
14 Apr 2025
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei
Rama Chellappa
26
0
0
30 Mar 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM
OffRL
LRM
76
11
0
27 Mar 2025
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
Yingyue Li
Bencheng Liao
Wenyu Liu
Xinggang Wang
Mamba
53
0
0
17 Mar 2025
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li
Mehdi Rezagholizadeh
Mingyu Yang
Vikram Appia
Emad Barsoum
VLM
41
0
0
14 Mar 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan
Weigao Sun
Jiaxi Hu
Jusen Du
Yu-Xi Cheng
64
0
0
03 Mar 2025
Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
Daniele Paliotta
Junxiong Wang
Matteo Pagliardini
Kevin Y. Li
Aviv Bick
J. Zico Kolter
Albert Gu
F. Fleuret
Tri Dao
ReLM
LRM
40
7
0
27 Feb 2025
On Pruning State-Space LLMs
Tamer Ghattas
Michael Hassid
Roy Schwartz
43
1
0
26 Feb 2025
TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba
Xiuwei Chen
Sihao Lin
Xiao Dong
Z. Chen
Meng Cao
J. Han
Hang Xu
Xiaodan Liang
Mamba
49
0
0
24 Feb 2025
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He
Philip N. Garner
78
0
0
09 Oct 2024
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
Yu Zhang
Songlin Yang
Ruijie Zhu
Yue Zhang
Leyang Cui
...
Freda Shi
Bailin Wang
Wei Bi
P. Zhou
Guohong Fu
54
11
0
11 Sep 2024
HGRN2: Gated Linear RNNs with State Expansion
Zhen Qin
Songlin Yang
Weixuan Sun
Xuyang Shen
Dong Li
Weigao Sun
Yiran Zhong
LRM
34
45
0
11 Apr 2024