Sparse Modular Activation for Efficient Sequence Modeling
arXiv 2306.11197, 19 June 2023
Liliang Ren, Yang Liu, Shuohang Wang, Yichong Xu, Chenguang Zhu, Chengxiang Zhai

Papers citing "Sparse Modular Activation for Efficient Sequence Modeling" (8 / 8 papers shown)

Language Model Pre-Training with Sparse Latent Typing (23 Oct 2022)
Liliang Ren, Zixuan Zhang, H. Wang, Clare R. Voss, Chengxiang Zhai, Heng Ji

Why neural networks find simple solutions: the many regularizers of geometric complexity (27 Sep 2022)
Benoit Dherin, Michael Munn, M. Rosca, David Barrett

Liquid Structural State-Space Models (26 Sep 2022) [AI4TS]
Ramin Hasani, Mathias Lechner, Tsun-Hsuan Wang, Makram Chahine, Alexander Amini, Daniela Rus

Transkimmer: Transformer Learns to Layer-wise Skim (15 May 2022)
Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo

Transformer Quality in Linear Time (21 Feb 2022)
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le

Mixture-of-Experts with Expert Choice Routing (18 Feb 2022) [MoE]
Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon

Primer: Searching for Efficient Transformers for Language Modeling (17 Sep 2021) [VLM]
David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam M. Shazeer, Quoc V. Le

Efficient Content-Based Sparse Attention with Routing Transformers (12 Mar 2020) [MoE]
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier