arXiv:2310.12126
SHARCS: Efficient Transformers through Routing with Dynamic Width Sub-networks
18 October 2023
Mohammadreza Salehi, Sachin Mehta, Aditya Kusupati, Ali Farhadi, Hannaneh Hajishirzi
Papers citing "SHARCS: Efficient Transformers through Routing with Dynamic Width Sub-networks" (8 papers)
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
Keivan Alizadeh, Iman Mirzadeh, Hooman Shahrokhi, Dmitry Belenko, Frank Sun, Minsik Cho, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar
MoE · 01 Oct 2024
HydraViT: Stacking Heads for a Scalable ViT
Janek Haberer, A. Hojjat, Olaf Landsiedel
26 Sep 2024
OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking
Chia-Hsuan Lee, Hao Cheng, Mari Ostendorf
16 Nov 2023
MatFormer: Nested Transformer for Elastic Inference
Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, ..., Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain
11 Oct 2023
Transkimmer: Transformer Learns to Layer-wise Skim
Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo
15 May 2022
I-BERT: Integer-only BERT Quantization
Sehoon Kim, A. Gholami, Z. Yao, Michael W. Mahoney, Kurt Keutzer
MQ · 05 Jan 2021
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
MQ · 12 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 20 Apr 2018