DynaBERT: Dynamic BERT with Adaptive Width and Depth
arXiv: 2004.04037 · 8 April 2020
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu [MQ]
Papers citing "DynaBERT: Dynamic BERT with Adaptive Width and Depth" (50 of 61 papers shown)
1. How to Train Your Metamorphic Deep Neural Network
   Thomas Sommariva, Simone Calderara, Angelo Porrello · 07 May 2025 · 26 / 0 / 0

2. DYNAMAX: Dynamic computing for Transformers and Mamba based architectures
   Miguel Nogales, Matteo Gambella, Manuel Roveri · 29 Apr 2025 · 56 / 0 / 0

3. DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation
   Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You · 09 Apr 2025 · 69 / 0 / 0

4. Empowering Edge Intelligence: A Comprehensive Survey on On-Device AI Models
   Xubin Wang, Zhiqing Tang, Jianxiong Guo, Tianhui Meng, Chenhao Wang, Tian-sheng Wang, Weijia Jia · 08 Mar 2025 · 50 / 0 / 0

5. Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models
   Benyamin Jamialahmadi, Parsa Kavehzadeh, Mehdi Rezagholizadeh, Parsa Farinneya, Hossein Rajabzadeh, A. Jafari, Boxing Chen, Marzieh S. Tahaei · 06 Mar 2025 · 42 / 0 / 0

6. Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
   Tiansheng Wen, Yifei Wang, Zequn Zeng, Zhong Peng, Yudi Su, Xinyang Liu, Bo Chen, Hongwei Liu, Stefanie Jegelka, Chenyu You [CLL] · 03 Mar 2025 · 66 / 2 / 0

7. Ten Challenging Problems in Federated Foundation Models
   Tao Fan, Hanlin Gu, Xuemei Cao, Chee Seng Chan, Qian Chen, ..., Y. Zhang, Xiaojin Zhang, Zhenzhe Zheng, Lixin Fan, Qiang Yang [FedML] · 14 Feb 2025 · 83 / 4 / 0

8. Tailored-LLaMA: Optimizing Few-Shot Learning in Pruned LLaMA Models with Task-Specific Prompts
   Danyal Aftab, Steven Davy [ALM] · 10 Jan 2025 · 49 / 0 / 0

9. Presto! Distilling Steps and Layers for Accelerating Music Generation
   Zachary Novack, Ge Zhu, Jonah Casebeer, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan · 07 Oct 2024 · 45 / 5 / 0

10. Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection
    Alireza Ganjdanesh, Yan Kang, Yuchen Liu, Richard Y. Zhang, Zhe Lin, Heng Huang [DiffM] · 23 Sep 2024 · 32 / 2 / 0

11. Membership Inference Attack Against Masked Image Modeling
    Z. Li, Xinlei He, Ning Yu, Yang Zhang · 13 Aug 2024 · 42 / 1 / 0

12. Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
    Yifei Gao, Jie Ou, Lei Wang, Fanhua Shang, Jaji Wu [MQ] · 22 Jul 2024 · 45 / 0 / 0

13. Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models
    Alireza Ganjdanesh, Reza Shirkavand, Shangqian Gao, Heng Huang [DiffM, VLM] · 17 Jun 2024 · 53 / 4 / 0

14. The Unreasonable Ineffectiveness of the Deeper Layers
    Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts · 26 Mar 2024 · 43 / 79 / 0

15. Investigating Recurrent Transformers with Dynamic Halt
    Jishnu Ray Chowdhury, Cornelia Caragea · 01 Feb 2024 · 39 / 1 / 0

16. Adaptive Depth Networks with Skippable Sub-Paths
    Woochul Kang · 27 Dec 2023 · 28 / 1 / 0

17. Cooperative Learning for Cost-Adaptive Inference
    Xingli Fang, Richard M. Bradford, Jung-Eun Kim · 13 Dec 2023 · 29 / 1 / 0

18. EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
    Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou [LRM] · 08 Dec 2023 · 35 / 31 / 0

19. Adaptivity and Modularity for Efficient Generalization Over Task Complexity
    Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, ..., Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio · 13 Oct 2023 · 34 / 5 / 0

20. VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning
    Yanan Wang, Donghuo Zeng, Shinya Wada, Satoshi Kurihara · 27 Sep 2023 · 32 / 6 / 0

21. Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models
    Seungcheol Park, Ho-Jin Choi, U. Kang [VLM] · 07 Aug 2023 · 30 / 5 / 0

22. Lifting the Curse of Capacity Gap in Distilling Language Models
    Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song [MoE] · 20 May 2023 · 32 / 19 / 0

23. Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs
    Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam [LRM] · 19 May 2023 · 28 / 36 / 0

24. MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model
    Xin Yao, Ziqing Yang, Yiming Cui, Shijin Wang · 03 Apr 2023 · 23 / 3 / 0

25. EdgeTran: Co-designing Transformers for Efficient Inference on Mobile Edge Platforms
    Shikhar Tuli, N. Jha · 24 Mar 2023 · 36 / 3 / 0

26. HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
    Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao [VLM] · 19 Feb 2023 · 24 / 24 / 0

27. The Framework Tax: Disparities Between Inference Efficiency in NLP Research and Deployment
    Jared Fernandez, Jacob Kahn, Clara Na, Yonatan Bisk, Emma Strubell [FedML] · 13 Feb 2023 · 28 / 10 / 0

28. Exploring Attention Map Reuse for Efficient Transformer Neural Networks
    Kyuhong Shim, Jungwook Choi, Wonyong Sung [ViT] · 29 Jan 2023 · 22 / 3 / 0

29. Improve Text Classification Accuracy with Intent Information
    Yifeng Xie [VLM] · 15 Dec 2022 · 19 / 0 / 0

30. TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities
    Zhe Zhao, Yudong Li, Cheng-An Hou, Jing-xin Zhao, Rong Tian, ..., Xingwu Sun, Zhanhui Kang, Xiaoyong Du, Linlin Shen, Kimmo Yan [VLM] · 13 Dec 2022 · 33 / 23 / 0

31. Can Open-Domain QA Reader Utilize External Knowledge Efficiently like Humans?
    Neeraj Varshney, Man Luo, Chitta Baral [RALM] · 23 Nov 2022 · 19 / 11 / 0

32. Feature-augmented Machine Reading Comprehension with Auxiliary Tasks
    Yifeng Xie · 17 Nov 2022 · 26 / 0 / 0

33. COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models
    Bowen Shen, Zheng Lin, Yuanxin Liu, Zhengxiao Liu, Lei Wang, Weiping Wang [VLM] · 27 Oct 2022 · 36 / 4 / 0

34. Efficiently Controlling Multiple Risks with Pareto Testing
    Bracha Laufer-Goldshtein, Adam Fisch, Regina Barzilay, Tommi Jaakkola · 14 Oct 2022 · 36 / 16 / 0

35. UFO: Unified Feature Optimization
    Teng Xi, Yifan Sun, Deli Yu, Bi Li, Nan Peng, ..., Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang · 21 Jul 2022 · 32 / 10 / 0

36. Play It Cool: Dynamic Shifting Prevents Thermal Throttling
    Yangxu Zhou, Feng Liang, Ting-Wu Chin, Diana Marculescu · 22 Jun 2022 · 9 / 2 / 0

37. Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt
    Xinyin Ma, Xinchao Wang, Gongfan Fang, Yongliang Shen, Weiming Lu · 16 May 2022 · 19 / 11 / 0

38. Generalized Knowledge Distillation via Relationship Matching
    Han-Jia Ye, Su Lu, De-Chuan Zhan [FedML] · 04 May 2022 · 22 / 20 / 0

39. MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
    Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, T. Zhao, Weizhu Chen [MoE] · 15 Apr 2022 · 22 / 38 / 0

40. TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models
    Ziqing Yang, Yiming Cui, Zhigang Chen [SyDa, VLM] · 30 Mar 2022 · 20 / 12 / 0

41. LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT
    Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li · 29 Mar 2022 · 31 / 61 / 0

42. Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
    Mor Geva, Avi Caciularu, Ke Wang, Yoav Goldberg [KELM] · 28 Mar 2022 · 43 / 333 / 0

43. Automated Progressive Learning for Efficient Training of Vision Transformers
    Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang · 28 Mar 2022 · 26 / 46 / 0

44. Compression of Generative Pre-trained Language Models via Quantization
    Chaofan Tao, Lu Hou, Wei Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong [MQ] · 21 Mar 2022 · 27 / 103 / 0

45. Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation
    Wenliang Dai, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung [VLM] · 12 Mar 2022 · 22 / 90 / 0

46. From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression
    Runxin Xu, Fuli Luo, Chengyu Wang, Baobao Chang, Jun Huang, Songfang Huang, Fei Huang [VLM] · 14 Dec 2021 · 27 / 25 / 0

47. MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
    Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou [MoE] · 05 Oct 2021 · 24 / 117 / 0

48. Towards Efficient Post-training Quantization of Pre-trained Language Models
    Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, M. Lyu [MQ] · 30 Sep 2021 · 73 / 47 / 0

49. RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
    Md. Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart · 21 Sep 2021 · 31 / 22 / 0

50. Towards Joint Intent Detection and Slot Filling via Higher-order Attention
    Dongsheng Chen, Zhiqi Huang, Xian Wu, Shen Ge, Yuexian Zou · 18 Sep 2021 · 27 / 20 / 0