arXiv: 1909.10351 (v5)
TinyBERT: Distilling BERT for Natural Language Understanding
Findings of the Association for Computational Linguistics: EMNLP (Findings), 2020
23 September 2019
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
Fang Wang
Qun Liu
VLM
Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding" (50 of 1,055 papers shown)
A Dual-Space Framework for General Knowledge Distillation of Large Language Models
Wei Wei
Songming Zhang
Yunlong Liang
Fandong Meng
Yufeng Chen
Jinan Xu
Jie Zhou
355
0
0
15 Apr 2025
Multi-Sense Embeddings for Language Models and Knowledge Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qitong Wang
Mohammed J. Zaki
Georgios Kollias
Vasileios Kalantzis
KELM
245
2
0
08 Apr 2025
Saliency-driven Dynamic Token Pruning for Large Language Models
Yao Tao
Yehui Tang
Yun Wang
Mingjian Zhu
Hailin Hu
Yunhe Wang
445
4
0
06 Apr 2025
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Information Fusion (Inf. Fusion), 2025
Xiaofeng Han
Shunpeng Chen
Zenghuang Fu
Zhe Feng
Lue Fan
...
Li Guo
Weiliang Meng
Xiaopeng Zhang
Rongtao Xu
Shibiao Xu
395
37
0
03 Apr 2025
Evidencing Unauthorized Training Data from AI Generated Content using Information Isotopes
Tao Qi
Jinhua Yin
Dongqi Cai
Yueqi Xie
Huili Wang
...
Zhili Zhou
Shangguang Wang
Lingjuan Lyu
Yongfeng Huang
Nicholas Lane
269
1
0
24 Mar 2025
Efficient Knowledge Distillation via Curriculum Extraction
Shivam Gupta
Sushrut Karmalkar
324
2
0
21 Mar 2025
Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Anshumann
Mohd Abbas Zaidi
Akhil Kedia
Jinwoo Ahn
Taehwak Kwon
Kangwook Lee
Haejun Lee
Joohyung Lee
FedML
801
1
0
21 Mar 2025
A Generalist Hanabi Agent
International Conference on Learning Representations (ICLR), 2025
Arjun Vaithilingam Sudhakar
Hadi Nekoei
Mathieu Reymond
Miao Liu
Janarthanan Rajendran
Sarath Chandar
913
1
0
17 Mar 2025
IteRABRe: Iterative Recovery-Aided Block Reduction
Haryo Akbarianto Wibowo
Israfel Salazar
Hideki Tanaka
Masao Utiyama
Alham Fikri Aji
Mary Dabre
262
1
0
08 Mar 2025
SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting
Linqi Yang
Xiongwei Zhao
Qihao Sun
Ke Wang
Ao Chen
Peng Kang
3DGS
303
9
0
07 Mar 2025
Malware Detection at the Edge with Lightweight LLMs: A Performance Evaluation
ACM Transactions on Internet Technology (TOIT), 2025
Christian Rondanini
B. Carminati
E. Ferrari
Antonio Gaudiano
Ashish Kundu
228
5
0
06 Mar 2025
EPEE: Towards Efficient and Effective Foundation Models in Biomedicine
Zaifu Zhan
Shuang Zhou
Huixue Zhou
Ziqiang Liu
Rui Zhang
237
1
0
03 Mar 2025
FedMentalCare: Towards Privacy-Preserving Fine-Tuned LLMs to Analyze Mental Health Status Using Federated Learning Framework
S M Sarwar
AI4MH
210
5
0
27 Feb 2025
XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs
Linyang He
Ercong Nie
Sukru Samet Dindar
Arsalan Firoozi
Adrian Nicolas Florea
...
Haotian Ye
Jonathan R. Brennan
Helmut Schmid
Hinrich Schütze
Nima Mesgarani
290
3
0
27 Feb 2025
"Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts
Rabindra Lamsal
M. Read
S. Karunasekera
Muhammad Imran
183
0
0
24 Feb 2025
Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models
Gyeongman Kim
Gyouk Chu
Eunho Yang
MoE
273
0
0
18 Feb 2025
PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
Bowei He
Lihao Yin
Hui-Ling Zhen
Xiaokun Zhang
Mingxuan Yuan
Chen Ma
387
2
0
18 Feb 2025
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
Yao-Ching Yu
Tsun-Han Chiang
Cheng-Wei Tsai
Chien-Ming Huang
Wen-Kwang Tsao
354
11
0
16 Feb 2025
Performance Analysis of Traditional VQA Models Under Limited Computational Resources
Jihao Gu
278
1
0
09 Feb 2025
A Framework for Double-Blind Federated Adaptation of Foundation Models
Nurbek Tastan
Karthik Nandakumar
FedML
267
0
0
03 Feb 2025
Fake News Detection After LLM Laundering: Measurement and Explanation
Rupak Kumar Das
Jonathan Dodge
544
6
0
29 Jan 2025
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
Nicolas Boizard
Kevin El Haddad
Céline Hudelot
Pierre Colombo
430
26
0
28 Jan 2025
Merino: Entropy-driven Design for Generative Language Models on IoT Devices
AAAI Conference on Artificial Intelligence (AAAI), 2024
Youpeng Zhao
Ming Lin
Huadong Tang
Qiang Wu
Jun Wang
365
1
0
28 Jan 2025
Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation
Jan Christian Blaise Cruz
Alham Fikri Aji
286
2
0
22 Jan 2025
Quantification of Large Language Model Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sunbowen Lee
Junting Zhou
Chang Ao
Kaige Li
Xinrun Du
...
Hamid Alinejad-Rokny
Min Yang
Yitao Liang
Zhoufutu Wen
Shiwen Ni
264
0
0
22 Jan 2025
GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation
IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025
Shashikant Ilager
Lukas Florian Briem
Ivona Brandić
230
0
0
19 Jan 2025
Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning
Zhen Li
Yupeng Su
Runming Yang
C. Xie
Xiping Hu
Zhongwei Xie
Ngai Wong
Hongxia Yang
MQ
LRM
678
17
0
06 Jan 2025
Lillama: Large Language Models Compression via Low-Rank Feature Distillation
Yaya Sy
Christophe Cerisara
Irina Illina
MQ
290
0
0
31 Dec 2024
MatchMiner-AI: An Open-Source Solution for Cancer Clinical Trial Matching
Ethan Cerami
Pavel Trukhanov
Morgan A. Paul
Michael J. Hassett
Irbaz B. Riaz
...
Jad El Masri
Alys Malcolm
Tali Mazor
Ethan Cerami
Kenneth L. Kehl
240
6
0
23 Dec 2024
Knowledge Distillation in RNN-Attention Models for Early Prediction of Student Performance
ACM Symposium on Applied Computing (SAC), 2024
Sukrit Leelaluk
Cheng Tang
Valdemar Švábenský
Atsushi Shimada
233
2
0
19 Dec 2024
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Yining Qi
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
451
5
0
18 Dec 2024
Lightweight Contenders: Navigating Semi-Supervised Text Mining through Peer Collaboration and Self Transcendence
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Qianren Mao
Weifeng Jiang
Qingbin Liu
Chenghua Lin
Qian Li
Xianqing Wen
Jianxin Li
Jinhu Lu
277
0
0
01 Dec 2024
Can bidirectional encoder become the ultimate winner for downstream applications of foundation models?
Lewen Yang
Xuanyu Zhou
Juao Fan
Xinyi Xie
Shengxin Zhu
AI4CE
270
1
0
27 Nov 2024
Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models
Y. Fu
Yin Yu
Xiaotian Han
Runchao Li
Xianxuan Long
Haotian Yu
Pan Li
SyDa
381
0
0
25 Nov 2024
Understanding Generalization of Federated Learning: the Trade-off between Model Stability and Optimization
Dun Zeng
Zheshun Wu
Shiyu Liu
Yu Pan
Xiaoying Tang
Zenglin Xu
MLT
FedML
501
2
0
25 Nov 2024
Is Training Data Quality or Quantity More Impactful to Small Language Model Performance?
Aryan Sajith
Krishna Chaitanya Rao Kathala
238
4
0
24 Nov 2024
Quantifying Knowledge Distillation Using Partial Information Decomposition
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Pasan Dissanayake
Faisal Hamman
Barproda Halder
Ilia Sucholutsky
Qiuyi Zhang
Sanghamitra Dutta
303
6
0
12 Nov 2024
Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation
Neural Information Processing Systems (NeurIPS), 2024
Yu-Liang Zhan
Zhong-Yi Lu
Hao Sun
Ze-Feng Gao
242
2
0
10 Nov 2024
Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2024
Chengting Yu
Fengzhao Zhang
Ruizhe Chen
Zuozhu Liu
Shurun Tan
Er-ping Li
Aili Wang
325
5
0
03 Nov 2024
Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision
ACM Transactions on Embedded Computing Systems (TECS), 2024
Xiangzhong Luo
Di Liu
Hao Kong
Shuo Huai
Hui Chen
Guochu Xiong
Weichen Liu
213
14
0
03 Nov 2024
Larger models yield better results? Streamlined severity classification of ADHD-related concerns using BERT-based knowledge distillation
medRxiv, 2024
Ahmed Akib Jawad Karim
Kazi Hafiz Md. Asad
Md. Golam Rabiul Alam
AI4MH
237
6
0
30 Oct 2024
KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation
Rambod Azimi
Rishav Rishav
M. Teichmann
Samira Ebrahimi Kahou
ALM
304
3
0
28 Oct 2024
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat
Veeranjaneyulu Sadhanala
Afshin Rostamizadeh
Ayan Chakrabarti
Wittawat Jitkrittum
...
Rakesh Shivanna
Sashank J. Reddi
A. Menon
Rohan Anil
Sanjiv Kumar
453
10
0
24 Oct 2024
Pre-training Distillation for Large Language Models: A Design Space Exploration
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hao Peng
Xin Lv
Yushi Bai
Zijun Yao
Jing Zhang
Lei Hou
Juanzi Li
263
8
0
21 Oct 2024
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments
Artificial Intelligence Applications and Innovations (AIAI), 2024
Syed Abdul Gaffar Shakhadri
Kruthika KR
Rakshit Aralimatti
VLM
184
3
0
15 Oct 2024
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa
Ganesh Venkatesh
Mike Lasby
Nish Sinnadurai
Sean Lie
SyDa
478
4
0
13 Oct 2024
Distributed Inference on Mobile Edge and Cloud: An Early Exit based Clustering Approach
Divya J. Bajpai
M. Hanawal
FedML
210
2
0
06 Oct 2024
Hyper-multi-step: The Truth Behind Difficult Long-context Tasks
Yijiong Yu
Xiufa Ma
Jianwei Fang
Zhi-liang Xu
Guangyao Su
...
Zhixiao Qi
Wei Wang
Wen Liu
Ran Chen
Ji Pei
LRM
RALM
328
1
0
06 Oct 2024
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
International Conference on Learning Representations (ICLR), 2024
Seanie Lee
Haebin Seong
Dong Bok Lee
Minki Kang
Xiaoyin Chen
Dominik Wagner
Yoshua Bengio
Juho Lee
Sung Ju Hwang
378
13
0
02 Oct 2024
FedPT: Federated Proxy-Tuning of Large Language Models on Resource-Constrained Edge Devices
Zhidong Gao
Yu Zhang
Zhenxiao Zhang
Yanmin Gong
Yuanxiong Guo
141
3
0
01 Oct 2024