1909.10351
Cited By
TinyBERT: Distilling BERT for Natural Language Understanding
Findings (Findings), 2019
23 September 2019
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
Papers citing
"TinyBERT: Distilling BERT for Natural Language Understanding"
50 / 1,055 papers shown
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs
Rongzhi Zhang
Jiaming Shen
Tianqi Liu
Haorui Wang
Zhen Qin
Feng Han
Jialu Liu
Simon Baumgartner
Michael Bendersky
Chao Zhang
199
13
0
05 Jun 2024
Seeing the Forest through the Trees: Data Leakage from Partial Transformer Gradients
Weijun Li
Xingliang Yuan
Mark Dras
PILM
266
4
0
03 Jun 2024
Posterior Label Smoothing for Node Classification
Jaeseung Heo
M. Park
Dongwoo Kim
UQCV
577
0
0
01 Jun 2024
STAT: Shrinking Transformers After Training
Megan Flynn
Alexander Wang
Dean Edward Alvarez
Christopher De Sa
Anil Damle
311
3
0
29 May 2024
FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models
Yang Zhang
Yawei Li
Xinpeng Wang
Qianli Shen
Barbara Plank
Bernd Bischl
Mina Rezaei
Kenji Kawaguchi
249
20
0
28 May 2024
Exploring Ordinality in Text Classification: A Comparative Study of Explicit and Implicit Techniques
Siva Rajesh Kasa
Aniket Goel
Karan Gupta
Sumegh Roychowdhury
Anish Bhanushali
Nikhil Pattisapu
Prasanna Srinivasa Murthy
255
6
0
20 May 2024
Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks
Taiyuan Mei
Yun Zi
X. Cheng
Zijun Gao
Qi Wang
Haowei Yang
248
26
0
20 May 2024
Feature-Adaptive and Data-Scalable In-Context Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jiahao Li
Quan Wang
Li Zhang
Guoqing Jin
Zhendong Mao
275
4
0
17 May 2024
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
Mahsa Khoshnoodi
Vinija Jain
Mingye Gao
Malavika Srikanth
Vasu Sharma
OffRL
350
9
0
15 May 2024
Exploring Graph-based Knowledge: Multi-Level Feature Distillation via Channels Relational Graph
Zhiwei Wang
Jun Huang
Longhua Ma
Chengyu Wu
Hongyu Ma
286
0
0
14 May 2024
ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis
Mohammad Amaz Uddin
Muhammad Nazrul Islam
Leandros A. Maglaras
Helge Janicke
Iqbal H. Sarker
173
13
0
12 May 2024
A Scene-aware Models Adaptation Scheme for Cross-scene Online Inference on Mobile Devices
IEEE International Conference on Distributed Computing Systems (ICDCS), 2024
Yunzhe Li
Hongzi Zhu
Zhuohong Deng
Yunlong Cheng
Liang Zhang
Shan Chang
Minyi Guo
264
4
0
09 May 2024
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
...
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGen
LM&Ro
365
84
0
06 May 2024
Structural Pruning of Pre-trained Language Models via Neural Architecture Search
Aaron Klein
Jacek Golebiowski
Xingchen Ma
Valerio Perrone
Cédric Archambeau
209
5
0
03 May 2024
UniGen: Universal Domain Generalization for Sentiment Classification via Zero-shot Dataset Generation
Juhwan Choi
Yeonghwa Kim
Seunguk Yu
Jungmin Yun
Youngbin Kim
216
8
0
02 May 2024
Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget
Minh Duc Bui
Fabian David Schmidt
Goran Glavaš
Katharina von der Wense
189
1
0
30 Apr 2024
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
Jianzong Wang
Ziqi Liang
Xulong Zhang
Ning Cheng
Jing Xiao
189
1
0
30 Apr 2024
Annotator-Centric Active Learning for Subjective NLP Tasks
Michiel van der Meer
Neele Falk
P. Murukannaiah
Enrico Liscio
515
17
0
24 Apr 2024
Parameter Efficient Diverse Paraphrase Generation Using Sequence-Level Knowledge Distillation
Lasal Jayawardena
Prasan Yapa
BDL
266
3
0
19 Apr 2024
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Jin Gao
Shubo Lin
Shaoru Wang
Yutong Kou
Zeming Li
Liang Li
Congxuan Zhang
Xiaoqin Zhang
Yizheng Wang
Weiming Hu
291
6
0
18 Apr 2024
ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation
Divyang Doshi
Jung-Eun Kim
207
2
0
15 Apr 2024
MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution
Yuxuan Jiang
Chen Feng
Fan Zhang
David Bull
SupR
273
24
0
15 Apr 2024
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
Benjue Weng
LM&MA
287
15
0
13 Apr 2024
Constrained C-Test Generation via Mixed-Integer Programming
Ji-Ung Lee
Marc E. Pfetsch
Iryna Gurevych
181
0
0
12 Apr 2024
CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Longwei Zou
Qingyang Wang
Han Zhao
Tingfeng Liu
Yi Yang
Yangdong Deng
248
1
0
10 Apr 2024
What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models
Busayo Awobade
Mardiyyah Oduwole
Steven Kolawole
205
1
0
06 Apr 2024
Okay, Let's Do This! Modeling Event Coreference with Generated Rationales and Knowledge Distillation
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Abhijnan Nath
Shadi Manafi
Avyakta Chelle
Nikhil Krishnaswamy
236
5
0
04 Apr 2024
Efficiently Distilling LLMs for Edge Applications
Achintya Kundu
Fabian Lim
Aaron Chew
L. Wynter
Penny Chong
Rhui Dih Lee
223
10
0
01 Apr 2024
A Comprehensive Study on NLP Data Augmentation for Hate Speech Detection: Legacy Methods, BERT, and LLMs
Md Saroar Jahan
Mourad Oussalah
D. Beddiar
Jhuma Kabir Mim
Nabil Arhab
189
16
0
30 Mar 2024
Are Compressed Language Models Less Subgroup Robust?
Leonidas Gee
Andrea Zugarini
Novi Quadrianto
187
2
0
26 Mar 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
434
158
0
26 Mar 2024
An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning
Computer Vision and Pattern Recognition (CVPR), 2024
Jianqing Zhang
Yang Liu
Yang Hua
Jian Cao
254
20
0
23 Mar 2024
Evaluating Unsupervised Dimensionality Reduction Methods for Pretrained Sentence Embeddings
Gaifan Zhang
Yi Zhou
Danushka Bollegala
221
11
0
20 Mar 2024
Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models
Adian Liusie
Yassir Fathullah
Mark Gales
110
7
0
20 Mar 2024
TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Pengcheng Jiang
Cao Xiao
Zifeng Wang
Parminder Bhatia
Jimeng Sun
Jiawei Han
LRM
232
16
0
15 Mar 2024
FBPT: A Fully Binary Point Transformer
IEEE International Conference on Robotics and Automation (ICRA), 2024
Zhixing Hou
Yuzhang Shang
Yan Yan
MQ
233
1
0
15 Mar 2024
Measuring Bias in a Ranked List using Term-based Representations
European Conference on Information Retrieval (ECIR), 2024
Amin Abolghasemi
Leif Azzopardi
Arian Askari
Maarten de Rijke
Suzan Verberne
195
9
0
09 Mar 2024
Learning to Maximize Mutual Information for Chain-of-Thought Distillation
Xin Chen
Hanxian Huang
Yanjun Gao
Yi Wang
Jishen Zhao
Ke Ding
361
27
0
05 Mar 2024
Improving the Downstream Performance of Mixture-of-Experts Transformers via Weak Vanilla Transformers
Xin Lu
Yanyan Zhao
Bing Qin
Ting Liu
MoE
118
1
0
04 Mar 2024
Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin
Seonil Son
Jemin Park
Youngseok Kim
Hyungjong Noh
Yeonsoo Lee
336
4
0
03 Mar 2024
Differentially Private Knowledge Distillation via Synthetic Text Generation
James Flemings
Murali Annavaram
SyDa
423
18
0
01 Mar 2024
Sinkhorn Distance Minimization for Knowledge Distillation
Xiao Cui
Yulei Qin
Yuting Gao
Enwei Zhang
Zihan Xu
Tong Wu
Ke Li
Xing Sun
Wen-gang Zhou
Houqiang Li
214
19
0
27 Feb 2024
Layer-wise Regularized Dropout for Neural Language Models
Shiwen Ni
Min Yang
Ruifeng Xu
Chengming Li
Xiping Hu
126
0
0
26 Feb 2024
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report
Fanqi Wan
Ziyi Yang
Longguang Zhong
Xiaojun Quan
Xinting Huang
Wei Bi
MoMe
518
2
0
25 Feb 2024
C^3: Confidence Calibration Model Cascade for Inference-Efficient Cross-Lingual Natural Language Understanding
Taixi Lu
Haoyu Wang
Huajie Shao
Jing Gao
Huaxiu Yao
170
0
0
25 Feb 2024
Divide-or-Conquer? Which Part Should You Distill Your LLM?
Zhuofeng Wu
Richard He Bai
Aonan Zhang
Jiatao Gu
V. Vydiswaran
Navdeep Jaitly
Yizhe Zhang
LRM
316
22
0
22 Feb 2024
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
Weilin Zhao
Yuxiang Huang
Xu Han
Wang Xu
Chaojun Xiao
Xinrong Zhang
Yewei Fang
Kaihuo Zhang
Zhiyuan Liu
Maosong Sun
280
23
0
21 Feb 2024
EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization
Zhendong Xiao
Changhao Chen
Shan Yang
Wu Wei
196
4
0
21 Feb 2024
An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach
Mohammad Amaz Uddin
Md Mahiuddin
Iqbal H. Sarker
208
41
0
21 Feb 2024
A Survey on Knowledge Distillation of Large Language Models
Xiaohan Xu
Ming Li
Chongyang Tao
Tao Shen
Reynold Cheng
Jinyang Li
Can Xu
Dacheng Tao
Wanrong Zhu
KELM
VLM
469
238
0
20 Feb 2024