2012.15828
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
31 December 2020
Wenhui Wang
Hangbo Bao
Shaohan Huang
Li Dong
Furu Wei
MQ
Papers citing "MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers" (36 of 36 papers shown)
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via α-β-Divergence
Guanghui Wang
Zhiyong Yang
Z. Wang
Shi Wang
Qianqian Xu
Q. Huang
37
0
0
07 May 2025
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan
Guoqing Luo
Michael Bowling
Lili Mou
OffRL
63
0
0
26 Apr 2025
BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
Qingyue Wang
Qi Pang
Xixun Lin
Shuai Wang
Daoyuan Wu
MoE
54
0
0
24 Apr 2025
Exploring and Controlling Diversity in LLM-Agent Conversation
Kuanchao Chu
Yi-Pei Chen
Hideki Nakayama
LLMAG
42
1
0
24 Feb 2025
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation
Taeyoung Yun
Dinghuai Zhang
Jinkyoo Park
Ling Pan
DiffM
73
2
0
17 Feb 2025
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
Nicolas Boizard
Kevin El Haddad
Céline Hudelot
Pierre Colombo
68
14
0
28 Jan 2025
Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes
Rahul Garg
Trilok Padhi
Hemang Jain
Ugur Kursuncu
Ponnurangam Kumaraguru
75
2
0
19 Nov 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
65
5
0
22 Oct 2024
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa
Ganesh Venkatesh
Mike Lasby
Nish Sinnadurai
Sean Lie
SyDa
33
0
0
13 Oct 2024
Mentor-KD: Making Small Language Models Better Multi-step Reasoners
Hojae Lee
Junho Kim
SangKeun Lee
LRM
26
1
0
11 Oct 2024
NV-Retriever: Improving text embedding models with effective hard-negative mining
G. D. S. P. Moreira
Radek Osmulski
Mengyao Xu
Ronay Ak
Benedikt D. Schifferer
Even Oldridge
RALM
41
30
0
22 Jul 2024
Compact Language Models via Pruning and Knowledge Distillation
Saurav Muralidharan
Sharath Turuvekere Sreenivas
Raviraj Joshi
Marcin Chochowski
M. Patwary
M. Shoeybi
Bryan Catanzaro
Jan Kautz
Pavlo Molchanov
SyDa
MQ
27
37
0
19 Jul 2024
The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
K. Enevoldsen
Márton Kardos
Niklas Muennighoff
Kristoffer Laigaard Nielbo
24
9
0
04 Jun 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
53
12
0
28 May 2024
Efficient Models for the Detection of Hate, Abuse and Profanity
Christoph Tillmann
Aashka Trivedi
Bishwaranjan Bhattacharjee
VLM
6
0
0
08 Feb 2024
Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer
Kyuhong Shim
Jinkyu Lee
Simyoung Chang
Kyuwoong Hwang
32
2
0
31 Aug 2023
RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer
Jiahao Wang
Songyang Zhang
Yong Liu
Taiqiang Wu
Yujiu Yang
Xihui Liu
Kai-xiang Chen
Ping Luo
Dahua Lin
16
20
0
12 Apr 2023
FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain
Yanis Labrak
Adrien Bazoge
Richard Dufour
Mickael Rouvier
Emmanuel Morin
B. Daille
P. Gourraud
17
29
0
09 Apr 2023
The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces
Kyle Lo
Joseph Chee Chang
Andrew Head
Jonathan Bragg
Amy X. Zhang
...
Caroline M Wu
Jiangjiang Yang
Angele Zamarron
Marti A. Hearst
Daniel S. Weld
19
19
0
25 Mar 2023
Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Aashka Trivedi
Takuma Udagawa
Michele Merler
Rameswar Panda
Yousef El-Kurdi
Bishwaranjan Bhattacharjee
22
6
0
16 Mar 2023
Smooth and Stepwise Self-Distillation for Object Detection
Jieren Deng
Xiaoxia Zhou
Hao Tian
Zhihong Pan
Derek Aguiar
ObjD
13
0
0
09 Mar 2023
UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers
Jon Saad-Falcon
Omar Khattab
Keshav Santhanam
Radu Florian
M. Franz
Salim Roukos
Avirup Sil
Md Arafat Sultan
Christopher Potts
11
41
0
01 Mar 2023
KS-DETR: Knowledge Sharing in Attention Learning for Detection Transformer
Kaikai Zhao
Norimichi Ukita
MU
29
1
0
22 Feb 2023
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
Chen Liang
Haoming Jiang
Zheng Li
Xianfeng Tang
Bin Yin
Tuo Zhao
VLM
16
24
0
19 Feb 2023
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Jongwoo Ko
Seungjoon Park
Minchan Jeong
S. Hong
Euijai Ahn
Duhyeuk Chang
Se-Young Yun
21
6
0
03 Feb 2023
idT5: Indonesian Version of Multilingual T5 Transformer
Mukhlish Fuadi
A. Wibawa
S. Sumpeno
6
6
0
02 Feb 2023
Real-time Speech Interruption Analysis: From Cloud to Client Deployment
Quchen Fu
Szu-Wei Fu
Yaran Fan
Yu-Huan Wu
Zhuo Chen
J. Gupchup
Ross Cutler
26
0
0
24 Oct 2022
A Closer Look at Self-Supervised Lightweight Vision Transformers
Shaoru Wang
Jin Gao
Zeming Li
Jian-jun Sun
Weiming Hu
ViT
62
41
0
28 May 2022
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
Simiao Zuo
Qingru Zhang
Chen Liang
Pengcheng He
T. Zhao
Weizhu Chen
MoE
14
38
0
15 Apr 2022
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Jinnian Zhang
Houwen Peng
Kan Wu
Mengchen Liu
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
16
123
0
14 Apr 2022
Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding
Shanshan Wang
Zhumin Chen
Z. Ren
Huasheng Liang
Qiang Yan
Pengjie Ren
17
9
0
06 Apr 2022
FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing
Ilias Chalkidis
Tommaso Pasini
Shenmin Zhang
Letizia Tomada
Sebastian Felix Schwemer
Anders Søgaard
AILaw
27
54
0
14 Mar 2022
ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Shuohuan Wang
Yu Sun
Yang Xiang
Zhihua Wu
Siyu Ding
...
Tian Wu
Wei Zeng
Ge Li
Wen Gao
Haifeng Wang
ELM
31
78
0
23 Dec 2021
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
221
197
0
07 Feb 2020
MLQA: Evaluating Cross-lingual Extractive Question Answering
Patrick Lewis
Barlas Oğuz
Ruty Rinott
Sebastian Riedel
Holger Schwenk
ELM
242
490
0
16 Oct 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018