XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
Subhabrata Mukherjee, Ahmed Hassan Awadallah
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
arXiv: 2004.05686, 12 April 2020

Papers citing "XtremeDistil: Multi-stage Distillation for Massive Multilingual Models"

45 citing papers listed below.
Knowledge distillation through geometry-aware representational alignment
Prajjwal Bhattarai, Mohammad Amjad, Dmytro Zhylko, Tuka Alhanai
27 Sep 2025

Merlin: Multi-View Representation Learning for Robust Multivariate Time Series Forecasting with Unfixed Missing Rates
Chengqing Yu, Fei Wang, Chuanguang Yang, Zezhi Shao, Tao Sun, Tangwen Qian, Wei Wei, Zhulin An, Yongjun Xu
14 Jun 2025

Explaining Matters: Leveraging Definitions and Semantic Expansion for Sexism Detection
Sahrish Khan, Arshad Jhumka, Gabriele Pergola
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
06 Jun 2025

AnchorFormer: Differentiable Anchor Attention for Efficient Vision Transformer
Jiquan Shan, Junxiao Wang, Lifeng Zhao, Liang Cai, Hongyuan Zhang, Ioannis Liritzis
Pattern Recognition Letters (Pattern Recogn. Lett.), 2025
22 May 2025
Semantic Retrieval at Walmart
Alessandro Magnani, Feng Liu, Suthee Chaidaroon, Sachin Yadav, Praveen Reddy Suram, ..., Sijie Chen, Min Xie, Anirudh Kashi, Tony Lee, Ciya Liao
Knowledge Discovery and Data Mining (KDD), 2022
05 Dec 2024

Research on Personalized Compression Algorithm for Pre-trained Models Based on Homomorphic Entropy Increase
Yicong Li, Xing Guo, Haohua Du
16 Aug 2024

AgentInstruct: Toward Generative Teaching with Agentic Flows
Arindam Mitra, Luciano Del Corro, Guoqing Zheng, Shweti Mahajan, Dany Rouhana, ..., Corby Rosset, Fillipe Silva, Hamed Khanpour, Yash Lara, Ahmed Awadallah
03 Jul 2024
Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application
Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen
02 Jul 2024

Efficiently Distilling LLMs for Edge Applications
Achintya Kundu, Fabian Lim, Aaron Chew, L. Wynter, Penny Chong, Rhui Dih Lee
01 Apr 2024

A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao
05 Feb 2024

An Empirical Investigation into the Effect of Parameter Choices in Knowledge Distillation
Md Arafat Sultan, Aashka Trivedi, Parul Awasthy, Avirup Sil
12 Jan 2024
From Big to Small Without Losing It All: Text Augmentation with ChatGPT for Efficient Sentiment Analysis
Stanisław Woźniak, Jan Kocoń
07 Dec 2023

Joint Prompt Optimization of Stacked LLMs using Variational Inference
Alessandro Sordoni, Xingdi Yuan, Marc-Alexandre Côté, Matheus Pereira, Adam Trischler, Ziang Xiao, Arian Hosseini, Friederike Niedtner, Nicolas Le Roux
Neural Information Processing Systems (NeurIPS), 2023
21 Jun 2023

Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, Ahmed Hassan Awadallah
05 Jun 2023
A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training
Nitay Calderon, Subhabrata Mukherjee, Roi Reichart, Amir Kantor
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
03 May 2023

Distillation of encoder-decoder transformers for sequence labelling
M. Farina, D. Pappadopulo, Anant Gupta, Leslie Huang, Ozan Irsoy, Thamar Solorio
Findings, 2023
10 Feb 2023

Friend-training: Learning from Models of Different but Related Tasks
Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Xiabing Zhou, Dong Yu
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
31 Jan 2023

Knowledge Distillation ≈ Label Smoothing: Fact or Fallacy?
Md Arafat Sultan
30 Jan 2023
In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown
20 Dec 2022

Efficient Image Captioning for Edge Devices
Ning Wang, Jiangrong Xie, Hangzai Luo, Qinglin Cheng, Jihao Wu, Mingbo Jia, Linlin Li
AAAI Conference on Artificial Intelligence (AAAI), 2022
18 Dec 2022

Compressing Cross-Lingual Multi-Task Models at Qualtrics
Daniel Fernando Campos, Daniel J. Perry, S. Joshi, Yashmeet Gambhir, Wei Du, Zhengzheng Xing, Aaron Colak
AAAI Conference on Artificial Intelligence (AAAI), 2022
29 Nov 2022

Intriguing Properties of Compression on Multilingual Models
Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Sebastian Gehrmann, Sara Hooker, Julia Kreutzer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
04 Nov 2022
HumSet: Dataset of Multilingual Information Extraction and Classification for Humanitarian Crisis Response
Selim Fekih, Nicolò Tamagnone, Benjamin Minixhofer, R. Shrestha, Ximena Contla, Ewan Oglethorpe, Navid Rekabsaz
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
10 Oct 2022

EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics
Daniil Larionov, Jens Grunwald, Christoph Leiter, Steffen Eger
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
20 Sep 2022

No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval
G. Rosa, L. Bonifacio, Vitor Jeronymo, Hugo Queiroz Abonizio, Marzieh Fadaee, R. Lotufo, Rodrigo Nogueira
06 Jun 2022

Differentially Private Model Compression
Fatemehsadat Mireshghallah, A. Backurs, Huseyin A. Inan, Lukas Wutschitz, Janardhan Kulkarni
Neural Information Processing Systems (NeurIPS), 2022
03 Jun 2022
CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation
Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart
International Conference on Computational Linguistics (COLING), 2022
15 Apr 2022

Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
Yoshitomo Matsubara, Luca Soldaini, Eric Lind, Alessandro Moschitti
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
15 Jan 2022

Learning Cross-Lingual IR from an English Retriever
Yulong Li, M. Franz, Md Arafat Sultan, Bhavani Iyer, Young-Suk Lee, Avirup Sil
15 Dec 2021

MetaQA: Combining Expert Agents for Multi-Skill Question Answering
Haritz Puerto, Gözde Gül Sahin, Iryna Gurevych
03 Dec 2021
Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications
Shuo Sun, Ahmed El-Kishky, Vishrav Chaudhary, James Cross, Francisco Guzmán, Lucia Specia
17 Sep 2021

FLiText: A Faster and Lighter Semi-Supervised Text Classification with Convolution Networks
Chen Liu, Mengchao Zhang, Liang Pang, Jiafeng Guo, Xueqi Cheng
12 Sep 2021

XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation
Subhabrata Mukherjee, Ahmed Hassan Awadallah, Jianfeng Gao
08 Jun 2021

RoSearch: Search for Robust Student Architectures When Distilling Pre-trained Language Models
Xin Guo, Jianlei Yang, Haoyi Zhou, Xucheng Ye, Jianxin Li
07 Jun 2021

AdvPicker: Effectively Leveraging Unlabeled Data via Adversarial Discriminator for Cross-Lingual NER
Weile Chen, Huiqiang Jiang, Qianhui Wu, Börje F. Karlsson, Yingjun Guan
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
04 Jun 2021
LightMBERT: A Simple Yet Effective Method for Multilingual BERT Distillation
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
11 Mar 2021

MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei
Findings, 2020
31 Dec 2020

A Survey on Visual Transformer
Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, ..., Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
23 Dec 2020

Rethinking embedding coupling in pre-trained language models
Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder
International Conference on Learning Representations (ICLR), 2020
24 Oct 2020
Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor
Xinyu Wang, Yong Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
10 Oct 2020

AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network
Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
17 Sep 2020

Compression of Deep Learning Models for Text: A Survey
Manish Gupta, Puneet Agrawal
ACM Transactions on Knowledge Discovery from Data (TKDD), 2020
12 Aug 2020

Deep Learning Based Text Classification: A Comprehensive Review
Shervin Minaee, Nal Kalchbrenner, Xiaoshi Zhong, Narjes Nikzad, M. Asgari-Chenaghlu, Jianfeng Gao
ACM Computing Surveys (ACM CSUR), 2020
06 Apr 2020
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yifan Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett
Transactions of the Association for Computational Linguistics (TACL), 2020
27 Feb 2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
Neural Information Processing Systems (NeurIPS), 2020
25 Feb 2020