
Ensemble Distillation for Neural Machine Translation
Markus Freitag, Yaser Al-Onaizan, B. Sankaran
6 February 2017

Papers citing "Ensemble Distillation for Neural Machine Translation"

50 / 51 papers shown
Erasure Coded Neural Network Inference via Fisher Averaging
Divyansh Jhunjhunwala, Neharika Jali, Gauri Joshi, Shiqiang Wang
International Symposium on Information Theory (ISIT), 2024
02 Sep 2024

EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation
Yuqiao Wen, Behzad Shayegh, Chenyang Huang, Yanshuai Cao, Lili Mou
29 Feb 2024

Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing
Vilém Zouhar
29 Jan 2024

Can a student Large Language Model perform as well as it's teacher?
Sia Gholami, Marwan Omar
03 Oct 2023

CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models
Potsawee Manakul, Yassir Fathullah, Adian Liusie, Vyas Raina, Vatsal Raina, Mark Gales
Workshop on Biomedical Natural Language Processing (BioNLP), 2023
08 Jun 2023

Accurate Knowledge Distillation with n-best Reranking
Hendra Setiawan
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
20 May 2023
Pseudo-Label Training and Model Inertia in Neural Machine Translation
B. Hsu, Anna Currey, Xing Niu, Maria Nădejde, Georgiana Dinu
International Conference on Learning Representations (ICLR), 2023
19 May 2023
Leveraging Synthetic Targets for Machine Translation
Sarthak Mittal, Oleksii Hrinchuk, Oleksii Kuchaiev
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
07 May 2023

Heterogeneous-Branch Collaborative Learning for Dialogue Generation
Yiwei Li, Shaoxiong Feng, Bin Sun, Kan Li
AAAI Conference on Artificial Intelligence (AAAI), 2023
21 Mar 2023

Continual Knowledge Distillation for Neural Machine Translation
Yuan Zhang, Peng Li, Maosong Sun, Yang Liu
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
18 Dec 2022

Meta-Ensemble Parameter Learning
Zhengcong Fei, Shuman Tian, Junshi Huang, Xiaoming Wei, Xiaolin K. Wei
05 Oct 2022

One Reference Is Not Enough: Diverse Distillation with Reference Selection for Non-Autoregressive Translation
Chenze Shao, Xuanfu Wu, Yang Feng
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
28 May 2022

Twist Decoding: Diverse Generators Guide Each Other
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Hao Peng, Ximing Lu, Dragomir R. Radev, Yejin Choi, Noah A. Smith
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
19 May 2022

GigaST: A 10,000-hour Pseudo Speech Translation Corpus
Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao
Interspeech, 2022
08 Apr 2022

Look Backward and Forward: Self-Knowledge Distillation with Bidirectional Decoder for Neural Machine Translation
Xuan Zhang, Libin Shen, Disheng Pan, Liangguo Wang, Yanjun Miao
10 Mar 2022

Self-Distillation Mixup Training for Non-autoregressive Neural Machine Translation
Jiaxin Guo, Minghan Wang, Daimeng Wei, Hengchao Shang, Yuxia Wang, ..., Yan Yu, Hao Fei, Lizhi Lei, Shimin Tao, Hao Yang
22 Dec 2021

Amortized Noisy Channel Neural Machine Translation
Richard Yuanzhe Pang, He He, Dong Wang
16 Dec 2021

Multilingual AMR Parsing with Noisy Knowledge Distillation
Deng Cai, Xin Li, Jackie Chun-Sing Ho, Lidong Bing, W. Lam
30 Sep 2021

The NiuTrans Machine Translation Systems for WMT21
Yuhao Zhang, Tao Zhou, Bin Wei, Runzhe Cao, Yongyu Mu, ..., Weiqiao Shan, Yinqiao Li, Bei Li, Tong Xiao, Jingbo Zhu
22 Sep 2021
Recurrent Stacking of Layers in Neural Networks: An Application to Neural Machine Translation
Raj Dabre, Atsushi Fujita
18 Jun 2021
Selective Knowledge Distillation for Neural Machine Translation
Fusheng Wang, Jianhao Yan, Fandong Meng, Jie Zhou
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
27 May 2021

The Volctrans Neural Speech Translation System for IWSLT 2021
Chengqi Zhao, Zhicheng Liu, Jian-Fei Tong, Tao Wang, Mingxuan Wang, Rong Ye, Qianqian Dong, Jun Cao, Lei Li
International Workshop on Spoken Language Translation (IWSLT), 2021
16 May 2021

Knowledge Distillation as Semiparametric Inference
Tri Dao, G. Kamath, Vasilis Syrgkanis, Lester W. Mackey
International Conference on Learning Representations (ICLR), 2021
20 Apr 2021

Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey
Danielle Saunders
Journal of Artificial Intelligence Research (JAIR), 2021
14 Apr 2021

Learning Metrics from Mean Teacher: A Supervised Learning Method for Improving the Generalization of Speaker Verification System
Ju-ho Kim, Hye-jin Shim, Jee-weon Jung, Ha-Jin Yu
14 Apr 2021

Sampling and Filtering of Neural Machine Translation Distillation Data
Vilém Zouhar
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
01 Apr 2021
Text Simplification by Tagging
Kostiantyn Omelianchuk, Vipul Raheja, Oleksandr Skurzhanskyi
Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 2021
08 Mar 2021
ALP-KD: Attention-Based Layer Projection for Knowledge Distillation
Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu
AAAI Conference on Artificial Intelligence (AAAI), 2020
27 Dec 2020

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
Zeyuan Allen-Zhu, Yuanzhi Li
International Conference on Learning Representations (ICLR), 2020
17 Dec 2020

Reciprocal Supervised Learning Improves Neural Machine Translation
Minkai Xu, Mingxuan Wang, Zhouhan Lin, Hao Zhou, Weinan Zhang, Lei Li
05 Dec 2020

Bridging the Modality Gap for Speech-to-Text Translation
Yuchen Liu, Junnan Zhu, Jiajun Zhang, Chengqing Zong
28 Oct 2020

DiDi's Machine Translation System for WMT2020
Tianrun Chen, Weiwei Wang, Wenyang Wei, Xing Shi, Xiangang Li, Jieping Ye, Kevin Knight
Conference on Machine Translation (WMT), 2020
16 Oct 2020
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu
06 Oct 2020
Weight Distillation: Transferring the Knowledge in Neural Network Parameters
Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
19 Sep 2020

Compression of Deep Learning Models for Text: A Survey
Manish Gupta, Puneet Agrawal
ACM Transactions on Knowledge Discovery from Data (TKDD), 2020
12 Aug 2020

Knowledge Distillation: A Survey
Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao
09 Jun 2020
An Overview of Neural Network Compression
James O'Neill
05 Jun 2020
Cross-model Back-translated Distillation for Unsupervised Machine Translation
Xuan-Phi Nguyen, Shafiq Joty, Thanh-Tung Nguyen, Wu Kui, Ai Ti Aw
International Conference on Machine Learning (ICML), 2020
03 Jun 2020

Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition
Yan Gao, Titouan Parcollet, Nicholas D. Lane
19 May 2020

Building a Multi-domain Neural Machine Translation Model using Knowledge Distillation
Idriss Mghabbar, Pirashanth Ratnamogan
European Conference on Artificial Intelligence (ECAI), 2020
15 Apr 2020
Balancing Cost and Benefit with Tied-Multi Transformers
Raj Dabre, Raphaël Rubino, Atsushi Fujita
Workshop on Neural Generation and Translation (WNGT), 2020
20 Feb 2020
Neural Machine Translation: A Review and Survey
Felix Stahlberg
Journal of Artificial Intelligence Research (JAIR), 2019
04 Dec 2019

Data Diversification: A Simple Strategy For Neural Machine Translation
Xuan-Phi Nguyen, Shafiq Joty, Wu Kui, Ai Ti Aw
05 Nov 2019

Multi-agent Learning for Neural Machine Translation
Tianchi Bi, Hao Xiong, Zhongjun He, Hua Wu, Haifeng Wang
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
03 Sep 2019
Multi-Layer Softmaxing during Training Neural Machine Translation for Flexible Decoding with Fewer Layers
Raj Dabre, Atsushi Fujita
27 Aug 2019
End-to-End Speech Translation with Knowledge Distillation
Yuchen Liu, Hao Xiong, Zhongjun He, Jiajun Zhang, Hua Wu, Haifeng Wang, Chengqing Zong
17 Apr 2019

Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion
Hao Sun, Xu Tan, Jun-Wei Gan, Hongzhi Liu, Sheng Zhao, Tao Qin, Tie-Yan Liu
06 Apr 2019

Multilingual Neural Machine Translation with Knowledge Distillation
Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu
International Conference on Learning Representations (ICLR), 2019
27 Feb 2019

Distilling Knowledge for Search-based Structured Prediction
Yijia Liu, Wanxiang Che, Huaipeng Zhao, Bing Qin, Ting Liu
29 May 2018

A Stable and Effective Learning Strategy for Trainable Greedy Decoding
Yun Chen, Victor O.K. Li, Dong Wang, Samuel R. Bowman
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018
21 Apr 2018