Multilingual Neural Machine Translation with Knowledge Distillation
Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu
arXiv:1902.10461, 27 February 2019

Papers citing "Multilingual Neural Machine Translation with Knowledge Distillation" (37 papers shown)
Learning Critically: Selective Self Distillation in Federated Learning on Non-IID Data. Yuting He, Yiqiang Chen, Xiaodong Yang, H. Yu, Yi-Hua Huang, Yang Gu. 20 Apr 2025.
Heuristic-Free Multi-Teacher Learning. Huy Thong Nguyen, En-Hung Chu, Lenord Melvix, Jazon Jiao, Chunglin Wen, Benjamin Louie. 19 Nov 2024.
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models. Aviv Bick, Kevin Y. Li, Eric P. Xing, J. Zico Kolter, Albert Gu. 19 Aug 2024.
Don't Throw Away Data: Better Sequence Knowledge Distillation. Jun Wang, Eleftheria Briakou, Hamid Dadkhahi, Rishabh Agarwal, Colin Cherry, Trevor Cohn. 15 Jul 2024.
Leveraging Topological Guidance for Improved Knowledge Distillation. Eun Som Jeon, Rahul Khurana, Aishani Pathak, P. Turaga. 07 Jul 2024.
Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing. Vilém Zouhar. 29 Jan 2024.
Towards Higher Pareto Frontier in Multilingual Machine Translation. Yi-Chong Huang, Xiaocheng Feng, Xinwei Geng, Baohang Li, Bing Qin. 25 May 2023.
MultiCoder: Multi-Programming-Lingual Pre-Training for Low-Resource Code Completion. Zi Gong, Yinpeng Guo, Pingyi Zhou, Cuiyun Gao, Yasheng Wang, Zenglin Xu. 19 Dec 2022.
Life-long Learning for Multilingual Neural Machine Translation with Knowledge Distillation. Yang Zhao, Junnan Zhu, Lu Xiang, Jiajun Zhang, Yu Zhou, Feifei Zhai, Chengqing Zong. 06 Dec 2022.
Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning. Seunghyun Lee, B. Song. 05 Mar 2022.
Multilingual training for Software Engineering. Toufique Ahmed, Prem Devanbu. 03 Dec 2021.
Hierarchical Knowledge Distillation for Dialogue Sequence Labeling. Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura. 22 Nov 2021.
Language Modelling via Learning to Rank. A. Frydenlund, Gagandeep Singh, Frank Rudzicz. 13 Oct 2021.
Partial to Whole Knowledge Distillation: Progressive Distilling Decomposed Knowledge Boosts Student Better. Xuanyang Zhang, X. Zhang, Jian-jun Sun. 26 Sep 2021.
Multilingual Translation via Grafting Pre-trained Language Models. Zewei Sun, Mingxuan Wang, Lei Li. 11 Sep 2021.
BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation. Haoran Xu, Benjamin Van Durme, Kenton W. Murray. 09 Sep 2021.
IndicBART: A Pre-trained Model for Indic Natural Language Generation. Raj Dabre, Himani Shrotriya, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M. Khapra, Pratyush Kumar. 07 Sep 2021.
Student Surpasses Teacher: Imitation Attack for Black-Box NLP APIs. Qiongkai Xu, Xuanli He, Lingjuan Lyu, Lizhen Qu, Gholamreza Haffari. 29 Aug 2021.
A Survey on Low-Resource Neural Machine Translation. Rui Wang, Xu Tan, Renqian Luo, Tao Qin, Tie-Yan Liu. 09 Jul 2021.
A Survey on Neural Speech Synthesis. Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu. 29 Jun 2021.
Neural Machine Translation for Low-Resource Languages: A Survey. Surangika Ranathunga, E. Lee, Marjana Prifti Skenduli, Ravi Shekhar, Mehreen Alam, Rishemjit Kaur. 29 Jun 2021.
Dealing with training and test segmentation mismatch: FBK@IWSLT2021. Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi. 23 Jun 2021.
Diversifying Dialog Generation via Adaptive Label Smoothing. Yida Wang, Yinhe Zheng, Yong-jia Jiang, Minlie Huang. 30 May 2021.
LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding. Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li. 14 Dec 2020.
Ensemble Knowledge Distillation for CTR Prediction. Jieming Zhu, Jinyang Liu, Weiqi Li, Jincai Lai, Xiuqiang He, Liang Chen, Zibin Zheng. 08 Nov 2020.
Revisiting Modularized Multilingual NMT to Meet Industrial Demands. Sungwon Lyu, Bokyung Son, Kichang Yang, Jaekyoung Bae. 19 Oct 2020.
Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor. Xinyu Wang, Yong-jia Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu. 10 Oct 2020.
Knowledge Distillation: A Survey. Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao. 09 Jun 2020.
Structure-Level Knowledge Distillation For Multilingual Sequence Labeling. Xinyu Wang, Yong-jia Jiang, Nguyen Bach, Tao Wang, Fei Huang, Kewei Tu. 08 Apr 2020.
A Study of Multilingual Neural Machine Translation. Xu Tan, Yichong Leng, Jiale Chen, Yi Ren, Tao Qin, Tie-Yan Liu. 25 Dec 2019.
BAM! Born-Again Multi-Task Networks for Natural Language Understanding. Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le. 10 Jul 2019.
Knowledge Distillation by On-the-Fly Native Ensemble. Xu Lan, Xiatian Zhu, S. Gong. 12 Jun 2018.
Large scale distributed neural network training through online distillation. Rohan Anil, Gabriel Pereyra, Alexandre Passos, Róbert Ormándi, George E. Dahl, Geoffrey E. Hinton. 09 Apr 2018.
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean. 26 Sep 2016.
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. 15 Sep 2016.
Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism. Orhan Firat, Kyunghyun Cho, Yoshua Bengio. 06 Jan 2016.
Effective Approaches to Attention-based Neural Machine Translation. Thang Luong, Hieu H. Pham, Christopher D. Manning. 17 Aug 2015.