Sequence-Level Knowledge Distillation
Yoon Kim, Alexander M. Rush
25 June 2016 (arXiv: 1606.07947)
Papers citing "Sequence-Level Knowledge Distillation" (50 of 197 shown)
A Reparameterized Discrete Diffusion Model for Text Generation
Lin Zheng, Jianbo Yuan, Lei Yu, Lingpeng Kong (11 Feb 2023)
N-Gram Nearest Neighbor Machine Translation
Rui Lv, Junliang Guo, Rui Wang, Xu Tan, Qi Liu, Tao Qin (30 Jan 2023)
How Does Beam Search improve Span-Level Confidence Estimation in Generative Sequence Labeling?
Kazuma Hashimoto, Iftekhar Naim, K. Raman (21 Dec 2022)
WACO: Word-Aligned Contrastive Learning for Speech Translation
Siqi Ouyang, Rong Ye, Lei Li (19 Dec 2022)
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, ..., Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan (13 Dec 2022)
Life-long Learning for Multilingual Neural Machine Translation with Knowledge Distillation
Yang Zhao, Junnan Zhu, Lu Xiang, Jiajun Zhang, Yu Zhou, Feifei Zhai, Chengqing Zong (06 Dec 2022)
Summer: WeChat Neural Machine Translation Systems for the WMT22 Biomedical Translation Task
Ernan Li, Fandong Meng, Jie Zhou (28 Nov 2022)
BJTU-WeChat's Systems for the WMT22 Chat Translation Task
Yunlong Liang, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou (28 Nov 2022)
Continual Learning of Neural Machine Translation within Low Forgetting Risk Regions
Shuhao Gu, Bojie Hu, Yang Feng (03 Nov 2022)
Teacher-Student Architecture for Knowledge Learning: A Survey
Chengming Hu, Xuan Li, Dan Liu, Xi Chen, Ju Wang, Xue Liu (28 Oct 2022)
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Harshita Diddee, Sandipan Dandapat, Monojit Choudhury, T. Ganu, Kalika Bali (27 Oct 2022)
Referee: Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation
Melanie Sclar, Peter West, Sachin Kumar, Yulia Tsvetkov, Yejin Choi (25 Oct 2022)
SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages
Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier (20 Oct 2022)
A baseline revisited: Pushing the limits of multi-segment models for context-aware translation
Suvodeep Majumder, Stanislas Lauly, Maria Nadejde, Marcello Federico, Georgiana Dinu (19 Oct 2022)
Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive Machine Translation
Chenze Shao, Zhengrui Ma, Yang Feng (11 Oct 2022)
Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation
Chenze Shao, Yang Feng (08 Oct 2022)
Direct Speech Translation for Automatic Subtitling
Sara Papi, Marco Gaido, Alina Karakanta, Mauro Cettolo, Matteo Negri, Marco Turchi (27 Sep 2022)
CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks
Xuanli He, Qiongkai Xu, Yi Zeng, Lingjuan Lyu, Fangzhao Wu, Jiwei Li, R. Jia (19 Sep 2022)
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz (31 Aug 2022)
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
Rongjie Huang, Zhou Zhao, Huadai Liu, Jinglin Liu, Chenye Cui, Yi Ren (13 Jul 2022)
Building Multilingual Machine Translation Systems That Serve Arbitrary X-Y Translations
Akiko Eriguchi, Shufang Xie, Tao Qin, Hany Awadalla (30 Jun 2022)
Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models
Han Liu, Bingning Wang, Ting Yao, Haijin Liang, Jianjin Xu, Xiaolin Hu (11 Jun 2022)
What Do Compressed Multilingual Machine Translation Models Forget?
Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier (22 May 2022)
Twist Decoding: Diverse Generators Guide Each Other
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Hao Peng, Ximing Lu, Dragomir R. Radev, Yejin Choi, Noah A. Smith (19 May 2022)
Efficient yet Competitive Speech Translation: FBK@IWSLT2022
Marco Gaido, Sara Papi, Dennis Fucci, G. Fiameni, Matteo Negri, Marco Turchi (05 May 2022)
Non-Autoregressive Machine Translation: It's Not as Fast as it Seems
Jindřich Helcl, Barry Haddow, Alexandra Birch (04 May 2022)
Nearest Neighbor Knowledge Distillation for Neural Machine Translation
Zhixian Yang, Renliang Sun, Xiaojun Wan (01 May 2022)
Prompt Consistency for Zero-Shot Task Generalization
Chunting Zhou, Junxian He, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig (29 Apr 2022)
UniTE: Unified Translation Evaluation
Yu Wan, Dayiheng Liu, Baosong Yang, Haibo Zhang, Boxing Chen, Derek F. Wong, Lidia S. Chao (28 Apr 2022)
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, ..., Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan (22 Apr 2022)
Reducing Model Jitter: Stable Re-training of Semantic Parsers in Production Environments
Christopher Hidey, Fei Liu, Rahul Goel (10 Apr 2022)
GigaST: A 10,000-hour Pseudo Speech Translation Corpus
Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao (08 Apr 2022)
Does Simultaneous Speech Translation need Simultaneous Models?
Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi (08 Apr 2022)
latent-GLAT: Glancing at Latent Variables for Parallel Text Generation
Yu Bao, Hao Zhou, Shujian Huang, Dongqi Wang, Lihua Qian, Xinyu Dai, Jiajun Chen, Lei Li (05 Apr 2022)
Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction
M. Tarnavskyi, Artem Chernodub, Kostiantyn Omelianchuk (24 Mar 2022)
Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
Chih-Chiang Chang, Hung-yi Lee (22 Mar 2022)
Self-Distribution Distillation: Efficient Uncertainty Estimation
Yassir Fathullah, Mark J. F. Gales (15 Mar 2022)
Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice
Andreas Grivas, Nikolay Bogoychev, Adam Lopez (12 Mar 2022)
Efficient Sub-structured Knowledge Distillation
Wenye Lin, Yangming Li, Lemao Liu, Shuming Shi, Haitao Zheng (09 Mar 2022)
Relational Surrogate Loss Learning
Tao Huang, Zekang Li, Hua Lu, Yong Shan, Shusheng Yang, Yang Feng, Fei Wang, Shan You, Chang Xu (26 Feb 2022)
EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
Tao Ge, Si-Qing Chen, Furu Wei (16 Feb 2022)
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
Boxin Wang, Wei Ping, Chaowei Xiao, P. Xu, M. Patwary, M. Shoeybi, Bo-wen Li, Anima Anandkumar, Bryan Catanzaro (08 Feb 2022)
Improving Neural Machine Translation by Denoising Training
Liang Ding, Keqin Peng, Dacheng Tao (19 Jan 2022)
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Jianwei Yang, Xiyang Dai, Bin Xiao, Haoxuan You, Shih-Fu Chang, Lu Yuan (15 Jan 2022)
Can Multilinguality benefit Non-autoregressive Machine Translation?
Sweta Agrawal, Julia Kreutzer, Colin Cherry (16 Dec 2021)
Sequence-level self-learning with multiple hypotheses
K. Kumatani, Dimitrios Dimitriadis, Yashesh Gaur, R. Gmyr, Sefik Emre Eskimez, Jinyu Li, Michael Zeng (10 Dec 2021)
Hierarchical Knowledge Distillation for Dialogue Sequence Labeling
Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura (22 Nov 2021)
Semi-Autoregressive Image Captioning
Xu Yan, Zhengcong Fei, Zekang Li, Shuhui Wang, Qingming Huang, Qi Tian (11 Oct 2021)
Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer
E. Tokarchuk, Jan Rosendahl, Weiyue Wang, Pavel Petrushkov, Tomer Lancewicki, Shahram Khadivi, Hermann Ney (27 Sep 2021)
Partial to Whole Knowledge Distillation: Progressive Distilling Decomposed Knowledge Boosts Student Better
Xuanyang Zhang, X. Zhang, Jian-jun Sun (26 Sep 2021)