Sequence-Level Knowledge Distillation
Yoon Kim, Alexander M. Rush
25 June 2016 · arXiv:1606.07947
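
For context on the method the papers below build on: sequence-level knowledge distillation trains a compact student model on complete output sequences decoded from a larger teacher (in the paper, the teacher's beam-search output), rather than matching the teacher's per-token distributions as in word-level distillation. The following is a minimal sketch of that training loop with toy PyTorch models; the model class, greedy decoding (standing in for beam search), and all sizes are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of sequence-level knowledge distillation (Kim & Rush, 2016):
# the student trains on sequences decoded by the teacher, not on gold
# references or per-token soft targets. TinySeq2Seq and all hyperparameters
# are illustrative stand-ins, not the paper's NMT systems.
import torch
import torch.nn as nn
import torch.nn.functional as F

PAD, BOS, VOCAB = 0, 1, 100  # assumed special-token ids and vocab size

class TinySeq2Seq(nn.Module):
    """A toy GRU encoder-decoder; any seq2seq with a decode step would do."""
    def __init__(self, d=32):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, d, padding_idx=PAD)
        self.enc = nn.GRU(d, d, batch_first=True)
        self.dec = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, VOCAB)

    def forward(self, src, tgt_in):
        _, h = self.enc(self.emb(src))        # encode the source
        y, _ = self.dec(self.emb(tgt_in), h)  # teacher-forced decoding
        return self.out(y)                    # (batch, len, vocab) logits

    @torch.no_grad()
    def greedy_generate(self, src, max_len=20):
        # Greedy decoding stands in for the beam search used in the paper.
        _, h = self.enc(self.emb(src))
        tok = torch.full((src.size(0), 1), BOS, dtype=torch.long)
        outs = [tok]
        for _ in range(max_len):
            y, h = self.dec(self.emb(outs[-1]), h)
            outs.append(self.out(y).argmax(-1))
        return torch.cat(outs, dim=1)

teacher, student = TinySeq2Seq(), TinySeq2Seq()  # teacher would be pretrained
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

src = torch.randint(3, VOCAB, (8, 12))   # a fake source batch
tgt = teacher.greedy_generate(src)       # distilled targets, not gold text
logits = student(src, tgt[:, :-1])       # student predicts the next token
loss = F.cross_entropy(logits.reshape(-1, VOCAB),
                       tgt[:, 1:].reshape(-1), ignore_index=PAD)
opt.zero_grad(); loss.backward(); opt.step()
```

The key property this illustrates is that the student's training signal is the teacher's mode-approximating output over whole sequences; several of the citing papers below, notably the non-autoregressive MT work, rely on exactly this distilled-data setup.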

Papers citing "Sequence-Level Knowledge Distillation"

50 of 197 citing papers shown, most recent first:

A Reparameterized Discrete Diffusion Model for Text Generation
Lin Zheng, Jianbo Yuan, Lei Yu, Lingpeng Kong · DiffM
11 Feb 2023

N-Gram Nearest Neighbor Machine Translation
Rui Lv, Junliang Guo, Rui Wang, Xu Tan, Qi Liu, Tao Qin
30 Jan 2023

How Does Beam Search improve Span-Level Confidence Estimation in Generative Sequence Labeling?
Kazuma Hashimoto, Iftekhar Naim, K. Raman · UQLM
21 Dec 2022

WACO: Word-Aligned Contrastive Learning for Speech Translation
Siqi Ouyang, Rong Ye, Lei Li
19 Dec 2022

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, ..., Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan
13 Dec 2022

Life-long Learning for Multilingual Neural Machine Translation with Knowledge Distillation
Yang Zhao, Junnan Zhu, Lu Xiang, Jiajun Zhang, Yu Zhou, Feifei Zhai, Chengqing Zong · CLL
06 Dec 2022

Summer: WeChat Neural Machine Translation Systems for the WMT22 Biomedical Translation Task
Ernan Li, Fandong Meng, Jie Zhou · MedIm
28 Nov 2022

BJTU-WeChat's Systems for the WMT22 Chat Translation Task
Yunlong Liang, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou
28 Nov 2022

Continual Learning of Neural Machine Translation within Low Forgetting Risk Regions
Shuhao Gu, Bojie Hu, Yang Feng · CLL
03 Nov 2022

Teacher-Student Architecture for Knowledge Learning: A Survey
Chengming Hu, Xuan Li, Dan Liu, Xi Chen, Ju Wang, Xue Liu
28 Oct 2022

Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Harshita Diddee, Sandipan Dandapat, Monojit Choudhury, T. Ganu, Kalika Bali
27 Oct 2022

Referee: Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation
Melanie Sclar, Peter West, Sachin Kumar, Yulia Tsvetkov, Yejin Choi
25 Oct 2022

SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages
Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier · VLM, MoE, LRM
20 Oct 2022

A baseline revisited: Pushing the limits of multi-segment models for context-aware translation
Suvodeep Majumder, Stanislas Lauly, Maria Nadejde, Marcello Federico, Georgiana Dinu
19 Oct 2022

Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive Machine Translation
Chenze Shao, Zhengrui Ma, Yang Feng
11 Oct 2022

Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation
Chenze Shao, Yang Feng
08 Oct 2022

Direct Speech Translation for Automatic Subtitling
Sara Papi, Marco Gaido, Alina Karakanta, Mauro Cettolo, Matteo Negri, Marco Turchi
27 Sep 2022

CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks
Xuanli He, Qiongkai Xu, Yi Zeng, Lingjuan Lyu, Fangzhao Wu, Jiwei Li, R. Jia · WaLM
19 Sep 2022

Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz
31 Aug 2022

ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
Rongjie Huang, Zhou Zhao, Huadai Liu, Jinglin Liu, Chenye Cui, Yi Ren · DiffM
13 Jul 2022

Building Multilingual Machine Translation Systems That Serve Arbitrary X-Y Translations
Akiko Eriguchi, Shufang Xie, Tao Qin, Hany Awadalla · LRM
30 Jun 2022

Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models
Han Liu, Bingning Wang, Ting Yao, Haijin Liang, Jianjin Xu, Xiaolin Hu · BDL
11 Jun 2022

What Do Compressed Multilingual Machine Translation Models Forget?
Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier · AI4CE
22 May 2022

Twist Decoding: Diverse Generators Guide Each Other
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Hao Peng, Ximing Lu, Dragomir R. Radev, Yejin Choi, Noah A. Smith · SyDa
19 May 2022

Efficient yet Competitive Speech Translation: FBK@IWSLT2022
Marco Gaido, Sara Papi, Dennis Fucci, G. Fiameni, Matteo Negri, Marco Turchi
05 May 2022

Non-Autoregressive Machine Translation: It's Not as Fast as it Seems
Jindřich Helcl, Barry Haddow, Alexandra Birch
04 May 2022

Nearest Neighbor Knowledge Distillation for Neural Machine Translation
Zhixian Yang, Renliang Sun, Xiaojun Wan
01 May 2022

Prompt Consistency for Zero-Shot Task Generalization
Chunting Zhou, Junxian He, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig · VLM
29 Apr 2022

UniTE: Unified Translation Evaluation
Yu Wan, Dayiheng Liu, Baosong Yang, Haibo Zhang, Boxing Chen, Derek F. Wong, Lidia S. Chao
28 Apr 2022

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, ..., Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan · VLM, OffRL
22 Apr 2022

Reducing Model Jitter: Stable Re-training of Semantic Parsers in Production Environments
Christopher Hidey, Fei Liu, Rahul Goel
10 Apr 2022

GigaST: A 10,000-hour Pseudo Speech Translation Corpus
Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao
08 Apr 2022

Does Simultaneous Speech Translation need Simultaneous Models?
Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi
08 Apr 2022

latent-GLAT: Glancing at Latent Variables for Parallel Text Generation
Yu Bao, Hao Zhou, Shujian Huang, Dongqi Wang, Lihua Qian, Xinyu Dai, Jiajun Chen, Lei Li
05 Apr 2022

Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction
M. Tarnavskyi, Artem Chernodub, Kostiantyn Omelianchuk · 3DV
24 Mar 2022

Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
Chih-Chiang Chang, Hung-yi Lee
22 Mar 2022

Self-Distribution Distillation: Efficient Uncertainty Estimation
Yassir Fathullah, Mark J. F. Gales · UQCV
15 Mar 2022

Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice
Andreas Grivas, Nikolay Bogoychev, Adam Lopez
12 Mar 2022

Efficient Sub-structured Knowledge Distillation
Wenye Lin, Yangming Li, Lemao Liu, Shuming Shi, Haitao Zheng
09 Mar 2022

Relational Surrogate Loss Learning
Tao Huang, Zekang Li, Hua Lu, Yong Shan, Shusheng Yang, Yang Feng, Fei Wang, Shan You, Chang Xu
26 Feb 2022

EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
Tao Ge, Si-Qing Chen, Furu Wei · MoE
16 Feb 2022

Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
Boxin Wang, Wei Ping, Chaowei Xiao, P. Xu, M. Patwary, M. Shoeybi, Bo-wen Li, Anima Anandkumar, Bryan Catanzaro
08 Feb 2022

Improving Neural Machine Translation by Denoising Training
Liang Ding, Keqin Peng, Dacheng Tao · VLM, AI4CE
19 Jan 2022

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Jianwei Yang, Xiyang Dai, Bin Xiao, Haoxuan You, Shih-Fu Chang, Lu Yuan · CLIP, VLM
15 Jan 2022

Can Multilinguality benefit Non-autoregressive Machine Translation?
Sweta Agrawal, Julia Kreutzer, Colin Cherry · AI4CE
16 Dec 2021

Sequence-level self-learning with multiple hypotheses
K. Kumatani, Dimitrios Dimitriadis, Yashesh Gaur, R. Gmyr, Sefik Emre Eskimez, Jinyu Li, Michael Zeng · SSL
10 Dec 2021

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling
Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
22 Nov 2021

Semi-Autoregressive Image Captioning
Xu Yan, Zhengcong Fei, Zekang Li, Shuhui Wang, Qingming Huang, Qi Tian
11 Oct 2021

Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer
E. Tokarchuk, Jan Rosendahl, Weiyue Wang, Pavel Petrushkov, Tomer Lancewicki, Shahram Khadivi, Hermann Ney
27 Sep 2021

Partial to Whole Knowledge Distillation: Progressive Distilling Decomposed Knowledge Boosts Student Better
Xuanyang Zhang, X. Zhang, Jian-jun Sun
26 Sep 2021