Scaling Neural Machine Translation

arXiv:1806.00187 · 1 June 2018
Myle Ott, Sergey Edunov, David Grangier, Michael Auli
AIMat

Papers citing "Scaling Neural Machine Translation"

50 / 379 papers shown
JParaCrawl: A Large Scale Web-Based English-Japanese Parallel Corpus
Makoto Morishita, Jun Suzuki, Masaaki Nagata
LRM
30 · 64 · 0 · 25 Nov 2019

Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering
Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, R. Socher, Caiming Xiong
RALM · KELM · LRM
15 · 282 · 0 · 24 Nov 2019

MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning
Guangxiang Zhao, Xu Sun, Jingjing Xu, Zhiyuan Zhang, Liangchen Luo
LRM
14 · 49 · 0 · 17 Nov 2019

What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
Timothee Mickus, Denis Paperno, Mathieu Constant, Kees van Deemter
21 · 45 · 0 · 13 Nov 2019

Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling
Timothee Mickus, Denis Paperno, Mathieu Constant
16 · 29 · 0 · 13 Nov 2019

CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, Armand Joulin
25 · 254 · 0 · 10 Nov 2019

Effectiveness of self-supervised pre-training for speech recognition
Alexei Baevski, Michael Auli, Abdel-rahman Mohamed
SSL
19 · 147 · 0 · 10 Nov 2019

Two-Headed Monster And Crossed Co-Attention Networks
Yaoyiran Li, Jing Jiang
19 · 0 · 0 · 10 Nov 2019

Improving Transformer Models by Reordering their Sublayers
Ofir Press, Noah A. Smith, Omer Levy
11 · 87 · 0 · 10 Nov 2019

Distilling Knowledge Learned in BERT for Text Generation
Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu
15 · 28 · 0 · 10 Nov 2019

Ask to Learn: A Study on Curiosity-driven Question Generation
Thomas Scialom, Jacopo Staiano
25 · 24 · 0 · 08 Nov 2019

Data Diversification: A Simple Strategy For Neural Machine Translation
Xuan-Phi Nguyen, Shafiq R. Joty, Wu Kui, A. Aw
14 · 15 · 0 · 05 Nov 2019

Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness
Alexandre Berard, Ioan Calapodescu, Marc Dymetman, Claude Roux, Jean-Luc Meunier, Vassilina Nikoulina
9 · 27 · 0 · 31 Oct 2019

Naver Labs Europe's Systems for the Document-Level Generation and Translation Task at WNGT 2019
Fahimeh Saleh, Alexandre Berard, Ioan Calapodescu, Laurent Besacier
VLM
15 · 14 · 0 · 31 Oct 2019

Adapting Multilingual Neural Machine Translation to Unseen Languages
Surafel Melaku Lakew, Alina Karakanta, Marcello Federico, Matteo Negri, Marco Turchi
31 · 20 · 0 · 30 Oct 2019

Controlling the Output Length of Neural Machine Translation
Surafel Melaku Lakew, Mattia Antonino Di Gangi, Marcello Federico
15 · 67 · 0 · 23 Oct 2019

Robust Neural Machine Translation for Clean and Noisy Speech Transcripts
Mattia Antonino Di Gangi, Robert Enyedi, A. Brusadin, Marcello Federico
21 · 25 · 0 · 22 Oct 2019

Fully Quantized Transformer for Machine Translation
Gabriele Prato, Ella Charlaix, Mehdi Rezagholizadeh
MQ
13 · 68 · 0 · 17 Oct 2019

Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen, Julian Salazar
36 · 224 · 0 · 14 Oct 2019

vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Alexei Baevski, Steffen Schneider, Michael Auli
SSL
11 · 660 · 0 · 12 Oct 2019

SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum
Jianyu Wang, Vinayak Tantia, Nicolas Ballas, Michael G. Rabbat
4 · 200 · 0 · 01 Oct 2019

UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu
VLM · OT
29 · 444 · 0 · 25 Sep 2019

Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan, Edouard Grave, Armand Joulin
22 · 584 · 0 · 25 Sep 2019

Improved Variational Neural Machine Translation by Promoting Mutual Information
Arya D. McCarthy, Xian Li, Jiatao Gu, Ning Dong
DRL
22 · 7 · 0 · 19 Sep 2019

A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, H. Inaguma, ..., Ryuichi Yamamoto, Xiao-fei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang
23 · 716 · 0 · 13 Sep 2019

Hybrid Data-Model Parallel Training for Sequence-to-Sequence Recurrent Neural Network Machine Translation
Junya Ono, Masao Utiyama, Eiichiro Sumita
AIMat · AI4CE
11 · 7 · 0 · 02 Sep 2019

Improving Multi-Head Attention with Capsule Networks
Shuhao Gu, Yang Feng
12 · 12 · 0 · 31 Aug 2019

Scale Calibrated Training: Improving Generalization of Deep Networks via Scale-Specific Normalization
Zhuoran Yu, Aojun Zhou, Yukun Ma, Yudian Li, Xiaohan Zhang, Ping Luo
16 · 3 · 0 · 31 Aug 2019

Adaptively Sparse Transformers
Gonçalo M. Correia, Vlad Niculae, André F. T. Martins
8 · 252 · 0 · 30 Aug 2019

Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention
Biao Zhang, Ivan Titov, Rico Sennrich
6 · 101 · 0 · 29 Aug 2019

Simple and Effective Noisy Channel Modeling for Neural Machine Translation
Kyra Yee, Nathan Ng, Yann N. Dauphin, Michael Auli
12 · 79 · 0 · 15 Aug 2019

Towards Knowledge-Based Recommender Dialog System
Qibin Chen, Junyang Lin, Yichang Zhang, Ming Ding, Yukuo Cen, Hongxia Yang, Jie Tang
19 · 237 · 0 · 15 Aug 2019

On The Evaluation of Machine Translation Systems Trained With Back-Translation
Sergey Edunov, Myle Ott, Marc'Aurelio Ranzato, Michael Auli
9 · 96 · 0 · 14 Aug 2019

Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
Saptadeep Pal, Eiman Ebrahimi, A. Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, D. Nellans, Puneet Gupta
31 · 55 · 0 · 30 Jul 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
AIMat
121 · 23,865 · 0 · 26 Jul 2019

DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks
Zehui Lin, Pengfei Liu, Luyao Huang, Junkun Chen, Xipeng Qiu, Xuanjing Huang
3DPC
16 · 44 · 0 · 25 Jul 2019

ELI5: Long Form Question Answering
Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, Michael Auli
AI4MH · ELM
17 · 592 · 0 · 22 Jul 2019

Facebook FAIR's WMT19 News Translation Task Submission
Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov
VLM
6 · 393 · 0 · 15 Jul 2019

Naver Labs Europe's Systems for the WMT19 Machine Translation Robustness Task
Alexandre Berard, Ioan Calapodescu, Claude Roux
VLM
4 · 59 · 0 · 15 Jul 2019

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges
N. Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson, ..., George F. Foster, Colin Cherry, Wolfgang Macherey, Z. Chen, Yonghui Wu
23 · 422 · 0 · 11 Jul 2019

A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition
Wei Zhang, Xiaodong Cui, Ulrich Finkler, G. Saon, Abdullah Kayi, A. Buyuktosunoglu, Brian Kingsbury, David S. Kung, M. Picheny
18 · 19 · 0 · 10 Jul 2019

NTT's Machine Translation Systems for WMT19 Robustness Task
Soichiro Murakami, Makoto Morishita, Tsutomu Hirao, Masaaki Nagata
VLM
10 · 9 · 0 · 09 Jul 2019

Improving Robustness in Real-World Neural Machine Translation Engines
Rohit Gupta, Patrik Lambert, Raj Nath Patel, J. Tinsley
29 · 4 · 0 · 02 Jul 2019

Making Asynchronous Stochastic Gradient Descent Work for Transformers
Alham Fikri Aji, Kenneth Heafield
19 · 13 · 0 · 08 Jun 2019

Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP
Haonan Yu, Sergey Edunov, Yuandong Tian, Ari S. Morcos
16 · 148 · 0 · 06 Jun 2019

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
Yiping Lu, Zhuohan Li, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Liwei Wang, Tie-Yan Liu
AI4CE
13 · 168 · 0 · 06 Jun 2019

Learning Deep Transformer Models for Machine Translation
Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao
14 · 656 · 0 · 05 Jun 2019

Evaluating Gender Bias in Machine Translation
Gabriel Stanovsky, Noah A. Smith, Luke Zettlemoyer
11 · 393 · 0 · 03 Jun 2019

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks
Boris Ginsburg, P. Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Chun Lok Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen
ODL
12 · 13 · 0 · 27 May 2019

Are Sixteen Heads Really Better than One?
Paul Michel, Omer Levy, Graham Neubig
MoE
13 · 1,035 · 0 · 25 May 2019