Scaling Neural Machine Translation (arXiv:1806.00187)
Myle Ott, Sergey Edunov, David Grangier, Michael Auli
1 June 2018

Papers citing "Scaling Neural Machine Translation" (50 of 379 papers shown)

Reversible Column Networks [VLM]
Yuxuan Cai, Yi Zhou, Qi Han, Jianjian Sun, Xiangwen Kong, Jun Yu Li, Xiangyu Zhang (22 Dec 2022)

A Neural ODE Interpretation of Transformer Layers
Yaofeng Desmond Zhong, Tongtao Zhang, Amit Chakraborty, Biswadip Dey (12 Dec 2022)

P-Transformer: Towards Better Document-to-Document Neural Machine Translation [ViT]
Yachao Li, Junhui Li, Jing Jiang, Shimin Tao, Hao-Yu Yang, M. Zhang (12 Dec 2022)

Subword-Delimited Downsampling for Better Character-Level Translation
Lukas Edman, Antonio Toral, Gertjan van Noord (02 Dec 2022)

Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling
Zhijun Wang, Xuebo Liu, Min Zhang (23 Nov 2022)

Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification [VLM, MedIm]
Juan Pisula, Katarzyna Bozek (14 Nov 2022)

Easy Guided Decoding in Providing Suggestions for Interactive Machine Translation
Ke Min Wang, Xin Ge, Yuqi Zhang, Yu Zhao, Jiayi Wang (14 Nov 2022)

Parallel Attention Forcing for Machine Translation
Qingyun Dou, Mark J. F. Gales (06 Nov 2022)

Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost
Sungjun Cho, Seonwoo Min, Jinwoo Kim, Moontae Lee, Honglak Lee, Seunghoon Hong (27 Oct 2022)

Modeling Context With Linear Attention for Scalable Document-Level Translation
Zhaofeng Wu, Hao Peng, Nikolaos Pappas, Noah A. Smith (16 Oct 2022)

Low-resource Neural Machine Translation with Cross-modal Alignment [VLM]
Zhe Yang, Qingkai Fang, Yang Feng (13 Oct 2022)

FontTransformer: Few-shot High-resolution Chinese Glyph Image Synthesis via Stacked Transformers
Yitian Liu, Z. Lian (12 Oct 2022)

Mixture of Attention Heads: Selecting Attention Heads Per Token [MoE]
Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong (11 Oct 2022)

Mega: Moving Average Equipped Gated Attention
Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer (21 Sep 2022)

Dodging the Data Bottleneck: Automatic Subtitling with Automatically Segmented ST Corpora
Sara Papi, Alina Karakanta, Matteo Negri, Marco Turchi (21 Sep 2022)

EEG-Based Epileptic Seizure Prediction Using Temporal Multi-Channel Transformers
Ricardo V. Godoy, Tharik J. S. Reis, Paulo H. Polegato, G. J. G. Lahr, R. Saute, F. Nakano, H. Machado, A. Sakamoto, Marcelo Becker, G. Caurin (18 Sep 2022)

Adapting to Non-Centered Languages for Zero-shot Multilingual Translation
Zhi Qu, Taro Watanabe (09 Sep 2022)

Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems
Prasoon Sinha, Akhil Guliani, Rutwik Jain, Brandon Tran, Matthew D. Sinclair, Shivaram Venkataraman (23 Aug 2022)

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale [MQ]
Tim Dettmers, M. Lewis, Younes Belkada, Luke Zettlemoyer (15 Aug 2022)

Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention
Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki (27 Jul 2022)

Sockeye 3: Fast Neural Machine Translation with PyTorch [OSLM]
F. Hieber, Michael J. Denkowski, Tobias Domhan, Barbara Darques Barros, Celina Dong Ye, ..., Maria Nadejde, Surafel Melaku Lakew, Prashant Mathur, Anna Currey, Marcello Federico (12 Jul 2022)

Training Transformers Together [ViT]
Alexander Borzunov, Max Ryabinin, Tim Dettmers, Quentin Lhoest, Lucile Saulnier, Michael Diskin, Yacine Jernite, Thomas Wolf (07 Jul 2022)

Learning Multiscale Transformer Models for Sequence Generation
Bei Li, Tong Zheng, Yi Jing, Chengbo Jiao, Tong Xiao, Jingbo Zhu (19 Jun 2022)

SYMBA: Symbolic Computation of Squared Amplitudes in High Energy Physics with Machine Learning
Abdulhakim Alnuqaydan, S. Gleyzer, Harrison B. Prosper (17 Jun 2022)

LegoNN: Building Modular Encoder-Decoder Models [AuLLM, MoE]
Siddharth Dalmia, Dmytro Okhonko, M. Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdel-rahman Mohamed (07 Jun 2022)

Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation
Pengzhi Gao, Zhongjun He, Hua-Hong Wu, Haifeng Wang (06 Jun 2022)

B2T Connection: Serving Stability and Performance in Deep Transformers
Sho Takase, Shun Kiyono, Sosuke Kobayashi, Jun Suzuki (01 Jun 2022)

Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model [LRM]
Xuan-Phi Nguyen, Shafiq R. Joty, Wu Kui, A. Aw (31 May 2022)

Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
Tao Ge, Heming Xia, Xin Sun, Si-Qing Chen, Furu Wei (20 May 2022)

Unsupervised Flow-Aligned Sequence-to-Sequence Learning for Video Restoration
Jing Lin, Xiaowan Hu, Yuanhao Cai, Haoqian Wang, Youliang Yan, X. Zou, Yulun Zhang, Luc Van Gool (20 May 2022)

Unifying the Convergences in Multilingual Neural Machine Translation
Yi-Chong Huang, Xiaocheng Feng, Xinwei Geng, Bing Qin (03 May 2022)

Multi-Camera Multiple 3D Object Tracking on the Move for Autonomous Vehicles [3DPC]
Pha Nguyen, Kha Gia Quach, C. Duong, Ngan Le, Xuan-Bac Nguyen, Khoa Luu (19 Apr 2022)

Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [CLL]
Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Weihua Luo, Jun Xie, Rong Jin (14 Apr 2022)

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing [FedML]
Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, T. Zhao (13 Apr 2022)

A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith (11 Apr 2022)

Linear Complexity Randomized Self-attention Mechanism
Lin Zheng, Chong-Jun Wang, Lingpeng Kong (10 Apr 2022)

Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding
Shanshan Wang, Zhumin Chen, Z. Ren, Huasheng Liang, Qiang Yan, Pengjie Ren (06 Apr 2022)

On the Effectiveness of Pretrained Models for API Learning
M. Hadi, Imam Nur Bani Yusuf, Ferdian Thung, K. Luong, Jiang Lingxiao, Fatemeh H. Fard, David Lo (05 Apr 2022)

Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation
Heming Xia, Tao Ge, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui (30 Mar 2022)

Single Model Ensemble for Subword Regularized Models in Low-Resource Machine Translation
Sho Takase, Tatsuya Hiraoka, Naoaki Okazaki (25 Mar 2022)

Linearizing Transformer with Key-Value Memory
Yizhe Zhang, Deng Cai (23 Mar 2022)

Dynamic N:M Fine-grained Structured Sparse Attention Mechanism
Zhaodong Chen, Yuying Quan, Zheng Qu, L. Liu, Yufei Ding, Yuan Xie (28 Feb 2022)

JParaCrawl v3.0: A Large-scale English-Japanese Parallel Corpus
Makoto Morishita, Katsuki Chousa, Jun Suzuki, Masaaki Nagata (25 Feb 2022)

Attention Enables Zero Approximation Error
Zhiying Fang, Yidong Ouyang, Ding-Xuan Zhou, Guang Cheng (24 Feb 2022)

Revisiting Over-smoothing in BERT from the Perspective of Graph
Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, James T. Kwok (17 Feb 2022)

End-to-End Training for Back-Translation with Categorical Reparameterization Trick
DongNyeong Heo, Heeyoul Choi (17 Feb 2022)

EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation [MoE]
Tao Ge, Si-Qing Chen, Furu Wei (16 Feb 2022)

Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data
Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney (06 Feb 2022)

No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, T. Zhao (06 Feb 2022)

ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization
Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki (14 Jan 2022)