ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Scaling Neural Machine Translation

1 June 2018
Myle Ott
Sergey Edunov
David Grangier
Michael Auli
    AIMat

Papers citing "Scaling Neural Machine Translation"

50 / 379 papers shown
Title
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
K. Zhang
Lizhuang Ma
J. Wang
J. Wang
W. Zhang
MQ
51
0
0
01 May 2025
Memory Reviving, Continuing Learning and Beyond: Evaluation of Pre-trained Encoders and Decoders for Multimodal Machine Translation
Zhuang Yu
Shiliang Sun
Jing Zhao
Tengfei Song
Hao-Yu Yang
48
0
0
25 Apr 2025
Self-Vocabularizing Training for Neural Machine Translation
Pin-Jie Lin
Ernie Chang
Yangyang Shi
Vikas Chandra
63
0
0
18 Mar 2025
Context-aware Biases for Length Extrapolation
Ali Veisi
Amir Mansourian
50
0
0
11 Mar 2025
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng
Jerry Huang
Peng Lu
Gezheng Xu
Boxing Chen
Charles X. Ling
Boyu Wang
49
1
0
24 Jan 2025
Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning
B. Li
Tong Zheng
R. Wang
Jiahao Liu
Qingyan Guo
...
Xu Tan
Tong Xiao
Jingbo Zhu
J. Wang
Xunliang Cai
50
1
0
05 Nov 2024
Efficient Machine Translation with a BiLSTM-Attention Approach
Yuxu Wu
Yiren Xing
12
0
0
29 Oct 2024
MGH Radiology Llama: A Llama 3 70B Model for Radiology
Yucheng Shi
Peng Shu
Zhengliang Liu
Zihao Wu
Quanzheng Li
Xiang Li
LM&MA
15
0
0
13 Aug 2024
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training
Cheng Luo
Jiawei Zhao
Zhuoming Chen
Beidi Chen
A. Anandkumar
21
3
0
22 Jul 2024
PASTA: Controllable Part-Aware Shape Generation with Autoregressive Transformers
Songlin Li
Despoina Paschalidou
Leonidas J. Guibas
45
2
0
18 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
42
43
0
09 Jul 2024
The infrastructure powering IBM's Gen AI model development
Talia Gershon
Seetharami R. Seelam
Brian M. Belgodere
Milton Bonilla
Lan Hoang
...
Ruchir Puri
Dakshi Agrawal
Drew Thorstensen
Joel Belog
Brent Tang
VLM
35
5
0
07 Jul 2024
LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation
Zengkui Sun
Yijin Liu
Fandong Meng
Jinan Xu
Yufeng Chen
Jie Zhou
38
2
0
05 Jun 2024
Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal
Haoran Lian
Yizhe Xiong
Jianwei Niu
Shasha Mo
Zhenpeng Su
Zijia Lin
Peng Liu
Hui Chen
Guiguang Ding
34
1
0
27 Apr 2024
Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition
David Gimeno-Gómez
Carlos David Martínez Hinarejos
22
1
0
20 Feb 2024
Enhancing Document-level Translation of Large Language Model via Translation Mixed-instructions
Yachao Li
Junhui Li
Jing Jiang
Min Zhang
22
8
0
16 Jan 2024
Spike No More: Stabilizing the Pre-training of Large Language Models
Sho Takase
Shun Kiyono
Sosuke Kobayashi
Jun Suzuki
13
13
0
28 Dec 2023
Implicit Affordance Acquisition via Causal Action-Effect Modeling in the Video Domain
Hsiu-yu Yang
Carina Silberer
19
1
0
18 Dec 2023
Systematic AI Approach for AGI: Addressing Alignment, Energy, and AGI Grand Challenges
Eren Kurshan
16
0
0
23 Oct 2023
SpEL: Structured Prediction for Entity Linking
Hassan S. Shavarani
Anoop Sarkar
17
9
0
23 Oct 2023
Sparse Universal Transformer
Shawn Tan
Yikang Shen
Zhenfang Chen
Aaron Courville
Chuang Gan
MoE
32
13
0
11 Oct 2023
DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models
Shansan Gong
Mukai Li
Jiangtao Feng
Zhiyong Wu
Lingpeng Kong
31
19
0
09 Oct 2023
One Wide Feedforward is All You Need
Telmo Pires
António V. Lopes
Yannick Assogba
Hendra Setiawan
35
12
0
04 Sep 2023
Chat Translation Error Detection for Assisting Cross-lingual Communications
Yunmeng Li
Jun Suzuki
Makoto Morishita
Kaori Abe
Ryoko Tokuhisa
Ana Brassard
Kentaro Inui
11
4
0
02 Aug 2023
Enhancing Supervised Learning with Contrastive Markings in Neural Machine Translation Training
Nathaniel Berger
M. Exel
Matthias Huck
Stefan Riezler
13
2
0
17 Jul 2023
Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling
Longyue Wang
Zefeng Du
Donghua Liu
Cai Deng
Dian Yu
Haiyun Jiang
Yan Wang
Leyang Cui
Shuming Shi
Zhaopeng Tu
CoGe
47
6
0
16 Jul 2023
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi
Shahar Gottlieb
Moran Shkolnik
A. Karnieli
Ron Banner
Elad Hoffer
Kfir Y. Levy
Daniel Soudry
25
2
0
18 Jun 2023
Understanding Parameter Sharing in Transformers
Ye Lin
Mingxuan Wang
Zhexi Zhang
Xiaohui Wang
Tong Xiao
Jingbo Zhu
MoE
16
2
0
15 Jun 2023
Hyperbolic Convolution via Kernel Point Aggregation
Eric Qu
Dongmian Zou
45
3
0
15 Jun 2023
EM-Network: Oracle Guided Self-distillation for Sequence Learning
J. Yoon
Sunghwan Ahn
Hyeon Seung Lee
Minchan Kim
Seokhwan Kim
N. Kim
VLM
25
2
0
14 Jun 2023
When Vision Fails: Text Attacks Against ViT and OCR
Nicholas Boucher
Jenny Blessing
Ilia Shumailov
Ross J. Anderson
Nicolas Papernot
AAML
24
4
0
12 Jun 2023
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
25
1
0
07 Jun 2023
When to Read Documents or QA History: On Unified and Selective Open-domain QA
Kyungjae Lee
Sanghyun Han
Seung-won Hwang
Moontae Lee
RALM
16
4
0
07 Jun 2023
Injecting knowledge into language generation: a case study in auto-charting after-visit care instructions from medical dialogue
M. Eremeev
Ilya Valmianski
X. Amatriain
Anitha Kannan
40
5
0
06 Jun 2023
TranSFormer: Slow-Fast Transformer for Machine Translation
Bei Li
Yi Jing
Xu Tan
Zhen Xing
Tong Xiao
Jingbo Zhu
41
7
0
26 May 2023
Neural Machine Translation for Mathematical Formulae
Felix Petersen
M. Schubotz
André Greiner-Petter
Bela Gipp
15
7
0
25 May 2023
Revisiting Non-Autoregressive Translation at Scale
Zhihao Wang
Longyue Wang
Jinsong Su
Junfeng Yao
Zhaopeng Tu
22
3
0
25 May 2023
Towards Higher Pareto Frontier in Multilingual Machine Translation
Yi-Chong Huang
Xiaocheng Feng
Xinwei Geng
Baohang Li
Bing Qin
33
9
0
25 May 2023
The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning
Zhuang Li
Lizhen Qu
Philip R. Cohen
Raj Tumuluri
Gholamreza Haffari
22
4
0
22 May 2023
Logit-Based Ensemble Distribution Distillation for Robust Autoregressive Sequence Uncertainties
Yassir Fathullah
Guoxuan Xia
Mark J. F. Gales
UQCV
22
2
0
17 May 2023
Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation
Songming Zhang
Yunlong Liang
Shuaibo Wang
Wenjuan Han
Jian Liu
Jinan Xu
Yufeng Chen
21
7
0
14 May 2023
Lightweight Convolution Transformer for Cross-patient Seizure Detection in Multi-channel EEG Signals
S. Rukhsar
A. Tiwari
29
9
0
07 May 2023
Leveraging Synthetic Targets for Machine Translation
Sarthak Mittal
Oleksii Hrinchuk
Oleksii Kuchaiev
26
2
0
07 May 2023
Learning Language-Specific Layers for Multilingual Machine Translation
Telmo Pires
Robin M. Schmidt
Yi-Hsiu Liao
Stephan Peitz
36
16
0
04 May 2023
Backdoor Learning on Sequence to Sequence Models
Lichang Chen
Minhao Cheng
Heng-Chiao Huang
SILM
52
18
0
03 May 2023
Efficient Attention via Control Variates
Lin Zheng
Jianbo Yuan
Chong-Jun Wang
Lingpeng Kong
28
18
0
09 Feb 2023
Dynamic Scheduled Sampling with Imitation Loss for Neural Text Generation
Xiang Lin
Prathyusha Jwalapuram
Shafiq R. Joty
DiffM
10
0
0
31 Jan 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
22
31
0
27 Jan 2023
Poor Man's Quality Estimation: Predicting Reference-Based MT Metrics Without the Reference
Vilém Zouhar
S. Dhuliawala
Wangchunshu Zhou
Nico Daheim
Tom Kocmi
Yuchen Eleanor Jiang
Mrinmaya Sachan
16
9
0
21 Jan 2023
Transformers in Action Recognition: A Review on Temporal Modeling
Elham Shabaninia
Hossein Nezamabadi-pour
Fatemeh Shafizadegan
ViT
21
8
0
29 Dec 2022