Training Deeper Neural Machine Translation Models with Transparent Attention

22 August 2018 · arXiv:1808.07561 (abs / PDF / HTML)
Ankur Bapna, Mia Xu Chen, Orhan Firat, Yuan Cao, Yonghui Wu

Papers citing "Training Deeper Neural Machine Translation Models with Transparent Attention"

Showing the first 50 of 63 citing papers.
Utilizing Multilingual Encoders to Improve Large Language Models for Low-Resource Languages
  Moratuwa Engineering Research Conference (MERCon), 2025
  Imalsha Puranegedara, Themira Chathumina, Nisal Ranathunga, Nisansa de Silva, Surangika Ranathunga, Mokanarangan Thayaparan
  12 Aug 2025

The Curse of Depth in Large Language Models
  Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu
  09 Feb 2025

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
  International Conference on Learning Representations (ICLR), 2024
  Pengxiang Li, Lu Yin, Shiwei Liu
  18 Dec 2024

PartialFormer: Modeling Part Instead of Whole for Machine Translation
  Annual Meeting of the Association for Computational Linguistics (ACL), 2023
  Tong Zheng, Bei Li, Huiwen Bao, Jiale Wang, Weiqiao Shan, Tong Xiao, Jingbo Zhu
  23 Oct 2023 · Tags: MoE, AI4CE

Ask Language Model to Clean Your Noisy Translation Data
  Quinten Bolding, Baohao Liao, Brandon James Denis, Jun Luo, Christof Monz
  20 Oct 2023

Prompt Guided Copy Mechanism for Conversational Question Answering
  Interspeech, 2023
  Yong Zhang, Zhitao Li, Jianzong Wang, Yiming Gao, Ning Cheng, Fengying Yu, Jing Xiao
  07 Aug 2023

Layer-wise Representation Fusion for Compositional Generalization
  AAAI Conference on Artificial Intelligence (AAAI), 2023
  Yafang Zheng, Lei Lin, Shantao Liu, Binling Wang, Zhaohong Lai, Wenhao Rao, Biao Fu, Yidong Chen, Xiaodong Shi
  20 Jul 2023 · Tags: AI4CE

Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation
  Wenjie Hao, Hongfei Xu, Lingling Mu, Hongying Zan
  24 Dec 2022 · Tags: MoE

Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification
  International Conference on Language Resources and Evaluation (LREC), 2022
  Muhammad N. ElNokrashy, Badr AlKhamissi, Mona T. Diab
  30 Sep 2022 · Tags: MoMe

GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation
  IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2022
  Jian Yang, Yuwei Yin, Liqun Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Furu Wei, Zhoujun Li
  29 Jul 2022 · Tags: AI4CE

BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
  AAAI Conference on Artificial Intelligence (AAAI), 2022
  Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan
  17 Jun 2022

B2T Connection: Serving Stability and Performance in Deep Transformers
  Annual Meeting of the Association for Computational Linguistics (ACL), 2022
  Sho Takase, Shun Kiyono, Sosuke Kobayashi, Jun Suzuki
  01 Jun 2022

Transformers in Time-series Analysis: A Tutorial
  Circuits, Systems, and Signal Processing (CSSP), 2022
  Sabeen Ahmed, Ian E. Nielsen, Aakash Tripathi, Shamoon Siddiqui, Ghulam Rasool, R. Ramachandran
  28 Apr 2022 · Tags: AI4TS

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
  Han Cai, Ji Lin, Chengyue Wu, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han
  25 Apr 2022

ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation
  Annual Meeting of the Association for Computational Linguistics (ACL), 2022
  Bei Li, Quan Du, Tao Zhou, Yi Jing, Shuhan Zhou, Xin Zeng, Tong Xiao, JingBo Zhu, Xuebo Liu, Min Zhang
  17 Mar 2022

Grounding Commands for Autonomous Vehicles via Layer Fusion with Region-specific Dynamic Layer Attention
  IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
  Hou Pong Chan, M. Guo, Chengguang Xu
  14 Mar 2022

An Empirical Study of Training End-to-End Vision-and-Language Transformers
  Computer Vision and Pattern Recognition (CVPR), 2021
  Zi-Yi Dou, Yichong Xu, Zhe Gan, Jianfeng Wang, Shuohang Wang, ..., Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng
  03 Nov 2021 · Tags: VLM

Why don't people use character-level machine translation?
  Jindrich Libovický, Helmut Schmid, Kangyang Luo
  15 Oct 2021

Recurrent multiple shared layers in Depth for Neural Machine Translation
  Guoliang Li, Yiyang Li
  23 Aug 2021 · Tags: MoE

AutoFormer: Searching Transformers for Visual Recognition
  Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling
  01 Jul 2021 · Tags: ViT

Recurrent Stacking of Layers in Neural Networks: An Application to Neural Machine Translation
  Mary Dabre, Atsushi Fujita
  18 Jun 2021

A Survey of Transformers
  AI Open (AO), 2021
  Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
  08 Jun 2021 · Tags: ViT

ODE Transformer: An Ordinary Differential Equation-Inspired Model for Neural Machine Translation
  Bei Li, Quan Du, Tao Zhou, Shuhan Zhou, Xin Zeng, Tong Xiao, Jingbo Zhu
  06 Apr 2021

OmniNet: Omnidirectional Representations from Transformers
  International Conference on Machine Learning (ICML), 2021
  Yi Tay, Mostafa Dehghani, V. Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler
  01 Mar 2021

Do Transformer Modifications Transfer Across Implementations and Applications?
  Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
  Sharan Narang, Hyung Won Chung, Yi Tay, W. Fedus, Thibault Févry, ..., Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel
  23 Feb 2021

An Efficient Transformer Decoder with Compressed Sub-layers
  AAAI Conference on Artificial Intelligence (AAAI), 2021
  Yanyang Li, Ye Lin, Tong Xiao, Jingbo Zhu
  03 Jan 2021

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning
  International Conference on Learning Representations (ICLR), 2020
  Xuebo Liu, Longyue Wang, Yang Li, Liang Ding, Lidia S. Chao, Zhaopeng Tu
  29 Dec 2020 · Tags: AI4CE

Learning Light-Weight Translation Models from Deep Transformer
  AAAI Conference on Artificial Intelligence (AAAI), 2020
  Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu
  27 Dec 2020 · Tags: VLM

Improving Gradient Flow with Unrolled Highway Expectation Maximization
  C. Song, Eunseok Kim, Inwook Shim
  09 Dec 2020

Layer-Wise Multi-View Learning for Neural Machine Translation
  Qiang Wang, Changliang Li, Yue Zhang, Tong Xiao, Jingbo Zhu
  03 Nov 2020

Multi-Unit Transformers for Neural Machine Translation
  Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
  Jianhao Yan, Fandong Meng, Jie Zhou
  21 Oct 2020

Shallow-to-Deep Training for Neural Machine Translation
  Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang, Jingbo Zhu
  08 Oct 2020

Deep Transformers with Latent Depth
  Neural Information Processing Systems (NeurIPS), 2020
  Xian Li, Asa Cooper Stickland, Yuqing Tang, X. Kong
  28 Sep 2020

Weight Distillation: Transferring the Knowledge in Neural Network Parameters
  Annual Meeting of the Association for Computational Linguistics (ACL), 2020
  Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu
  19 Sep 2020

Very Deep Transformers for Neural Machine Translation
  Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao
  18 Aug 2020

Rewiring the Transformer with Depth-Wise LSTMs
  International Conference on Language Resources and Evaluation (LREC), 2020
  Hongfei Xu, Yang Song, Qiuhui Liu, Josef van Genabith, Deyi Xiong
  13 Jul 2020

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
  Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, M. Krikun, Noam M. Shazeer, Zhiwen Chen
  30 Jun 2020 · Tags: MoE

Learning Source Phrase Representations for Neural Machine Translation
  Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu, Jingyi Zhang
  25 Jun 2020

The Lipschitz Constant of Self-Attention
  Hyunjik Kim, George Papamakarios, A. Mnih
  08 Jun 2020

Norm-Based Curriculum Learning for Neural Machine Translation
  Annual Meeting of the Association for Computational Linguistics (ACL), 2020
  Xuebo Liu, Houtim Lai, Yang Li, Lidia S. Chao
  03 Jun 2020

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
  Annual Meeting of the Association for Computational Linguistics (ACL), 2020
  Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han
  28 May 2020

Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding
  Fenglin Liu, Xuancheng Ren, Guangxiang Zhao, Chenyu You, Xuewei Ma, Xian Wu, Xu Sun
  16 May 2020

Deep Ensembles on a Fixed Memory Budget: One Wide Network or Several Thinner Ones?
  Nadezhda Chirkova, E. Lobacheva, Dmitry Vetrov
  14 May 2020 · Tags: OOD, MoE

Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems
  Jindrich Libovický, Kangyang Luo
  29 Apr 2020

Multiscale Collaborative Deep Models for Neural Machine Translation
  Annual Meeting of the Association for Computational Linguistics (ACL), 2020
  Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng, Weihua Luo
  29 Apr 2020

Marathi To English Neural Machine Translation With Near Perfect Corpus And Transformers
  S. Jadhav
  26 Feb 2020

A Survey of Deep Learning Techniques for Neural Machine Translation
  Shu Yang, Yuxin Wang, Xiaowen Chu
  18 Feb 2020 · Tags: VLM, AI4TS, AI4CE

Neuron Interaction Based Representation Composition for Neural Machine Translation
  AAAI Conference on Artificial Intelligence (AAAI), 2019
  Jian Li, Xing Wang, Baosong Yang, Shuming Shi, Michael R. Lyu, Zhaopeng Tu
  22 Nov 2019

Character-based NMT with Transformer
  Rohit Gupta, Laurent Besacier, Marc Dymetman, Matthias Gallé
  12 Nov 2019

Lipschitz Constrained Parameter Initialization for Deep Transformers
  Annual Meeting of the Association for Computational Linguistics (ACL), 2019
  Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong, Jingyi Zhang
  08 Nov 2019 · Tags: ODL