Tensor2Tensor for Neural Machine Translation


16 March 2018
Ashish Vaswani
Samy Bengio
E. Brevdo
François Chollet
Aidan Gomez
Stephan Gouws
Llion Jones
Lukasz Kaiser
Nal Kalchbrenner
Niki Parmar
Ryan Sepassi
Noam M. Shazeer
Jakob Uszkoreit

Papers citing "Tensor2Tensor for Neural Machine Translation"

50 / 264 papers shown
CrossedWires: A Dataset of Syntactically Equivalent but Semantically Disparate Deep Learning Models
Max Zvyagin
Thomas Brettin
Arvind Ramanathan
Sumit Kumar Jha
109
1
0
29 Aug 2021
YANMTT: Yet Another Neural Machine Translation Toolkit
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Raj Dabre
Eiichiro Sumita
221
14
0
25 Aug 2021
Compositional Generalization in Multilingual Semantic Parsing over Wikidata
Transactions of the Association for Computational Linguistics (TACL), 2021
Ruixiang Cui
Rahul Aralikatte
Heather Lent
Daniel Hershcovich
242
15
0
07 Aug 2021
Residual Tree Aggregation of Layers for Neural Machine Translation
Guoliang Li
Yiyang Li
113
0
0
19 Jul 2021
Neural Machine Translation for Low-Resource Languages: A Survey
ACM Computing Surveys (CSUR), 2021
Surangika Ranathunga
E. Lee
Marjana Prifti Skenduli
Ravi Shekhar
Mehreen Alam
Rishemjit Kaur
321
324
0
29 Jun 2021
A Survey of Transformers
AI Open (AO), 2021
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
445
1,386
0
08 Jun 2021
Luna: Linear Unified Nested Attention
Neural Information Processing Systems (NeurIPS), 2021
Xuezhe Ma
Xiang Kong
Sinong Wang
Chunting Zhou
Jonathan May
Hao Ma
Luke Zettlemoyer
235
133
0
03 Jun 2021
Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines
Matthew A. Wright
Joseph E. Gonzalez
226
24
0
02 Jun 2021
Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models
Workshop on Innovative Use of NLP for Building Educational Applications (UNBEA), 2021
Felix Stahlberg
Shankar Kumar
SyDa
220
103
0
27 May 2021
TranSmart: A Practical Interactive Machine Translation System
Guoping Huang
Lemao Liu
Xing Wang
Longyue Wang
Huayang Li
Zhaopeng Tu
Chengyang Huang
Shuming Shi
171
36
0
27 May 2021
Rethinking Skip Connection with Layer Normalization in Transformers and ResNets
International Conference on Computational Linguistics (COLING), 2020
Fenglin Liu
Xuancheng Ren
Zhiyuan Zhang
Xu Sun
Yuexian Zou
AI4CE
148
80
0
15 May 2021
Spelling Correction with Denoising Transformer
Alexandr Kuznetsov
Hector Urdiales
123
19
0
12 May 2021
Hierarchical RNNs-Based Transformers MADDPG for Mixed Cooperative-Competitive Environments
Journal of Intelligent & Fuzzy Systems (JIFS), 2021
Xiaolong Wei
Lifang Yang
Xianglin Huang
Gang Cao
Zhulin Tao
Zhengyang Du
Jing An
192
7
0
11 May 2021
EL-Attention: Memory Efficient Lossless Attention for Generation
International Conference on Machine Learning (ICML), 2021
Yu Yan
Jiusheng Chen
Weizhen Qi
Nikhil Bhendawade
Yeyun Gong
Nan Duan
Ruofei Zhang
VLM
166
9
0
11 May 2021
Billion-scale Pre-trained E-commerce Product Knowledge Graph Model
IEEE International Conference on Data Engineering (ICDE), 2021
Wen Zhang
Chi-Man Wong
Ganqiang Ye
Bo Wen
Wei Zhang
Huajun Chen
207
25
0
02 May 2021
A Simple and Effective Positional Encoding for Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Pu-Chin Chen
Henry Tsai
Srinadh Bhojanapalli
Hyung Won Chung
Yin-Wen Chang
Chun-Sung Ferng
252
75
0
18 Apr 2021
Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning
IEEE International Conference on Distributed Computing Systems (ICDCS), 2021
Shijian Li
Oren Mangoubi
Lijie Xu
Tian Guo
230
22
0
16 Apr 2021
Counter-Interference Adapter for Multilingual Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Yaoming Zhu
Jiangtao Feng
Chengqi Zhao
Mingxuan Wang
Lei Li
275
70
0
16 Apr 2021
First the worst: Finding better gender translations during beam search
Findings (Findings), 2021
D. Saunders
Rosie Sallis
Bill Byrne
194
33
0
15 Apr 2021
WHOSe Heritage: Classification of UNESCO World Heritage "Outstanding Universal Value" Documents with Soft Labels
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Nan Bai
Renqian Luo
Pirouz Nourian
A. Roders
141
7
0
12 Apr 2021
Extended Parallel Corpus for Amharic-English Machine Translation
International Conference on Language Resources and Evaluation (LREC), 2021
A. Gezmu
A. Nürnberger
T. Bati
254
19
0
08 Apr 2021
Sample size estimation for comparing dynamic treatment regimens in a SMART: a Monte Carlo-based approach and case study with longitudinal overdispersed count outcomes
Statistical Methods in Medical Research (Stat Med), 2021
Jamie Yap
John J. Dziak
David Kabiito
Claire Babirye
J. McKay
Bibhas Chakraborty
J. Nakatumba‐Nabende
194
26
0
31 Mar 2021
FastMoE: A Fast Mixture-of-Expert Training System
Jiaao He
J. Qiu
Aohan Zeng
Zhilin Yang
Jidong Zhai
Jie Tang
ALM
MoE
202
129
0
24 Mar 2021
Full Page Handwriting Recognition via Image to Sequence Extraction
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2021
Sumeet S. Singh
Sergey Karayev
256
65
0
11 Mar 2021
Hurdles to Progress in Long-form Question Answering
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Kalpesh Krishna
Aurko Roy
Mohit Iyyer
238
222
0
10 Mar 2021
Do Transformer Modifications Transfer Across Implementations and Applications?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Sharan Narang
Hyung Won Chung
Yi Tay
W. Fedus
Thibault Févry
...
Wei Li
Nan Ding
Jake Marcus
Adam Roberts
Colin Raffel
215
134
0
23 Feb 2021
VisuoSpatial Foresight for Physical Sequential Fabric Manipulation
Autonomous Robots (Auton. Robots), 2021
Ryan Hoque
Daniel Seita
Ashwin Balakrishna
Aditya Ganapathi
A. Tanwani
Nawid Jamali
K. Yamane
Soshi Iba
Ken Goldberg
148
43
0
19 Feb 2021
A Deep Adversarial Model for Suffix and Remaining Time Prediction of Event Sequences
SDM (SDM), 2021
Farbod Taymouri
M. Rosa
S. Erfani
128
33
0
15 Feb 2021
MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records
AAAI Conference on Artificial Intelligence (AAAI), 2021
Zhen Xu
David R. So
Andrew M. Dai
Mamba
342
66
0
03 Feb 2021
Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow
International Conference on Software Engineering (ICSE), 2021
Kaibo Cao
Chunyang Chen
Sebastian Baltes
Christoph Treude
Xiang Chen
232
65
0
01 Feb 2021
TextBox: A Unified, Modularized, and Extensible Framework for Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Junyi Li
Tianyi Tang
Gaole He
Jinhao Jiang
Xiaoxuan Hu
Puzhao Xie
Zhipeng Chen
Zhuohao Yu
Wayne Xin Zhao
Ji-Rong Wen
306
27
0
06 Jan 2021
Neural Machine Translation: A Review of Methods, Resources, and Tools
AI Open (AO), 2020
Zhixing Tan
Shuo Wang
Zonghan Yang
Gang Chen
Xuancheng Huang
Maosong Sun
Yang Liu
3DV
AI4TS
259
124
0
31 Dec 2020
Why Neural Machine Translation Prefers Empty Outputs
Xing Shi
Yijun Xiao
Kevin Knight
AAML
131
9
0
24 Dec 2020
*-CFQ: Analyzing the Scalability of Machine Learning on a Compositional Task
AAAI Conference on Artificial Intelligence (AAAI), 2020
Dmitry Tsarkov
Tibor Tihon
Nathan Scales
Nikola Momchev
Danila Sinopalnikov
Nathanael Scharli
171
17
0
15 Dec 2020
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish
Machine Translation (MT), 2020
Begum Citamak
Ozan Caglayan
Menekse Kuyu
Erkut Erdem
Aykut Erdem
Pranava Madhyastha
Lucia Specia
206
9
0
13 Dec 2020
Attentional-Biased Stochastic Gradient Descent
Q. Qi
Yi Tian Xu
Rong Jin
W. Yin
Tianbao Yang
ODL
459
13
0
13 Dec 2020
Cross-lingual Transfer of Abstractive Summarizer to Less-resource Language
Aleš Žagar
Marko Robnik-Šikonja
240
11
0
08 Dec 2020
ConVEx: Data-Efficient and Few-Shot Slot Labeling
Matthew Henderson
Ivan Vulić
CLIP
VLM
209
39
0
22 Oct 2020
CUNI Systems for the Unsupervised and Very Low Resource Translation Task in WMT20
Ivana Kvapilíková
Tom Kocmi
Ondrej Bojar
95
5
0
22 Oct 2020
Detecting ESG topics using domain-specific language models and data augmentation approaches
Timothy Nugent
N. Stelea
Jochen L. Leidner
164
13
0
16 Oct 2020
Semantic Label Smoothing for Sequence to Sequence Problems
Michal Lukasik
Himanshu Jain
A. Menon
Seungyeon Kim
Srinadh Bhojanapalli
Felix X. Yu
Sanjiv Kumar
AI4TS
125
18
0
15 Oct 2020
Addressing Exposure Bias With Document Minimum Risk Training: Cambridge at the WMT20 Biomedical Translation Task
Conference on Machine Translation (WMT), 2020
Danielle Saunders
Bill Byrne
156
10
0
11 Oct 2020
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Changhan Wang
Yun Tang
Xutai Ma
Anne Wu
Sravya Popuri
Dmytro Okhonko
J. Pino
VLM
LRM
325
318
0
11 Oct 2020
On Task-Level Dialogue Composition of Generative Transformer Model
First Workshop on Insights from Negative Results in NLP (IFNRN), 2020
Prasanna Parthasarathi
Arvind Neelakantan
Sharan Narang
113
2
0
09 Oct 2020
Query-Key Normalization for Transformers
Findings (Findings), 2020
Alex Henry
Prudhvi Raj Dachapally
S. Pawar
Yuxuan Chen
228
153
0
08 Oct 2020
Improving Sequential Latent Variable Models with Autoregressive Flows
Joseph Marino
Lei Chen
Jiawei He
Stephan Mandt
BDL
AI4TS
336
15
0
07 Oct 2020
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
Yimeng Wu
Peyman Passban
Mehdi Rezagholizadeh
Qun Liu
MoE
133
37
0
06 Oct 2020
Code to Comment "Translation": Data, Metrics, Baselining & Evaluation
International Conference on Automated Software Engineering (ASE), 2020
David Gros
Hariharan Sezhiyan
Prem Devanbu
Zhou Yu
170
82
0
03 Oct 2020
Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties
Brett Daley
Chris Amato
ODL
138
4
0
03 Oct 2020
Seq2Edits: Sequence Transduction Using Span-level Edit Operations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Felix Stahlberg
Shankar Kumar
BDL
196
95
0
23 Sep 2020