arXiv: 1803.07416
Tensor2Tensor for Neural Machine Translation


16 March 2018
Ashish Vaswani
Samy Bengio
E. Brevdo
François Chollet
Aidan N. Gomez
Stephan Gouws
Llion Jones
Lukasz Kaiser
Nal Kalchbrenner
Niki Parmar
Ryan Sepassi
Noam M. Shazeer
Jakob Uszkoreit

Papers citing "Tensor2Tensor for Neural Machine Translation"

50 / 261 papers shown
Neural Machine Translation for Low-Resource Languages: A Survey
Surangika Ranathunga
E. Lee
Marjana Prifti Skenduli
Ravi Shekhar
Mehreen Alam
Rishemjit Kaur
27
235
0
29 Jun 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
32
1,087
0
08 Jun 2021
Luna: Linear Unified Nested Attention
Xuezhe Ma
Xiang Kong
Sinong Wang
Chunting Zhou
Jonathan May
Hao Ma
Luke Zettlemoyer
25
114
0
03 Jun 2021
Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines
Matthew A. Wright
Joseph E. Gonzalez
23
20
0
02 Jun 2021
Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models
Felix Stahlberg
Shankar Kumar
SyDa
11
94
0
27 May 2021
TranSmart: A Practical Interactive Machine Translation System
Guoping Huang
Lemao Liu
Xing Wang
Longyue Wang
Huayang Li
Zhaopeng Tu
Chengyang Huang
Shuming Shi
16
32
0
27 May 2021
Rethinking Skip Connection with Layer Normalization in Transformers and ResNets
Fenglin Liu
Xuancheng Ren
Zhiyuan Zhang
Xu Sun
Yuexian Zou
AI4CE
14
67
0
15 May 2021
Spelling Correction with Denoising Transformer
Alexandr Kuznetsov
Hector Urdiales
16
18
0
12 May 2021
Hierarchical RNNs-Based Transformers MADDPG for Mixed Cooperative-Competitive Environments
Xiaolong Wei
Lifang Yang
Xianglin Huang
Gang Cao
Zhulin Tao
Zhengyang Du
Jing An
21
6
0
11 May 2021
EL-Attention: Memory Efficient Lossless Attention for Generation
Yu Yan
Jiusheng Chen
Weizhen Qi
Nikhil Bhendawade
Yeyun Gong
Nan Duan
Ruofei Zhang
VLM
18
6
0
11 May 2021
Billion-scale Pre-trained E-commerce Product Knowledge Graph Model
Wen Zhang
Chi-Man Wong
Ganqiang Ye
Bo Wen
Wei Zhang
Huajun Chen
6
21
0
02 May 2021
A Simple and Effective Positional Encoding for Transformers
Pu-Chin Chen
Henry Tsai
Srinadh Bhojanapalli
Hyung Won Chung
Yin-Wen Chang
Chun-Sung Ferng
59
62
0
18 Apr 2021
Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning
Shijian Li
Oren Mangoubi
Lijie Xu
Tian Guo
24
15
0
16 Apr 2021
Counter-Interference Adapter for Multilingual Machine Translation
Yaoming Zhu
Jiangtao Feng
Chengqi Zhao
Mingxuan Wang
Lei Li
17
57
0
16 Apr 2021
First the worst: Finding better gender translations during beam search
D. Saunders
Rosie Sallis
Bill Byrne
11
27
0
15 Apr 2021
WHOSe Heritage: Classification of UNESCO World Heritage "Outstanding Universal Value" Documents with Soft Labels
Nan Bai
Renqian Luo
Pirouz Nourian
A. Roders
21
6
0
12 Apr 2021
Extended Parallel Corpus for Amharic-English Machine Translation
A. Gezmu
A. Nürnberger
T. Bati
6
16
0
08 Apr 2021
Sample size estimation for comparing dynamic treatment regimens in a SMART: a Monte Carlo-based approach and case study with longitudinal overdispersed count outcomes
Jamie Yap
John J. Dziak
David Kabiito
Claire Babirye
J. McKay
Bibhas Chakraborty
J. Nakatumba‐Nabende
11
0
0
31 Mar 2021
FastMoE: A Fast Mixture-of-Expert Training System
Jiaao He
J. Qiu
Aohan Zeng
Zhilin Yang
Jidong Zhai
Jie Tang
ALM
MoE
22
94
0
24 Mar 2021
Full Page Handwriting Recognition via Image to Sequence Extraction
Sumeet S. Singh
Sergey Karayev
19
53
0
11 Mar 2021
Hurdles to Progress in Long-form Question Answering
Kalpesh Krishna
Aurko Roy
Mohit Iyyer
20
192
0
10 Mar 2021
Do Transformer Modifications Transfer Across Implementations and Applications?
Sharan Narang
Hyung Won Chung
Yi Tay
W. Fedus
Thibault Févry
...
Wei Li
Nan Ding
Jake Marcus
Adam Roberts
Colin Raffel
25
126
0
23 Feb 2021
VisuoSpatial Foresight for Physical Sequential Fabric Manipulation
Ryan Hoque
Daniel Seita
Ashwin Balakrishna
Aditya Ganapathi
A. Tanwani
Nawid Jamali
K. Yamane
Soshi Iba
Ken Goldberg
12
36
0
19 Feb 2021
A Deep Adversarial Model for Suffix and Remaining Time Prediction of Event Sequences
Farbod Taymouri
M. Rosa
S. Erfani
19
25
0
15 Feb 2021
MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records
Zhen Xu
David R. So
Andrew M. Dai
Mamba
58
51
0
03 Feb 2021
Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow
Kaibo Cao
Chunyang Chen
Sebastian Baltes
Christoph Treude
Xiang Chen
22
60
0
01 Feb 2021
TextBox: A Unified, Modularized, and Extensible Framework for Text Generation
Junyi Li
Tianyi Tang
Gaole He
Jinhao Jiang
Xiaoxuan Hu
Puzhao Xie
Zhipeng Chen
Zhuohao Yu
Wayne Xin Zhao
Ji-Rong Wen
37
25
0
06 Jan 2021
Neural Machine Translation: A Review of Methods, Resources, and Tools
Zhixing Tan
Shuo Wang
Zonghan Yang
Gang Chen
Xuancheng Huang
Maosong Sun
Yang Liu
3DV
AI4TS
15
105
0
31 Dec 2020
Why Neural Machine Translation Prefers Empty Outputs
Xing Shi
Yijun Xiao
Kevin Knight
AAML
17
9
0
24 Dec 2020
*-CFQ: Analyzing the Scalability of Machine Learning on a Compositional Task
Dmitry Tsarkov
Tibor Tihon
Nathan Scales
Nikola Momchev
Danila Sinopalnikov
Nathanael Scharli
16
17
0
15 Dec 2020
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish
Begum Citamak
Ozan Caglayan
Menekse Kuyu
Erkut Erdem
Aykut Erdem
Pranava Madhyastha
Lucia Specia
23
8
0
13 Dec 2020
Attentional-Biased Stochastic Gradient Descent
Q. Qi
Yi Tian Xu
R. L. Jin
W. Yin
Tianbao Yang
ODL
26
12
0
13 Dec 2020
Cross-lingual Transfer of Abstractive Summarizer to Less-resource Language
Aleš Žagar
Marko Robnik-Šikonja
25
9
0
08 Dec 2020
ConVEx: Data-Efficient and Few-Shot Slot Labeling
Matthew Henderson
Ivan Vulić
CLIP
VLM
20
38
0
22 Oct 2020
CUNI Systems for the Unsupervised and Very Low Resource Translation Task in WMT20
Ivana Kvapilíková
Tom Kocmi
Ondrej Bojar
12
5
0
22 Oct 2020
Detecting ESG topics using domain-specific language models and data augmentation approaches
Timothy Nugent
N. Stelea
Jochen L. Leidner
20
13
0
16 Oct 2020
Semantic Label Smoothing for Sequence to Sequence Problems
Michal Lukasik
Himanshu Jain
A. Menon
Seungyeon Kim
Srinadh Bhojanapalli
Felix X. Yu
Sanjiv Kumar
AI4TS
17
18
0
15 Oct 2020
Addressing Exposure Bias With Document Minimum Risk Training: Cambridge at the WMT20 Biomedical Translation Task
Danielle Saunders
Bill Byrne
13
10
0
11 Oct 2020
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Changhan Wang
Yun Tang
Xutai Ma
Anne Wu
Sravya Popuri
Dmytro Okhonko
J. Pino
VLM
LRM
14
264
0
11 Oct 2020
On Task-Level Dialogue Composition of Generative Transformer Model
Prasanna Parthasarathi
Arvind Neelakantan
Sharan Narang
12
2
0
09 Oct 2020
Query-Key Normalization for Transformers
Alex Henry
Prudhvi Raj Dachapally
S. Pawar
Yuxuan Chen
17
75
0
08 Oct 2020
Improving Sequential Latent Variable Models with Autoregressive Flows
Joseph Marino
Lei Chen
Jiawei He
Stephan Mandt
BDL
AI4TS
30
12
0
07 Oct 2020
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
Yimeng Wu
Peyman Passban
Mehdi Rezagholizade
Qun Liu
MoE
15
34
0
06 Oct 2020
Code to Comment "Translation": Data, Metrics, Baselining & Evaluation
David Gros
Hariharan Sezhiyan
Prem Devanbu
Zhou Yu
45
68
0
03 Oct 2020
Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties
Brett Daley
Chris Amato
ODL
11
4
0
03 Oct 2020
Seq2Edits: Sequence Transduction Using Span-level Edit Operations
Felix Stahlberg
Shankar Kumar
BDL
20
82
0
23 Sep 2020
Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
31
79
0
17 Sep 2020
A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs
Fareed Qararyah
M. Wahib
Douga Dikbayir
M. E. Belviranli
D. Unat
17
8
0
19 Aug 2020
Adaptable Multi-Domain Language Model for Transformer ASR
Taewoo Lee
Min-Joong Lee
Tae Gyoon Kang
Seokyeong Jung
Minseok Kwon
...
Ho-Gyeong Kim
Jiseung Jeong
Jihyun Lee
Hosik Lee
Y. S. Choi
6
17
0
14 Aug 2020
End-to-End Neural Transformer Based Spoken Language Understanding
Martin H. Radfar
Athanasios Mouchtaris
Siegfried Kunzmann
39
61
0
12 Aug 2020