Tensor2Tensor for Neural Machine Translation

16 March 2018
Ashish Vaswani
Samy Bengio
E. Brevdo
François Chollet
Aidan Gomez
Stephan Gouws
Llion Jones
Lukasz Kaiser
Nal Kalchbrenner
Niki Parmar
Ryan Sepassi
Noam M. Shazeer
Jakob Uszkoreit

Papers citing "Tensor2Tensor for Neural Machine Translation"

50 / 264 papers shown
Facilitating Cognitive Accessibility with LLMs: A Multi-Task Approach to Easy-to-Read Text Generation
François Ledoyen
Gaël Dias
Jeremie Pantin
Alexis Lechervy
Fabrice Maurel
Youssef Chahir
88
0
0
01 Oct 2025
Large Language Models for Summarizing Czech Historical Documents and Beyond
International Conference on Agents and Artificial Intelligence (ICAART), 2025
Václav Tran
Jakub Šmíd
J. Martínek
Ladislav Lenc
Pavel Král
112
1
0
14 Aug 2025
Lightweight and Interpretable Transformer via Mixed Graph Algorithm Unrolling for Traffic Forecast
Ji Qi
Tam Thuc Do
Mingxiao Liu
Zhuoshi Pan
Yuzhe Li
Gene Cheung
H. Vicky Zhao
AI4TS
238
0
0
19 May 2025
A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models
Ziqing Xu
Hancheng Min
Salma Tarmoun
Enrique Mallada
Rene Vidal
253
2
0
16 May 2025
Efficient Time Series Forecasting via Hyper-Complex Models and Frequency Aggregation
Eyal Yakir
Dor Tsur
Haim Permuter
AI4TS
314
0
0
27 Feb 2025
EVaDE : Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning
Asian Conference on Machine Learning (ACML), 2025
Siddharth Aravindan
Dixant Mittal
Wee Sun Lee
BDL
271
0
0
17 Jan 2025
Domain adapted machine translation: What does catastrophic forgetting forget and why?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Danielle Saunders
Steve DeNeefe
AI4CE
105
4
0
23 Dec 2024
Building Dialogue Understanding Models for Low-resource Language Indonesian from Scratch
Donglin Di
Weinan Zhang
Yue Zhang
Fanglin Wang
270
1
0
24 Oct 2024
A survey of neural-network-based methods utilising comparable data for finding translation equivalents
Michaela Denisová
Pavel Rychlý
230
0
0
19 Oct 2024
Do We Trust What They Say or What They Do? A Multimodal User Embedding Provides Personalized Explanations
Zhicheng Ren
Zhiping Xiao
Luke Huan
260
0
0
04 Sep 2024
DLP: towards active defense against backdoor attacks with decoupled learning process
Zonghao Ying
Bin Wu
AAML
270
12
0
18 Jun 2024
Separable Physics-Informed Neural Networks for the solution of elasticity problems
V. A. Es'kin
Danil V. Davydov
Julia V. Guréva
Alexey O. Malkhanov
Mikhail E. Smorkalov
PINN
AI4CE
275
6
0
24 Jan 2024
Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Stephen Lawrence Bothwell
Justin DeBenedetto
Theresa Crnkovich
Hildegund Müller
David Chiang
ObjD
279
3
0
30 Nov 2023
CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
International Conference on Learning Representations (ICLR), 2023
Nadezhda Chirkova
Sergey Troshin
225
9
0
01 Aug 2023
3D Medical Image Segmentation based on multi-scale MPU-Net
Zeqiu Yu
Shuo Han
Ziheng Song
3DV
189
5
0
11 Jul 2023
Urania: Visualizing Data Analysis Pipelines for Natural Language-Based Data Exploration
Yi Guo
Nana Cao
Xiaoyu Qi
Haoyang Li
Danqing Shi
Jing Zhang
Qing Chen
Daniel Weiskopf
149
5
0
13 Jun 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shantipriya Parida
Idris Abdulmumin
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
Ibrahim Said Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
286
10
0
28 May 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Neural Information Processing Systems (NeurIPS), 2023
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurelien Lucchi
Thomas Hofmann
358
70
0
25 May 2023
Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhuoyuan Mao
Mary Dabre
Qianying Liu
Israfel Salazar
Chenhui Chu
Sadao Kurohashi
120
7
0
16 May 2023
AttentionViz: A Global View of Transformer Attention
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023
Catherine Yeh
Yida Chen
Aoyu Wu
Cynthia Chen
Fernanda Viégas
Martin Wattenberg
ViT
281
87
0
04 May 2023
string2string: A Modern Python Library for String-to-String Algorithms
Mirac Suzgun
Stuart M. Shieber
Dan Jurafsky
148
10
0
27 Apr 2023
Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder
Z. Fu
W. Lam
Qian Yu
Anthony Man-Cho So
Shengding Hu
Zhiyuan Liu
Nigel Collier
AuLLM
147
59
0
08 Apr 2023
Datamator: An Intelligent Authoring Tool for Creating Datamations via Data Query Decomposition
Yi Guo
Nana Cao
Ligan Cai
Yanqiu Wu
Daniel Weiskopf
Danqing Shi
Qing Chen
194
2
0
06 Apr 2023
About optimal loss function for training physics-informed neural networks under respecting causality
V. A. Es'kin
Danil V. Davydov
Ekaterina D. Egorova
Alexey O. Malkhanov
Mikhail A. Akhukov
Mikhail E. Smorkalov
PINN
233
8
0
05 Apr 2023
Synthetically generated text for supervised text analysis
Political Analysis (PA), 2023
Andrew Halterman
DeLMO
147
12
0
28 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
424
66
0
21 Mar 2023
Mutation-Based Adversarial Attacks on Neural Text Detectors
G. Liang
Jesus Guerrero
I. Alsmadi
DeLMO
174
11
0
11 Feb 2023
ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text
Sandra Mitrović
Davide Andreoletti
Omran Ayoub
DeLMO
156
181
0
30 Jan 2023
CUNI Systems for the WMT22 Czech-Ukrainian Translation Task
Conference on Machine Translation (WMT), 2022
Martin Popel
Jindrich Libovický
Jindřich Helcl
132
6
0
01 Dec 2022
QNet: A Quantum-native Sequence Encoder Architecture
International Conference on Quantum Computing and Engineering (ICQCE), 2022
Wei-Yen Day
Hao-Sheng Chen
Min Sun
231
1
0
31 Oct 2022
Tools for Extracting Spatio-Temporal Patterns in Meteorological Image Sequences: From Feature Engineering to Attention-Based Neural Networks
A. S. Bansal
Yoonjin Lee
Kyle Hilburn
I. Ebert‐Uphoff
AI4TS
284
2
0
22 Oct 2022
On the Explainability of Natural Language Processing Deep Models
ACM Computing Surveys (ACM CSUR), 2022
Julia El Zini
M. Awad
232
109
0
13 Oct 2022
PARAGEN : A Parallel Generation Toolkit
Jiangtao Feng
Yi Zhou
Jun Zhang
Xian Qian
Liwei Wu
Zhexi Zhang
Yanming Liu
Mingxuan Wang
Lei Li
Hao Zhou
VLM
181
3
0
07 Oct 2022
A Deep Investigation of RNN and Self-attention for the Cyrillic-Traditional Mongolian Bidirectional Conversion
International Conference on Neural Information Processing (ICONIP), 2022
Muhan Na
Rui Liu
Feilong
Guanglai Gao
121
0
0
24 Sep 2022
Set Norm and Equivariant Skip Connections: Putting the Deep in Deep Sets
International Conference on Machine Learning (ICML), 2022
Lily H. Zhang
Veronica Tozzo
J. Higgins
Rajesh Ranganath
BDL
MoE
226
24
0
23 Jun 2022
B2T Connection: Serving Stability and Performance in Deep Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Sho Takase
Shun Kiyono
Sosuke Kobayashi
Jun Suzuki
302
15
0
01 Jun 2022
How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing
Artificial Intelligence Review (Artif Intell Rev), 2022
Samuel Sousa
Roman Kern
PILM
AILaw
190
58
0
20 May 2022
Optimizing Mixture of Experts using Dynamic Recompilations
Ferdinand Kossmann
Zhihao Jia
A. Aiken
218
5
0
04 May 2022
Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation
International Conference on Language Resources and Evaluation (LREC), 2022
Idris Abdulmumin
S. Dash
Musa Abdullahi Dawud
Shantipriya Parida
Shamsuddeen Hassan Muhammad
Ibrahim Said Ahmad
Subhadarshi Panda
Ondrej Bojar
B. Galadanci
Bello Shehu Bello
263
21
0
02 May 2022
NLU++: A Multi-Label, Slot-Rich, Generalisable Dataset for Natural Language Understanding in Task-Oriented Dialogue
I. Casanueva
Ivan Vulić
Georgios P. Spithourakis
Paweł Budzianowski
229
16
0
27 Apr 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
International Conference on Language Resources and Evaluation (LREC), 2022
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
277
9
0
11 Apr 2022
Scaling Up Models and Data with t5x and seqio
Journal of Machine Learning Research (JMLR), 2022
Adam Roberts
Hyung Won Chung
Anselm Levskaya
Gaurav Mishra
James Bradbury
...
Brennan Saeta
Ryan Sepassi
A. Spiridonov
Joshua Newlan
Andrea Gesmundo
ALM
269
211
0
31 Mar 2022
General-purpose, long-context autoregressive modeling with Perceiver AR
International Conference on Machine Learning (ICML), 2022
Curtis Hawthorne
Andrew Jaegle
Cătălina Cangea
Sebastian Borgeaud
C. Nash
...
Hannah R. Sheahan
Neil Zeghidour
Jean-Baptiste Alayrac
João Carreira
Jesse Engel
224
75
0
15 Feb 2022
Capitalization and Punctuation Restoration: a Survey
Artificial Intelligence Review (AIR), 2021
V. Pais
D. Tufis
202
21
0
21 Nov 2021
Benchmarking and scaling of deep learning models for land cover image classification
Ioannis Papoutsis
Nikolaos Ioannis Bountos
Angelos Zavras
Dimitrios Michail
Christos Tryfonopoulos
421
70
0
18 Nov 2021
Say What? Collaborative Pop Lyric Generation Using Multitask Transfer Learning
International Conference on Human-Agent Interaction (HAI), 2021
Naveen Ram
Tanay Gummadi
Rahul Bhethanabotla
Richard J. Savery
Gil Weinberg
159
9
0
15 Nov 2021
Leveraging redundancy in attention with Reuse Transformers
Srinadh Bhojanapalli
Ayan Chakrabarti
Andreas Veit
Michal Lukasik
Himanshu Jain
Frederick Liu
Yin-Wen Chang
Sanjiv Kumar
147
36
0
13 Oct 2021
The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation
Orevaoghene Ahia
Julia Kreutzer
Sara Hooker
302
58
0
06 Oct 2021
Primer: Searching for Efficient Transformers for Language Modeling
David R. So
Wojciech Mañke
Hanxiao Liu
Zihang Dai
Noam M. Shazeer
Quoc V. Le
VLM
395
182
0
17 Sep 2021
Miðeind's WMT 2021 submission
Haukur Barri Símonarson
Vésteinn Snæbjarnarson
Pétur Orri Ragnarsson
Haukur Páll Jónsson
Vilhjálmur Þorsteinsson
VLM
121
13
0
15 Sep 2021