Training Tips for the Transformer Model

1 April 2018
Martin Popel
Ondrej Bojar

Papers citing "Training Tips for the Transformer Model"

50 / 50 papers shown
A Comprehensive LLM-powered Framework for Driving Intelligence Evaluation
Shanhe You
Xuewen Luo
Xinhe Liang
Jiashu Yu
Chen Zheng
Jiangtao Gong
77
1
0
07 Mar 2025
A Unified Hyperparameter Optimization Pipeline for Transformer-Based Time Series Forecasting Models
Jingjing Xu
Caesar Wu
Yuan-Fang Li
Grégoire Danoy
Pascal Bouvry
TPM
AI4TS
54
0
0
03 Jan 2025
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec
Felix Dangel
Sidak Pal Singh
41
7
0
14 Oct 2024
Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com
Sergei Krutikov
Bulat Khaertdinov
Rodion Kiriukhin
Shubham Agrawal
Kees Jan de Vries
LMTD
48
0
0
22 May 2024
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim
Benjamin Thérien
Kshitij Gupta
Mats L. Richter
Quentin Anthony
Timothée Lesort
Eugene Belilovsky
Irina Rish
KELM
CLL
44
54
0
13 Mar 2024
Boosting Transformer's Robustness and Efficacy in PPG Signal Artifact Detection with Self-Supervised Learning
Thanh-Dung Le
34
1
0
02 Jan 2024
Scaling Studies for Efficient Parameter Search and Parallelism for Large Language Model Pre-training
Michael Benington
Leo Phan
Chris Pierre Paul
Evan Shoemaker
Priyanka Ranade
Torstein Collett
Grant Hodgson Perez
Christopher Krieger
17
1
0
09 Oct 2023
A Case Study on Context Encoding in Multi-Encoder based Document-Level Neural Machine Translation
Ramakrishna Appicharla
Baban Gain
Santanu Pal
Asif Ekbal
35
1
0
11 Aug 2023
ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang
Ming Hao
Yuhai Shi
Bo Xu
MoMe
21
0
0
05 Aug 2023
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
Yineng Chen
Z. Li
Lefei Zhang
Bo Du
Hai Zhao
38
4
0
02 Jul 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida
Idris Abdulmumin
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
I. Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
26
5
0
28 May 2023
Spatiotemporal Transformer for Stock Movement Prediction
Daniel Boyle
Jugal Kalita
AI4TS
17
2
0
05 May 2023
eWaSR -- an embedded-compute-ready maritime obstacle detection network
Matija Tersek
Lojze Žust
Matej Kristan
27
9
0
21 Apr 2023
Training Strategies for Vision Transformers for Object Detection
Apoorv Singh
31
4
0
05 Apr 2023
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset
Thanh-Dung Le
P. Jouvet
R. Noumeir
MoE
MedIm
72
5
0
22 Mar 2023
Encoding Sentence Position in Context-Aware Neural Machine Translation with Concatenation
Lorenzo Lupo
Marco Dinarelli
Laurent Besacier
39
9
0
13 Feb 2023
Curriculum-Guided Abstractive Summarization
Sajad Sotudeh
Hanieh Deilamsalehy
Franck Dernoncourt
Nazli Goharian
42
1
0
02 Feb 2023
Tackling Low-Resourced Sign Language Translation: UPC at WMT-SLT 22
Laia Tarrés
Gerard I. Gállego
Xavier Giró-i-Nieto
Jordi Torres
SLR
37
5
0
02 Dec 2022
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition
Xingcheng Song
Di Wu
Binbin Zhang
Zhiyong Wu
Wenpeng Li
...
Peng Zhang
Zhendong Peng
Fuping Pan
Changbao Zhu
Zhongqin Wu
29
2
0
31 Oct 2022
Focused Concatenation for Context-Aware Neural Machine Translation
Lorenzo Lupo
Marco Dinarelli
Laurent Besacier
27
8
0
24 Oct 2022
Revisiting Checkpoint Averaging for Neural Machine Translation
Yingbo Gao
Christian Herold
Zijian Yang
Hermann Ney
MoMe
29
11
0
21 Oct 2022
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
29
11
0
20 Sep 2022
Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation
Idris Abdulmumin
S. Dash
Musa Abdullahi Dawud
Shantipriya Parida
Shamsuddeen Hassan Muhammad
I. Ahmad
Subhadarshi Panda
Ondrej Bojar
B. Galadanci
Bello Shehu Bello
21
17
0
02 May 2022
Gradient Descent, Stochastic Optimization, and Other Tales
Jun Lu
22
8
0
02 May 2022
Distributionally Robust Models with Parametric Likelihood Ratios
Paul Michel
Tatsunori Hashimoto
Graham Neubig
OOD
30
15
0
13 Apr 2022
Small Batch Sizes Improve Training of Low-Resource Neural MT
Àlex R. Atrio
Andrei Popescu-Belis
35
6
0
20 Mar 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang
J. E. Hu
Igor Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
J. Pachocki
Weizhu Chen
Jianfeng Gao
28
149
0
07 Mar 2022
Speech Emotion Recognition using Self-Supervised Features
E. Morais
R. Hoory
Weizhong Zhu
Itai Gat
Matheus Damasceno
Hagai Aronowitz
SSL
MDE
20
113
0
07 Feb 2022
Persformer: A Transformer Architecture for Topological Machine Learning
Raphael Reinauer
Matteo Caorsi
Nicolas Berkouk
32
15
0
30 Dec 2021
Why don't people use character-level machine translation?
Jindrich Libovický
Helmut Schmid
Alexander Fraser
65
28
0
15 Oct 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
46
2,204
0
20 Apr 2021
Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey
Danielle Saunders
AI4CE
27
86
0
14 Apr 2021
Towards Automated Psychotherapy via Language Modeling
Houjun Liu
AI4MH
60
3
0
05 Apr 2021
Optimizing Deeper Transformers on Small Datasets
Peng Xu
Dhruv Kumar
Wei Yang
Wenjie Zi
Keyi Tang
Chenyang Huang
Jackie C.K. Cheung
S. Prince
Yanshuai Cao
AI4CE
24
69
0
30 Dec 2020
Dynamic Curriculum Learning for Low-Resource Neural Machine Translation
Chen Xu
Bojie Hu
Yufan Jiang
Kai Feng
Zeyang Wang
Shen Huang
Qi Ju
Tong Xiao
Jingbo Zhu
20
22
0
30 Nov 2020
Exploiting Neural Query Translation into Cross Lingual Information Retrieval
Liang Yao
Baosong Yang
Haibo Zhang
Weihua Luo
Boxing Chen
22
12
0
26 Oct 2020
Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
Hongfei Xu
Josef van Genabith
Deyi Xiong
Qiuhui Liu
16
10
0
05 May 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
35
949
0
12 Feb 2020
Exploring Benefits of Transfer Learning in Neural Machine Translation
Tom Kocmi
29
17
0
06 Jan 2020
Neural Machine Translation: A Review and Survey
Felix Stahlberg
3DV
AI4TS
MedIm
28
313
0
04 Dec 2019
A Bilingual Generative Transformer for Semantic Sentence Embedding
John Wieting
Graham Neubig
Taylor Berg-Kirkpatrick
22
28
0
10 Nov 2019
Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen
Julian Salazar
41
224
0
14 Oct 2019
Hotel2vec: Learning Attribute-Aware Hotel Embeddings with Self-Supervision
A. Sadeghian
Shervin Minaee
Ioannis Partalas
Xinxin Li
D. Wang
Brooke Cowan
DML
SSL
3DV
27
8
0
30 Sep 2019
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
37
716
0
13 Sep 2019
Predicting Actions to Help Predict Translations
Zixiu "Alex" Wu
Julia Ive
Josiah Wang
Pranava Madhyastha
Lucia Specia
17
7
0
05 Aug 2019
Learning cross-lingual phonological and orthagraphic adaptations: a case study in improving neural machine translation between low-resource languages
Saurav Jha
A. Sudhakar
Anil Kumar Singh
21
4
0
21 Nov 2018
Trivial Transfer Learning for Low-Resource Neural Machine Translation
Tom Kocmi
Ondrej Bojar
22
171
0
02 Sep 2018
An Operation Sequence Model for Explainable Neural Machine Translation
Felix Stahlberg
Danielle Saunders
Bill Byrne
LRM
MILM
40
29
0
29 Aug 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhehuai Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
718
6,750
0
26 Sep 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
310
2,896
0
15 Sep 2016