1804.00247
Training Tips for the Transformer Model
1 April 2018
Martin Popel
Ondrej Bojar
Papers citing "Training Tips for the Transformer Model"
50 / 50 papers shown
A Comprehensive LLM-powered Framework for Driving Intelligence Evaluation
Shanhe You
Xuewen Luo
Xinhe Liang
Jiashu Yu
Chen Zheng
Jiangtao Gong
77
1
0
07 Mar 2025
A Unified Hyperparameter Optimization Pipeline for Transformer-Based Time Series Forecasting Models
Jingjing Xu
Caesar Wu
Yuan-Fang Li
Grégoire Danoy
Pascal Bouvry
TPM
AI4TS
54
0
0
03 Jan 2025
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec
Felix Dangel
Sidak Pal Singh
41
7
0
14 Oct 2024
Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com
Sergei Krutikov
Bulat Khaertdinov
Rodion Kiriukhin
Shubham Agrawal
Kees Jan de Vries
LMTD
48
0
0
22 May 2024
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim
Benjamin Thérien
Kshitij Gupta
Mats L. Richter
Quentin Anthony
Timothée Lesort
Eugene Belilovsky
Irina Rish
KELM
CLL
44
54
0
13 Mar 2024
Boosting Transformer's Robustness and Efficacy in PPG Signal Artifact Detection with Self-Supervised Learning
Thanh-Dung Le
34
1
0
02 Jan 2024
Scaling Studies for Efficient Parameter Search and Parallelism for Large Language Model Pre-training
Michael Benington
Leo Phan
Chris Pierre Paul
Evan Shoemaker
Priyanka Ranade
Torstein Collett
Grant Hodgson Perez
Christopher Krieger
17
1
0
09 Oct 2023
A Case Study on Context Encoding in Multi-Encoder based Document-Level Neural Machine Translation
Ramakrishna Appicharla
Baban Gain
Santanu Pal
Asif Ekbal
35
1
0
11 Aug 2023
ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang
Ming Hao
Yuhai Shi
Bo Xu
MoMe
21
0
0
05 Aug 2023
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
Yineng Chen
Z. Li
Lefei Zhang
Bo Du
Hai Zhao
33
4
0
02 Jul 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida
Idris Abdulmumin
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
I. Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
26
5
0
28 May 2023
Spatiotemporal Transformer for Stock Movement Prediction
Daniel Boyle
Jugal Kalita
AI4TS
13
2
0
05 May 2023
eWaSR -- an embedded-compute-ready maritime obstacle detection network
Matija Tersek
Lojze Žust
Matej Kristan
27
9
0
21 Apr 2023
Training Strategies for Vision Transformers for Object Detection
Apoorv Singh
31
4
0
05 Apr 2023
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset
Thanh-Dung Le
P. Jouvet
R. Noumeir
MoE
MedIm
72
5
0
22 Mar 2023
Encoding Sentence Position in Context-Aware Neural Machine Translation with Concatenation
Lorenzo Lupo
Marco Dinarelli
Laurent Besacier
39
9
0
13 Feb 2023
Curriculum-Guided Abstractive Summarization
Sajad Sotudeh
Hanieh Deilamsalehy
Franck Dernoncourt
Nazli Goharian
35
1
0
02 Feb 2023
Tackling Low-Resourced Sign Language Translation: UPC at WMT-SLT 22
Laia Tarrés
Gerard I. Gállego
Xavier Giró-i-Nieto
Jordi Torres
SLR
37
5
0
02 Dec 2022
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition
Xingcheng Song
Di Wu
Binbin Zhang
Zhiyong Wu
Wenpeng Li
...
Peng Zhang
Zhendong Peng
Fuping Pan
Changbao Zhu
Zhongqin Wu
29
2
0
31 Oct 2022
Focused Concatenation for Context-Aware Neural Machine Translation
Lorenzo Lupo
Marco Dinarelli
Laurent Besacier
27
8
0
24 Oct 2022
Revisiting Checkpoint Averaging for Neural Machine Translation
Yingbo Gao
Christian Herold
Zijian Yang
Hermann Ney
MoMe
29
11
0
21 Oct 2022
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
29
11
0
20 Sep 2022
Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation
Idris Abdulmumin
S. Dash
Musa Abdullahi Dawud
Shantipriya Parida
Shamsuddeen Hassan Muhammad
I. Ahmad
Subhadarshi Panda
Ondrej Bojar
B. Galadanci
Bello Shehu Bello
21
17
0
02 May 2022
Gradient Descent, Stochastic Optimization, and Other Tales
Jun Lu
19
8
0
02 May 2022
Distributionally Robust Models with Parametric Likelihood Ratios
Paul Michel
Tatsunori Hashimoto
Graham Neubig
OOD
30
15
0
13 Apr 2022
Small Batch Sizes Improve Training of Low-Resource Neural MT
Àlex R. Atrio
Andrei Popescu-Belis
35
6
0
20 Mar 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang
J. E. Hu
Igor Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
J. Pachocki
Weizhu Chen
Jianfeng Gao
26
149
0
07 Mar 2022
Speech Emotion Recognition using Self-Supervised Features
E. Morais
R. Hoory
Weizhong Zhu
Itai Gat
Matheus Damasceno
Hagai Aronowitz
SSL
MDE
20
113
0
07 Feb 2022
Persformer: A Transformer Architecture for Topological Machine Learning
Raphael Reinauer
Matteo Caorsi
Nicolas Berkouk
32
15
0
30 Dec 2021
Why don't people use character-level machine translation?
Jindrich Libovický
Helmut Schmid
Alexander Fraser
65
28
0
15 Oct 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
46
2,190
0
20 Apr 2021
Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey
Danielle Saunders
AI4CE
27
86
0
14 Apr 2021
Towards Automated Psychotherapy via Language Modeling
Houjun Liu
AI4MH
58
3
0
05 Apr 2021
Optimizing Deeper Transformers on Small Datasets
Peng Xu
Dhruv Kumar
Wei Yang
Wenjie Zi
Keyi Tang
Chenyang Huang
Jackie C.K. Cheung
S. Prince
Yanshuai Cao
AI4CE
24
69
0
30 Dec 2020
Dynamic Curriculum Learning for Low-Resource Neural Machine Translation
Chen Xu
Bojie Hu
Yufan Jiang
Kai Feng
Zeyang Wang
Shen Huang
Qi Ju
Tong Xiao
Jingbo Zhu
15
22
0
30 Nov 2020
Exploiting Neural Query Translation into Cross Lingual Information Retrieval
Liang Yao
Baosong Yang
Haibo Zhang
Weihua Luo
Boxing Chen
22
12
0
26 Oct 2020
Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
Hongfei Xu
Josef van Genabith
Deyi Xiong
Qiuhui Liu
14
10
0
05 May 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
29
949
0
12 Feb 2020
Exploring Benefits of Transfer Learning in Neural Machine Translation
Tom Kocmi
29
17
0
06 Jan 2020
Neural Machine Translation: A Review and Survey
Felix Stahlberg
3DV
AI4TS
MedIm
25
312
0
04 Dec 2019
A Bilingual Generative Transformer for Semantic Sentence Embedding
John Wieting
Graham Neubig
Taylor Berg-Kirkpatrick
22
28
0
10 Nov 2019
Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen
Julian Salazar
38
224
0
14 Oct 2019
Hotel2vec: Learning Attribute-Aware Hotel Embeddings with Self-Supervision
A. Sadeghian
Shervin Minaee
Ioannis Partalas
Xinxin Li
D. Wang
Brooke Cowan
DML
SSL
3DV
27
8
0
30 Sep 2019
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
37
716
0
13 Sep 2019
Predicting Actions to Help Predict Translations
Zixiu "Alex" Wu
Julia Ive
Josiah Wang
Pranava Madhyastha
Lucia Specia
17
7
0
05 Aug 2019
Learning cross-lingual phonological and orthagraphic adaptations: a case study in improving neural machine translation between low-resource languages
Saurav Jha
A. Sudhakar
Anil Kumar Singh
21
4
0
21 Nov 2018
Trivial Transfer Learning for Low-Resource Neural Machine Translation
Tom Kocmi
Ondrej Bojar
22
171
0
02 Sep 2018
An Operation Sequence Model for Explainable Neural Machine Translation
Felix Stahlberg
Danielle Saunders
Bill Byrne
LRM
MILM
40
29
0
29 Aug 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
718
6,748
0
26 Sep 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
308
2,892
0
15 Sep 2016