ReZero is All You Need: Fast Convergence at Large Depth

10 March 2020
Thomas C. Bachlechner
Bodhisattwa Prasad Majumder
H. H. Mao
G. Cottrell
Julian McAuley
    AI4CE

Papers citing "ReZero is All You Need: Fast Convergence at Large Depth"

50 / 56 papers shown
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
Y. Zhang
Wenxiang Guo
Changhao Pan
Z. Zhu
Tao Jin
Zhou Zhao
VGen
47
0
0
29 Apr 2025
Versatile Framework for Song Generation with Prompt-based Control
Y. Zhang
Wenxiang Guo
Changhao Pan
Z. Zhu
Ruiqi Li
...
Rongjie Huang
Ruiyuan Zhang
Zhiqing Hong
Ziyue Jiang
Zhou Zhao
74
1
0
27 Apr 2025
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang
Jing Yu
Keke Gai
Jiamin Zhuang
Gang Xiong
Gaopeng Gou
Qi Wu
VGen
42
1
0
21 Mar 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
63
8
0
24 Feb 2025
Optimizing Job Allocation using Reinforcement Learning with Graph Neural Networks
Lars C.P.M. Quaedvlieg
56
0
0
31 Jan 2025
Merino: Entropy-driven Design for Generative Language Models on IoT Devices
Youpeng Zhao
Ming Lin
Huadong Tang
Qiang Wu
Jun Wang
75
0
0
28 Jan 2025
GraphXForm: Graph transformer for computer-aided molecular design
Jonathan Pirnay
Jan G. Rittig
Alexander B. Wolf
Martin Grohe
Jakob Burger
Alexander Mitsos
D. G. Grimm
AI4CE
49
1
0
03 Nov 2024
Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Federico Arangath Joseph
Jerome Sieber
M. Zeilinger
Carmen Amo Alonso
33
0
0
14 Oct 2024
Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis
Hyunwoo Lee
Hayoung Choi
Hyunju Kim
18
1
0
03 Oct 2024
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu
Issei Sato
37
3
0
02 Oct 2024
GOAL: A Generalist Combinatorial Optimization Agent Learner
Darko Drakulic
Sofia Michel
J. Andreoli
31
6
0
21 Jun 2024
Beyond the Frontier: Predicting Unseen Walls from Occupancy Grids by Learning from Floor Plans
Ludvig Ericson
Patric Jensfelt
34
7
0
13 Jun 2024
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
34
3
0
29 May 2024
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech
Jing Chen
Xingcheng Song
Zhendong Peng
Binbin Zhang
Fuping Pan
Zhiyong Wu
DiffM
16
16
0
31 Aug 2023
Multiplicative update rules for accelerating deep learning training and increasing robustness
Manos Kirtas
Nikolaos Passalis
Anastasios Tefas
AAML
OOD
26
2
0
14 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
13
41
0
12 Jul 2023
TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting
Taorong Liu
Liang Liao
Delin Chen
Jing Xiao
Zheng Wang
Chia-Wen Lin
Shin'ichi Satoh
ViT
DiffM
21
6
0
20 Jun 2023
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
Christopher Fifty
Joseph M. Paggi
Ehsan Amid
J. Leskovec
R. Dror
AI4CE
11
0
0
04 Feb 2023
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
18
47
0
02 Feb 2023
Expected Gradients of Maxout Networks and Consequences to Parameter Initialization
Hanna Tseran
Guido Montúfar
ODL
8
0
0
17 Jan 2023
Asymptotic Analysis of Deep Residual Networks
R. Cont
Alain Rossier
Renyuan Xu
19
4
0
15 Dec 2022
ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors
Chenjie Cao
Qiaole Dong
Yanwei Fu
33
30
0
12 Oct 2022
Dynamical Isometry for Residual Networks
Advait Gadhikar
R. Burkholz
ODL
AI4CE
24
2
0
05 Oct 2022
Learning an Efficient Multimodal Depth Completion Model
Dewang Hou
Yuanyuan Du
Kai Zhao
Yang Zhao
17
5
0
23 Aug 2022
Learning Prior Feature and Attention Enhanced Image Inpainting
Chenjie Cao
Qiaole Dong
Yanwei Fu
DiffM
23
24
0
03 Aug 2022
Removing Batch Normalization Boosts Adversarial Training
Haotao Wang
Aston Zhang
Shuai Zheng
Xingjian Shi
Mu Li
Zhangyang Wang
24
41
0
04 Jul 2022
Scaling ResNets in the Large-depth Regime
P. Marion
Adeline Fermanian
Gérard Biau
Jean-Philippe Vert
17
16
0
14 Jun 2022
Learning What and Where: Disentangling Location and Identity Tracking Without Supervision
Manuel Traub
S. Otte
Tobias Menge
Matthias Karlbauer
Jannik Thummel
Martin Volker Butz
21
20
0
26 May 2022
Hypercomplex Image-to-Image Translation
Eleonora Grassucci
Luigi Sigillo
A. Uncini
Danilo Comminiello
25
7
0
04 May 2022
Automated Progressive Learning for Efficient Training of Vision Transformers
Changlin Li
Bohan Zhuang
Guangrun Wang
Xiaodan Liang
Xiaojun Chang
Yi Yang
16
46
0
28 Mar 2022
Image Super-Resolution With Deep Variational Autoencoders
Darius Chira
Ilian Haralampiev
Ole Winther
Andrea Dittadi
Valentin Liévin
DRL
24
32
0
17 Mar 2022
Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers
Guodong Zhang
Aleksandar Botev
James Martens
OffRL
13
26
0
15 Mar 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang
J. E. Hu
Igor Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
J. Pachocki
Weizhu Chen
Jianfeng Gao
24
148
0
07 Mar 2022
FloorGenT: Generative Vector Graphic Model of Floor Plans for Robotics
Ludvig Ericson
Patric Jensfelt
3DV
14
2
0
07 Mar 2022
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Qiaole Dong
Chenjie Cao
Yanwei Fu
CLL
11
137
0
02 Mar 2022
DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Dongdong Zhang
Furu Wei
MoE
AI4CE
15
155
0
01 Mar 2022
Hierarchical Graph-Convolutional Variational AutoEncoding for Generative Modelling of Human Motion
Anthony Bourached
Robert J. Gray
Xiaodong Guan
Ryan-Rhys Griffiths
A. Jha
P. Nachev
3DH
DRL
14
1
0
24 Nov 2021
NormFormer: Improved Transformer Pretraining with Extra Normalization
Sam Shleifer
Jason Weston
Myle Ott
AI4CE
26
74
0
18 Oct 2021
AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham
Risto Miikkulainen
ODL
11
4
0
18 Sep 2021
MedGPT: Medical Concept Prediction from Clinical Narratives
Z. Kraljevic
Anthony Shek
D. Bean
R. Bendayan
J. Teo
Richard J. B. Dobson
LM&MA
AI4TS
MedIm
16
38
0
07 Jul 2021
ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
Chen Dun
Cameron R. Wolfe
C. Jermaine
Anastasios Kyrillidis
16
21
0
02 Jul 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
27
1,084
0
08 Jun 2021
Scaling Properties of Deep Residual Networks
A. Cohen
R. Cont
Alain Rossier
Renyuan Xu
6
18
0
25 May 2021
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen
Zhenyu (Allen) Zhang
Xu Ouyang
Zechun Liu
Zhiqiang Shen
Zhangyang Wang
MQ
31
36
0
16 Apr 2021
Going deeper with Image Transformers
Hugo Touvron
Matthieu Cord
Alexandre Sablayrolles
Gabriel Synnaeve
Hervé Jégou
ViT
23
986
0
31 Mar 2021
Predicting the Behavior of Dealers in Over-The-Counter Corporate Bond Markets
Yusen Lin
Jinming Xue
L. Raschid
6
3
0
12 Mar 2021
3D Human Pose, Shape and Texture from Low-Resolution Images and Videos
Xiangyu Xu
Hao Chen
Francesc Moreno-Noguer
László A. Jeni
Fernando De la Torre
3DH
14
35
0
11 Mar 2021
Generating Images with Sparse Representations
C. Nash
Jacob Menick
Sander Dieleman
Peter W. Battaglia
11
199
0
05 Mar 2021
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Chen Zhu
Renkun Ni
Zheng Xu
Kezhi Kong
W. R. Huang
Tom Goldstein
ODL
23
53
0
16 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021