Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2003.04887
Cited By
ReZero is All You Need: Fast Convergence at Large Depth
10 March 2020
Thomas C. Bachlechner
Bodhisattwa Prasad Majumder
H. H. Mao
G. Cottrell
Julian McAuley
AI4CE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ReZero is All You Need: Fast Convergence at Large Depth"
50 / 56 papers shown
Title
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
Y. Zhang
Wenxiang Guo
Changhao Pan
Z. Zhu
Tao Jin
Zhou Zhao
VGen
47
0
0
29 Apr 2025
Versatile Framework for Song Generation with Prompt-based Control
Y. Zhang
Wenxiang Guo
Changhao Pan
Z. Zhu
Ruiqi Li
...
Rongjie Huang
Ruiyuan Zhang
Zhiqing Hong
Ziyue Jiang
Zhou Zhao
74
1
0
27 Apr 2025
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang
Jing Yu
Keke Gai
Jiamin Zhuang
Gang Xiong
Gaopeng Gou
Qi Wu
VGen
42
1
0
21 Mar 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
63
8
0
24 Feb 2025
Optimizing Job Allocation using Reinforcement Learning with Graph Neural Networks
Lars C.P.M. Quaedvlieg
56
0
0
31 Jan 2025
Merino: Entropy-driven Design for Generative Language Models on IoT Devices
Youpeng Zhao
Ming Lin
Huadong Tang
Qiang Wu
Jun Wang
75
0
0
28 Jan 2025
GraphXForm: Graph transformer for computer-aided molecular design
Jonathan Pirnay
Jan G. Rittig
Alexander B. Wolf
Martin Grohe
Jakob Burger
Alexander Mitsos
D. G. Grimm
AI4CE
49
1
0
03 Nov 2024
Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Federico Arangath Joseph
Jerome Sieber
M. Zeilinger
Carmen Amo Alonso
33
0
0
14 Oct 2024
Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis
Hyunwoo Lee
Hayoung Choi
Hyunju Kim
25
1
0
03 Oct 2024
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu
Issei Sato
37
3
0
02 Oct 2024
GOAL: A Generalist Combinatorial Optimization Agent Learner
Darko Drakulic
Sofia Michel
J. Andreoli
31
6
0
21 Jun 2024
Beyond the Frontier: Predicting Unseen Walls from Occupancy Grids by Learning from Floor Plans
Ludvig Ericson
Patric Jensfelt
34
7
0
13 Jun 2024
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
34
3
0
29 May 2024
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech
Jing Chen
Xingcheng Song
Zhendong Peng
Binbin Zhang
Fuping Pan
Zhiyong Wu
DiffM
16
16
0
31 Aug 2023
Multiplicative update rules for accelerating deep learning training and increasing robustness
Manos Kirtas
Nikolaos Passalis
Anastasios Tefas
AAML
OOD
26
2
0
14 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
13
41
0
12 Jul 2023
TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting
Taorong Liu
Liang Liao
Delin Chen
Jing Xiao
Zheng Wang
Chia-Wen Lin
Shiníchi Satoh
ViT
DiffM
21
6
0
20 Jun 2023
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
Christopher Fifty
Joseph M. Paggi
Ehsan Amid
J. Leskovec
R. Dror
AI4CE
16
0
0
04 Feb 2023
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
18
47
0
02 Feb 2023
Expected Gradients of Maxout Networks and Consequences to Parameter Initialization
Hanna Tseran
Guido Montúfar
ODL
8
0
0
17 Jan 2023
Asymptotic Analysis of Deep Residual Networks
R. Cont
Alain Rossier
Renyuan Xu
19
4
0
15 Dec 2022
ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors
Chenjie Cao
Qiaole Dong
Yanwei Fu
33
30
0
12 Oct 2022
Dynamical Isometry for Residual Networks
Advait Gadhikar
R. Burkholz
ODL
AI4CE
29
2
0
05 Oct 2022
Learning an Efficient Multimodal Depth Completion Model
Dewang Hou
Yuanyuan Du
Kai Zhao
Yang Zhao
17
5
0
23 Aug 2022
Learning Prior Feature and Attention Enhanced Image Inpainting
Chenjie Cao
Qiaole Dong
Yanwei Fu
DiffM
23
24
0
03 Aug 2022
Removing Batch Normalization Boosts Adversarial Training
Haotao Wang
Aston Zhang
Shuai Zheng
Xingjian Shi
Mu Li
Zhangyang Wang
24
41
0
04 Jul 2022
Scaling ResNets in the Large-depth Regime
P. Marion
Adeline Fermanian
Gérard Biau
Jean-Philippe Vert
17
16
0
14 Jun 2022
Learning What and Where: Disentangling Location and Identity Tracking Without Supervision
Manuel Traub
S. Otte
Tobias Menge
Matthias Karlbauer
Jannik Thummel
Martin Volker Butz
21
20
0
26 May 2022
Hypercomplex Image-to-Image Translation
Eleonora Grassucci
Luigi Sigillo
A. Uncini
Danilo Comminiello
25
7
0
04 May 2022
Automated Progressive Learning for Efficient Training of Vision Transformers
Changlin Li
Bohan Zhuang
Guangrun Wang
Xiaodan Liang
Xiaojun Chang
Yi Yang
16
46
0
28 Mar 2022
Image Super-Resolution With Deep Variational Autoencoders
Darius Chira
Ilian Haralampiev
Ole Winther
Andrea Dittadi
Valentin Liévin
DRL
24
32
0
17 Mar 2022
Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers
Guodong Zhang
Aleksandar Botev
James Martens
OffRL
13
26
0
15 Mar 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang
J. E. Hu
Igor Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
J. Pachocki
Weizhu Chen
Jianfeng Gao
24
148
0
07 Mar 2022
FloorGenT: Generative Vector Graphic Model of Floor Plans for Robotics
Ludvig Ericson
Patric Jensfelt
3DV
14
2
0
07 Mar 2022
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Qiaole Dong
Chenjie Cao
Yanwei Fu
CLL
14
137
0
02 Mar 2022
DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Dongdong Zhang
Furu Wei
MoE
AI4CE
15
155
0
01 Mar 2022
Hierarchical Graph-Convolutional Variational AutoEncoding for Generative Modelling of Human Motion
Anthony Bourached
Robert J. Gray
Xiaodong Guan
Ryan-Rhys Griffiths
A. Jha
P. Nachev
3DH
DRL
14
1
0
24 Nov 2021
NormFormer: Improved Transformer Pretraining with Extra Normalization
Sam Shleifer
Jason Weston
Myle Ott
AI4CE
28
74
0
18 Oct 2021
AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham
Risto Miikkulainen
ODL
11
4
0
18 Sep 2021
MedGPT: Medical Concept Prediction from Clinical Narratives
Z. Kraljevic
Anthony Shek
D. Bean
R. Bendayan
J. Teo
Richard J. B. Dobson
LM&MA
AI4TS
MedIm
16
38
0
07 Jul 2021
ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
Chen Dun
Cameron R. Wolfe
C. Jermaine
Anastasios Kyrillidis
16
21
0
02 Jul 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
27
1,084
0
08 Jun 2021
Scaling Properties of Deep Residual Networks
A. Cohen
R. Cont
Alain Rossier
Renyuan Xu
6
18
0
25 May 2021
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen
Zhenyu (Allen) Zhang
Xu Ouyang
Zechun Liu
Zhiqiang Shen
Zhangyang Wang
MQ
31
36
0
16 Apr 2021
Going deeper with Image Transformers
Hugo Touvron
Matthieu Cord
Alexandre Sablayrolles
Gabriel Synnaeve
Hervé Jégou
ViT
23
986
0
31 Mar 2021
Predicting the Behavior of Dealers in Over-The-Counter Corporate Bond Markets
Yusen Lin
Jinming Xue
L. Raschid
8
3
0
12 Mar 2021
3D Human Pose, Shape and Texture from Low-Resolution Images and Videos
Xiangyu Xu
Hao Chen
Francesc Moreno-Noguer
László A. Jeni
Fernando De la Torre
3DH
14
35
0
11 Mar 2021
Generating Images with Sparse Representations
C. Nash
Jacob Menick
Sander Dieleman
Peter W. Battaglia
13
199
0
05 Mar 2021
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Chen Zhu
Renkun Ni
Zheng Xu
Kezhi Kong
W. R. Huang
Tom Goldstein
ODL
25
53
0
16 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021
1
2
Next