2003.04887
ReZero is All You Need: Fast Convergence at Large Depth
Conference on Uncertainty in Artificial Intelligence (UAI), 2020
10 March 2020
Thomas C. Bachlechner
Bodhisattwa Prasad Majumder
H. H. Mao
G. Cottrell
Julian McAuley
AI4CE
Papers citing "ReZero is All You Need: Fast Convergence at Large Depth"
50 / 186 papers shown
DeepCoT: Deep Continual Transformers for Real-Time Inference on Data Streams
Ginés Carreto Picón
Peng Yuan Zhou
Qi Zhang
Alexandros Iosifidis
AI4TS
CLL
225
0
0
21 Nov 2025
A Vector Symbolic Approach to Multiple Instance Learning
Ehsan Ahmed Dhrubo
Mohammad Mahmudul Alam
Edward Raff
Tim Oates
James Holt
162
0
0
20 Nov 2025
Random Initialization of Gated Sparse Adapters
Vi Retault
Yohaï-Eliel Berreby
CLL
MoE
252
0
0
03 Nov 2025
From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics
Zheng-an Chen
Tao Luo
AI4CE
171
2
0
08 Oct 2025
Arithmetic-Mean μP for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets
Haosong Zhang
Shenxi Wu
Yichi Zhang
Wei Lin
W. Lin
267
0
0
05 Oct 2025
Beyond Gaussian Initializations: Signal Preserving Weight Initialization for Odd-Sigmoid Activations
Hyunwoo Lee
Hayoung Choi
Hyunju Kim
136
0
0
27 Sep 2025
Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
Dingzirui Wang
Xuanliang Zhang
Keyan Xu
Qingfu Zhu
Wanxiang Che
Yang Deng
LRM
202
1
0
25 Sep 2025
Recurrent State Encoders for Efficient Neural Combinatorial Optimization
Tim Dernedde
Daniela Thyssens
Lars Schmidt-Thieme
192
0
0
05 Sep 2025
Auto-Compressing Networks
Vaggelis Dorovatas
Georgios Paraskevopoulos
Alexandros Potamianos
584
2
0
11 Jun 2025
Learning in Compact Spaces with Approximately Normalized Transformer
Jörg Franke
Urs Spiegelhalter
Marianna Nezhurina
J. Jitsev
Katharina Eggensperger
Michael Hefenbrock
330
1
0
28 May 2025
Taming Transformer Without Using Learning Rate Warmup
International Conference on Learning Representations (ICLR), 2025
Xianbiao Qi
Yelin He
Jiaquan Ye
Chun-Guang Li
Bojia Zi
Xili Dai
Qin Zou
Rong Xiao
220
6
0
28 May 2025
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
Yanzhe Zhang
Wenxiang Guo
Changhao Pan
Zehan Zhu
Tao Jin
Zhou Zhao
VGen
750
9
0
29 Apr 2025
Versatile Framework for Song Generation with Prompt-based Control
Yanzhe Zhang
Wenxiang Guo
Changhao Pan
Zehan Zhu
Ruiqi Li
...
Rongjie Huang
Ruiyuan Zhang
Zhiqing Hong
Ziyue Jiang
Zhou Zhao
692
8
0
27 Apr 2025
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Computer Vision and Pattern Recognition (CVPR), 2025
Yuanmin Tang
Jing Yu
Keke Gai
Jiamin Zhuang
Gang Xiong
Gaopeng Gou
Qi Wu
VGen
699
17
0
21 Mar 2025
Transformers without Normalization
Computer Vision and Pattern Recognition (CVPR), 2025
Jiachen Zhu
Xinlei Chen
Kaiming He
Yann LeCun
Zhuang Liu
OffRL
ViT
567
124
0
13 Mar 2025
A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization
Md Yousuf Harun
Christopher Kanan
AI4CE
369
2
0
09 Mar 2025
NoT: Federated Unlearning via Weight Negation
Computer Vision and Pattern Recognition (CVPR), 2025
Yasser H. Khalil
Leo Maxime Brunswic
Soufiane Lamghari
Xu Li
Mahdi Beitollahi
Xi Chen
MU
344
14
0
07 Mar 2025
Rethinking Light Decoder-based Solvers for Vehicle Routing Problems
International Conference on Learning Representations (ICLR), 2025
Ziwei Huang
Jianan Zhou
Zhiguang Cao
Yixin Xu
334
27
0
02 Mar 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
560
19
0
24 Feb 2025
Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization
Yunzhe Hu
Difan Zou
Dong Xu
639
1
0
17 Feb 2025
Optimizing Job Allocation using Reinforcement Learning with Graph Neural Networks
Lars C.P.M. Quaedvlieg
284
0
0
31 Jan 2025
Merino: Entropy-driven Design for Generative Language Models on IoT Devices
AAAI Conference on Artificial Intelligence (AAAI), 2024
Youpeng Zhao
Ming Lin
Huadong Tang
Qiang Wu
Jun Wang
431
3
0
28 Jan 2025
Parseval Regularization for Continual Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Wesley Chung
Lynn Cherif
David Meger
Doina Precup
CLL
329
18
0
10 Dec 2024
GraphXForm: Graph transformer for computer-aided molecular design
Digital Discovery (DD), 2024
Jonathan Pirnay
Jan G. Rittig
Alexander B. Wolf
Martin Grohe
Jakob Burger
Alexander Mitsos
D. G. Grimm
AI4CE
526
7
0
03 Nov 2024
Scale Propagation Network for Generalizable Depth Completion
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Haotian Wang
Meng Yang
Xinhu Zheng
Gang Hua
294
5
0
24 Oct 2024
Lambda-Skip Connections: the architectural component that prevents Rank Collapse
International Conference on Learning Representations (ICLR), 2024
Federico Arangath Joseph
Jerome Sieber
Melanie Zeilinger
Carmen Amo Alonso
528
3
0
14 Oct 2024
Neural Solver Selection for Combinatorial Optimization
Chengrui Gao
Haopu Shang
Ke Xue
Chao Qian
387
4
0
13 Oct 2024
Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis
International Conference on Learning Representations (ICLR), 2024
Hyunwoo Lee
Hayoung Choi
Hyunju Kim
264
6
0
03 Oct 2024
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu
Issei Sato
894
10
0
02 Oct 2024
Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects
IEEE Access (IEEE Access), 2024
Awal Ahmed Fime
Saifuddin Mahmud
Arpita Das
Md. Sunzidul Islam
Hong-Hoon Kim
VGen
3DV
367
4
0
14 Sep 2024
Efficient Training of Large Vision Models via Advanced Automated Progressive Learning
Changlin Li
Jiawei Zhang
Sihao Lin
Zongxin Yang
Junwei Liang
Xiaodan Liang
Xiaojun Chang
VLM
302
2
0
06 Sep 2024
SAMSA: Efficient Transformer for Many Data Modalities
Minh Lenhat
Viet Anh Nguyen
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong-Son Hy
444
1
0
10 Aug 2024
Block-Operations: Using Modular Routing to Improve Compositional Generalization
Florian Dietz
Dietrich Klakow
AI4CE
237
0
0
01 Aug 2024
Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization
Jonathan Pirnay
D. G. Grimm
BDL
356
5
0
24 Jul 2024
M5: A Whole Genome Bacterial Encoder at Single Nucleotide Resolution
Agust Egilsson
201
0
0
03 Jul 2024
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization
Adriana Fernandez-Lopez
Honglie Chen
Pingchuan Ma
Lu Yin
Q. Xiao
Stavros Petridis
Shiwei Liu
Maja Pantic
247
2
0
25 Jun 2024
GOAL: A Generalist Combinatorial Optimization Agent Learner
Darko Drakulic
Sofia Michel
J. Andreoli
586
32
0
21 Jun 2024
Neural Residual Diffusion Models for Deep Scalable Vision Generation
Neural Information Processing Systems (NeurIPS), 2024
Zhiyuan Ma
Liangliang Zhao
Biqing Qi
Bowen Zhou
DiffM
466
9
0
19 Jun 2024
Beyond the Frontier: Predicting Unseen Walls from Occupancy Grids by Learning from Floor Plans
Ludvig Ericson
Patric Jensfelt
386
14
0
13 Jun 2024
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
371
11
0
29 May 2024
Deep Learning Calabi-Yau four folds with hybrid and recurrent neural network architectures
H. L. Dao
345
1
0
27 May 2024
High-Performance Temporal Reversible Spiking Neural Networks with O(L) Training Memory and O(1) Inference Cost
Jiakui Hu
Man Yao
Xuerui Qiu
Yuhong Chou
Yuxuan Cai
Ning Qiao
Yonghong Tian
Boxing Xu
Guoqi Li
AI4CE
286
23
0
26 May 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Shiyang Feng
Le Zhuo
Ziyi Lin
Ruoyi Du
Xu Luo
...
Weicai Ye
He Tong
Jingwen He
Yu Qiao
Jiaming Song
VGen
416
134
0
09 May 2024
HILCodec: High Fidelity and Lightweight Neural Audio Codec
S. Ahn
Beom Jun Woo
Mingrui Han
Chanyeong Moon
Nam Soo Kim
378
18
0
08 May 2024
HMAR: Hierarchical Masked Attention for Multi-Behaviour Recommendation
Shereen Elsayed
Ahmed Rashed
Lars Schmidt-Thieme
363
6
0
29 Apr 2024
Training-Free Unsupervised Prompt for Vision-Language Models
Sifan Long
Linbin Wang
Zhen Zhao
Zichang Tan
Yiming Wu
Shengsheng Wang
Jingdong Wang
VLM
VPVLM
394
4
0
25 Apr 2024
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild
K. Chumachenko
Alexandros Iosifidis
Moncef Gabbouj
198
26
0
13 Apr 2024
Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement
Jonathan Pirnay
D. G. Grimm
455
32
0
22 Mar 2024
Generalization of Scaled Deep ResNets in the Mean-Field Regime
International Conference on Learning Representations (ICLR), 2024
Yihang Chen
Fanghui Liu
Yiping Lu
Grigorios G. Chrysos
Volkan Cevher
292
2
0
14 Mar 2024
ConvTimeNet: A Deep Hierarchical Fully Convolutional Model for Multivariate Time Series Analysis
Mingyue Cheng
Jiqian Yang
Tingyue Pan
Qi Liu
Zhi Li
AI4TS
271
45
0
03 Mar 2024