ReZero is All You Need: Fast Convergence at Large Depth

Conference on Uncertainty in Artificial Intelligence (UAI), 2020

10 March 2020

Thomas C. Bachlechner

Bodhisattwa Prasad Majumder

Papers citing "ReZero is All You Need: Fast Convergence at Large Depth"

50 / 186 papers shown

DeepCoT: Deep Continual Transformers for Real-Time Inference on Data Streams

225

21 Nov 2025

A Vector Symbolic Approach to Multiple Instance Learning

Ehsan Ahmed Dhrubo

Mohammad Mahmudul Alam

Edward Raff

Tim Oates

James Holt

162

20 Nov 2025

Random Initialization of Gated Sparse Adapters

Vi Retault

Yohaï-Eliel Berreby

CLL MoE

252

03 Nov 2025

From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics

Zheng-an Chen

Tao Luo

AI4CE

171

08 Oct 2025

Arithmetic-Mean μP for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets

267

05 Oct 2025

Beyond Gaussian Initializations: Signal Preserving Weight Initialization for Odd-Sigmoid Activations

Hyunwoo Lee

Hayoung Choi

Hyunju Kim

136

27 Sep 2025

Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond

202

25 Sep 2025

Recurrent State Encoders for Efficient Neural Combinatorial Optimization

Tim Dernedde

Daniela Thyssens

Lars Schmidt-Thieme

192

05 Sep 2025

Auto-Compressing Networks

Vaggelis Dorovatas

Georgios Paraskevopoulos

Alexandros Potamianos

584

11 Jun 2025

Learning in Compact Spaces with Approximately Normalized Transformer

Katharina Eggensperger

Michael Hefenbrock

330

28 May 2025

Taming Transformer Without Using Learning Rate Warmup

International Conference on Learning Representations (ICLR), 2025

220

28 May 2025

ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting

750

29 Apr 2025

Versatile Framework for Song Generation with Prompt-based Control

...

692

27 Apr 2025

Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval

Computer Vision and Pattern Recognition (CVPR), 2025

699

21 Mar 2025

Transformers without Normalization

Computer Vision and Pattern Recognition (CVPR), 2025

567

124

13 Mar 2025

A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization

Md Yousuf Harun

Christopher Kanan

AI4CE

369

09 Mar 2025

NoT: Federated Unlearning via Weight Negation

Computer Vision and Pattern Recognition (CVPR), 2025

344

07 Mar 2025

Rethinking Light Decoder-based Solvers for Vehicle Routing Problems

International Conference on Learning Representations (ICLR), 2025

334

02 Mar 2025

MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations

Benedikt Alkin

Lukas Miklautz

Sepp Hochreiter

Johannes Brandstetter

VLM

560

24 Feb 2025

Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization

Yunzhe Hu

Difan Zou

Dong Xu

639

17 Feb 2025

Optimizing Job Allocation using Reinforcement Learning with Graph Neural Networks

Lars C.P.M. Quaedvlieg

284

31 Jan 2025

Merino: Entropy-driven Design for Generative Language Models on IoT Devices

AAAI Conference on Artificial Intelligence (AAAI), 2024

431

28 Jan 2025

Parseval Regularization for Continual Reinforcement Learning

Neural Information Processing Systems (NeurIPS), 2024

329

10 Dec 2024

GraphXForm: Graph transformer for computer-aided molecular design

Digital Discovery (DD), 2024

526

03 Nov 2024

Scale Propagation Network for Generalizable Depth Completion

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

Haotian Wang

Meng Yang

Xinhu Zheng

Gang Hua

294

24 Oct 2024

Lambda-Skip Connections: the architectural component that prevents Rank Collapse

International Conference on Learning Representations (ICLR), 2024

Federico Arangath Joseph

Jerome Sieber

Melanie Zeilinger

Carmen Amo Alonso

528

14 Oct 2024

Neural Solver Selection for Combinatorial Optimization

387

13 Oct 2024

Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis

International Conference on Learning Representations (ICLR), 2024

Hyunwoo Lee

Hayoung Choi

Hyunju Kim

264

03 Oct 2024

On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding

Kevin Xu

Issei Sato

894

02 Oct 2024

Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects

IEEE Access, 2024

367

14 Sep 2024

Efficient Training of Large Vision Models via Advanced Automated Progressive Learning

Changlin Li

302

06 Sep 2024

SAMSA: Efficient Transformer for Many Data Modalities

444

10 Aug 2024

Block-Operations: Using Modular Routing to Improve Compositional Generalization

Florian Dietz

Dietrich Klakow

AI4CE

237

01 Aug 2024

Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization

Jonathan Pirnay

D. G. Grimm

BDL

356

24 Jul 2024

M5: A Whole Genome Bacterial Encoder at Single Nucleotide Resolution

Agust Egilsson

201

03 Jul 2024

MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization

Adriana Fernandez-Lopez

Honglie Chen

Pingchuan Ma

Lu Yin

Q. Xiao

Stavros Petridis

Shiwei Liu

Maja Pantic

247

25 Jun 2024

GOAL: A Generalist Combinatorial Optimization Agent Learner

Darko Drakulic

Sofia Michel

J. Andreoli

586

21 Jun 2024

Neural Residual Diffusion Models for Deep Scalable Vision Generation

Neural Information Processing Systems (NeurIPS), 2024

Bowen Zhou

466

19 Jun 2024

Beyond the Frontier: Predicting Unseen Walls from Occupancy Grids by Learning from Floor Plans

Ludvig Ericson

Patric Jensfelt

386

13 Jun 2024

Understanding and Minimising Outlier Features in Neural Network Training

371

29 May 2024

Deep Learning Calabi-Yau four folds with hybrid and recurrent neural network architectures

H. L. Dao

345

27 May 2024

High-Performance Temporal Reversible Spiking Neural Networks with O(L) Training Memory and O(1)

Yonghong Tian

286

26 May 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

...

Weicai Ye

Yu Qiao

416

134

09 May 2024

HILCodec: High Fidelity and Lightweight Neural Audio Codec

378

08 May 2024

HMAR: Hierarchical Masked Attention for Multi-Behaviour Recommendation

Shereen Elsayed

Ahmed Rashed

Lars Schmidt-Thieme

363

29 Apr 2024

Training-Free Unsupervised Prompt for Vision-Language Models

Zichang Tan

394

25 Apr 2024

MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild

K. Chumachenko

Alexandros Iosifidis

Moncef Gabbouj

198

13 Apr 2024

Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement

Jonathan Pirnay

D. G. Grimm

455

22 Mar 2024

Generalization of Scaled Deep ResNets in the Mean-Field Regime

International Conference on Learning Representations (ICLR), 2024

292

14 Mar 2024

ConvTimeNet: A Deep Hierarchical Fully Convolutional Model for Multivariate Time Series Analysis

Mingyue Cheng

Qi Liu

271

03 Mar 2024