arXiv:2503.01329 (v2, latest)
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
International Conference on Learning Representations (ICLR), 2025
3 March 2025
Anh Tong, Thanh Nguyen-Tang, Dongeun Lee, Duc Nguyen, Toan M. Tran, David Hall, Cheongwoong Kang, Jaesik Choi

Papers citing "Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning" (45 / 45 papers shown)

ODE-ViT: Plug & Play Attention Layer from the Generalization of the ViT as an Ordinary Differential Equation
Carlos Boned Riera, David Romero Sanchez, Oriol Ramos Terrades
VLM · 157 · 0 · 0 · 20 Nov 2025

PCARNN-DCBF: Minimal-Intervention Geofence Enforcement for Ground Vehicles
Yinan Yu, Samuel Scheidegger
AI4CE · 205 · 0 · 0 · 19 Nov 2025

IIET: Efficient Numerical Transformer via Implicit Iterative Euler Method
Xinyu Liu, Bei Li, Jiahao Liu, Junhao Ruan, Kechen Jiao, Hongyin Tang, Jingang Wang, Xiao Tong, Jingbo Zhu
184 · 0 · 0 · 26 Sep 2025

Interpretability as Alignment: Making Internal Understanding a Design Principle
Aadit Sengupta, Pratinav Seth, Vinay Kumar Sankarapu
AI4CE, AAML · 142 · 0 · 0 · 10 Sep 2025

TANDEM: Temporal Attention-guided Neural Differential Equations for Missingness in Time Series Classification
YongKyung Oh, Dong-Young Lim, Sungil Kim, Alex Bui
136 · 0 · 0 · 24 Aug 2025

Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians
Akiyoshi Tomihari, Ryo Karakida
360 · 2 · 0 · 26 May 2025

Efficient, Accurate and Stable Gradients for Neural ODEs
Sam McCallum, James Foster
449 · 8 · 0 · 15 Oct 2024

Gradient Flossing: Improving Gradient Descent through Dynamic Control of Jacobians
Rainer Engelken
233 · 10 · 0 · 28 Dec 2023

SigFormer: Signature Transformers for Deep Hedging
Anh Tong, Thanh Nguyen-Tang, Dongeun Lee, Toan M. Tran, Jaesik Choi
AIFin · 259 · 10 · 0 · 20 Oct 2023

Uncovering mesa-optimization algorithms in Transformers
J. Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, ..., Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento
237 · 83 · 0 · 11 Sep 2023

Model evaluation for extreme risks
Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, ..., Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, Allan Dafoe
ELM · 289 · 195 · 0 · 24 May 2023

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
International Conference on Machine Learning (ICML), 2023
Stella Biderman, Hailey Schoelkopf, Quentin G. Anthony, Herbie Bradley, Kyle O'Brien, ..., USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal
391 · 1,629 · 0 · 03 Apr 2023

LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, ..., Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
ALM, PILM · 6.8K · 17,868 · 0 · 27 Feb 2023

Scaling Vision Transformers to 22 Billion Parameters
International Conference on Machine Learning (ICML), 2023
Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, ..., Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, N. Houlsby
MLLM · 407 · 774 · 0 · 10 Feb 2023

Scalable Diffusion Models with Transformers
IEEE International Conference on Computer Vision (ICCV), 2023
William S. Peebles, Saining Xie
GNN · 2.3K · 4,295 · 0 · 19 Dec 2022

Transformers learn in-context by gradient descent
International Conference on Machine Learning (ICML), 2023
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov
MLT · 493 · 644 · 0 · 15 Dec 2022

A Neural ODE Interpretation of Transformer Layers
Yaofeng Desmond Zhong, Tongtao Zhang, Amit Chakraborty, Biswadip Dey
313 · 13 · 0 · 12 Dec 2022

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Neural Information Processing Systems (NeurIPS), 2022
Shivam Garg, Dimitris Tsipras, Abigail Z. Jacobs, Gregory Valiant
657 · 674 · 0 · 01 Aug 2022

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
International Conference on Machine Learning (ICML), 2022
Yifan Peng, Siddharth Dalmia, Ian Lane, Shinji Watanabe
271 · 193 · 0 · 06 Jul 2022

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Neural Information Processing Systems (NeurIPS), 2022
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
VLM · 845 · 3,353 · 0 · 27 May 2022

XAI for Transformers: Better Explanations through Conservative Propagation
International Conference on Machine Learning (ICML), 2022
Ameen Ali, Thomas Schnake, Oliver Eberle, G. Montavon, Klaus-Robert Müller, Lior Wolf
FAtt · 332 · 127 · 0 · 15 Feb 2022

Equinox: neural networks in JAX via callable PyTrees and filtered transformations
Patrick Kidger, Cristian Garcia
275 · 191 · 0 · 30 Oct 2021

Sinkformers: Transformers with Doubly Stochastic Attention
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré
254 · 115 · 0 · 22 Oct 2021

Chaos as an interpretable benchmark for forecasting and data-driven modelling
W. Gilpin
AI4TS · 293 · 106 · 0 · 11 Oct 2021

Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems
Subhabrata Dutta, Tanya Gautam, Soumen Chakrabarti, Tanmoy Chakraborty
312 · 24 · 0 · 30 Sep 2021

ODE Transformer: An Ordinary Differential Equation-Inspired Model for Neural Machine Translation
Bei Li, Quan Du, Tao Zhou, Shuhan Zhou, Xin Zeng, Tong Xiao, Jingbo Zhu
192 · 24 · 0 · 06 Apr 2021

Transformer Interpretability Beyond Attention Visualization
Computer Vision and Pattern Recognition (CVPR), 2021
Hila Chefer, Shir Gur, Lior Wolf
421 · 864 · 0 · 17 Dec 2020

Score-Based Generative Modeling through Stochastic Differential Equations
International Conference on Learning Representations (ICLR), 2021
Yang Song, Jascha Narain Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
DiffM, SyDa · 2.2K · 8,890 · 0 · 26 Nov 2020

Adversarial Robustness of Stabilized NeuralODEs Might be from Obfuscated Gradients
Mathematical and Scientific Machine Learning (MSML), 2020
Yifei Huang, Yaodong Yu, Hongyang R. Zhang, Yi-An Ma, Xingtai Lv
AAML · 187 · 31 · 0 · 28 Sep 2020

On Lyapunov Exponents for RNNs: Understanding Information Propagation Using Dynamical Systems Tools
Frontiers in Applied Mathematics and Statistics (FAMS), 2020
Ryan H. Vogt, M. P. Touzel, Eli Shlizerman, Guillaume Lajoie
213 · 53 · 0 · 25 Jun 2020

An Ode to an ODE
K. Choromanski, Jared Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques E. Slotine, Jacob Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani
255 · 32 · 0 · 19 Jun 2020

Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL · 2.0K · 52,836 · 0 · 28 May 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
1.8K · 6,691 · 0 · 23 Jan 2020

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
International Conference on Learning Representations (ICLR), 2020
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
SSL, AIMat · 1.2K · 7,141 · 0 · 26 Sep 2019

Deep Equilibrium Models
Neural Information Processing Systems (NeurIPS), 2019
Shaojie Bai, J. Zico Kolter, V. Koltun
224 · 773 · 0 · 03 Sep 2019

Attention is not not Explanation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Sarah Wiegreffe, Yuval Pinter
XAI, AAML, FAtt · 473 · 1,028 · 0 · 13 Aug 2019

ANODEV2: A Coupled Neural ODE Evolution Framework
Tianjun Zhang, Z. Yao, A. Gholami, Kurt Keutzer, Joseph E. Gonzalez, George Biros, Michael W. Mahoney
176 · 41 · 0 · 10 Jun 2019

Neural Stochastic Differential Equations: Deep Latent Gaussian Models in the Diffusion Limit
Belinda Tzen, Maxim Raginsky
DiffM · 426 · 239 · 0 · 23 May 2019

Attention is not Explanation
North American Chapter of the Association for Computational Linguistics (NAACL), 2019
Sarthak Jain, Byron C. Wallace
FAtt · 1.1K · 1,534 · 0 · 26 Feb 2019

Neural Ordinary Differential Equations
T. Chen, Yulia Rubanova, J. Bettencourt, David Duvenaud
AI4CE · 1.2K · 6,219 · 0 · 19 Jun 2018

Measuring the Intrinsic Dimension of Objective Landscapes
International Conference on Learning Representations (ICLR), 2018
Chunyuan Li, Heerad Farkhoor, Rosanne Liu, J. Yosinski
305 · 480 · 0 · 24 Apr 2018

Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
3DV · 4.2K · 162,388 · 0 · 12 Jun 2017

Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, R. Socher
RALM · 1.1K · 3,505 · 0 · 26 Sep 2016

Memory-Efficient Backpropagation Through Time
A. Gruslys, Rémi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves
200 · 259 · 0 · 10 Jun 2016

Training Deep Nets with Sublinear Memory Cost
Tianqi Chen, Bing Xu, Chiyuan Zhang, Carlos Guestrin
494 · 1,352 · 0 · 21 Apr 2016