arXiv:2503.01329 (v2, latest)
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
International Conference on Learning Representations (ICLR), 2025
3 March 2025
Anh Tong, Thanh Nguyen-Tang, Dongeun Lee, Duc Nguyen, Toan M. Tran, David Hall, Cheongwoong Kang, Jaesik Choi

Papers citing "Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning" (45 / 45 papers shown)

ODE-ViT: Plug & Play Attention Layer from the Generalization of the ViT as an Ordinary Differential Equation
Carlos Boned Riera, David Romero Sanchez, Oriol Ramos Terrades
VLM · 157 · 0 · 0 · 20 Nov 2025

PCARNN-DCBF: Minimal-Intervention Geofence Enforcement for Ground Vehicles
Yinan Yu, Samuel Scheidegger
AI4CE · 205 · 0 · 0 · 19 Nov 2025

IIET: Efficient Numerical Transformer via Implicit Iterative Euler Method
Xinyu Liu, Bei Li, Jiahao Liu, Junhao Ruan, Kechen Jiao, Hongyin Tang, Jingang Wang, Xiao Tong, Jingbo Zhu
184 · 0 · 0 · 26 Sep 2025

Interpretability as Alignment: Making Internal Understanding a Design Principle
Aadit Sengupta, Pratinav Seth, Vinay Kumar Sankarapu
AI4CE, AAML · 142 · 0 · 0 · 10 Sep 2025

TANDEM: Temporal Attention-guided Neural Differential Equations for Missingness in Time Series Classification
YongKyung Oh, Dong-Young Lim, Sungil Kim, Alex Bui
136 · 0 · 0 · 24 Aug 2025

Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians
Akiyoshi Tomihari, Ryo Karakida
360 · 2 · 0 · 26 May 2025

Efficient, Accurate and Stable Gradients for Neural ODEs
Sam McCallum, James Foster
449 · 8 · 0 · 15 Oct 2024

Gradient Flossing: Improving Gradient Descent through Dynamic Control of Jacobians
Rainer Engelken
233 · 10 · 0 · 28 Dec 2023

SigFormer: Signature Transformers for Deep Hedging
Anh Tong, Thanh Nguyen-Tang, Dongeun Lee, Toan M. Tran, Jaesik Choi
AIFin · 259 · 10 · 0 · 20 Oct 2023

Uncovering mesa-optimization algorithms in Transformers
J. Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, ..., Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento
237 · 83 · 0 · 11 Sep 2023

Model evaluation for extreme risks
Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, ..., Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, Allan Dafoe
ELM · 289 · 195 · 0 · 24 May 2023

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
International Conference on Machine Learning (ICML), 2023
Stella Biderman, Hailey Schoelkopf, Quentin G. Anthony, Herbie Bradley, Kyle O'Brien, ..., USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal
391 · 1,629 · 0 · 03 Apr 2023

LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, ..., Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
ALM, PILM · 6.8K · 17,868 · 0 · 27 Feb 2023

Scaling Vision Transformers to 22 Billion Parameters
International Conference on Machine Learning (ICML), 2023
Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, ..., Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, N. Houlsby
MLLM · 407 · 774 · 0 · 10 Feb 2023

Scalable Diffusion Models with Transformers
IEEE International Conference on Computer Vision (ICCV), 2023
William S. Peebles, Saining Xie
GNN · 2.3K · 4,295 · 0 · 19 Dec 2022

Transformers learn in-context by gradient descent
International Conference on Machine Learning (ICML), 2023
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov
MLT · 493 · 644 · 0 · 15 Dec 2022

A Neural ODE Interpretation of Transformer Layers
Yaofeng Desmond Zhong, Tongtao Zhang, Amit Chakraborty, Biswadip Dey
313 · 13 · 0 · 12 Dec 2022

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Neural Information Processing Systems (NeurIPS), 2022
Shivam Garg, Dimitris Tsipras, Abigail Z. Jacobs, Gregory Valiant
657 · 674 · 0 · 01 Aug 2022

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
International Conference on Machine Learning (ICML), 2022
Yifan Peng, Siddharth Dalmia, Ian Lane, Shinji Watanabe
271 · 193 · 0 · 06 Jul 2022

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Neural Information Processing Systems (NeurIPS), 2022
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
VLM · 845 · 3,353 · 0 · 27 May 2022

XAI for Transformers: Better Explanations through Conservative Propagation
International Conference on Machine Learning (ICML), 2022
Ameen Ali, Thomas Schnake, Oliver Eberle, G. Montavon, Klaus-Robert Müller, Lior Wolf
FAtt · 332 · 127 · 0 · 15 Feb 2022

Equinox: neural networks in JAX via callable PyTrees and filtered transformations
Patrick Kidger, Cristian Garcia
275 · 191 · 0 · 30 Oct 2021

Sinkformers: Transformers with Doubly Stochastic Attention
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré
254 · 115 · 0 · 22 Oct 2021

Chaos as an interpretable benchmark for forecasting and data-driven modelling
W. Gilpin
AI4TS · 293 · 106 · 0 · 11 Oct 2021

Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems
Subhabrata Dutta, Tanya Gautam, Soumen Chakrabarti, Tanmoy Chakraborty
312 · 24 · 0 · 30 Sep 2021

ODE Transformer: An Ordinary Differential Equation-Inspired Model for Neural Machine Translation
Bei Li, Quan Du, Tao Zhou, Shuhan Zhou, Xin Zeng, Tong Xiao, Jingbo Zhu
192 · 24 · 0 · 06 Apr 2021

Transformer Interpretability Beyond Attention Visualization
Computer Vision and Pattern Recognition (CVPR), 2021
Hila Chefer, Shir Gur, Lior Wolf
421 · 864 · 0 · 17 Dec 2020

Score-Based Generative Modeling through Stochastic Differential Equations
International Conference on Learning Representations (ICLR), 2021
Yang Song, Jascha Narain Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
DiffM, SyDa · 2.2K · 8,890 · 0 · 26 Nov 2020

Adversarial Robustness of Stabilized NeuralODEs Might be from Obfuscated Gradients
Mathematical and Scientific Machine Learning (MSML), 2020
Yifei Huang, Yaodong Yu, Hongyang R. Zhang, Yi-An Ma, Xingtai Lv
AAML · 187 · 31 · 0 · 28 Sep 2020

On Lyapunov Exponents for RNNs: Understanding Information Propagation Using Dynamical Systems Tools
Frontiers in Applied Mathematics and Statistics (FAMS), 2020
Ryan H. Vogt, M. P. Touzel, Eli Shlizerman, Guillaume Lajoie
213 · 53 · 0 · 25 Jun 2020

An Ode to an ODE
K. Choromanski, Jared Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques E. Slotine, Jacob Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani
255 · 32 · 0 · 19 Jun 2020

Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL · 2.0K · 52,836 · 0 · 28 May 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
1.8K · 6,691 · 0 · 23 Jan 2020

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
International Conference on Learning Representations (ICLR), 2020
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
SSL, AIMat · 1.2K · 7,141 · 0 · 26 Sep 2019

Deep Equilibrium Models
Neural Information Processing Systems (NeurIPS), 2019
Shaojie Bai, J. Zico Kolter, V. Koltun
224 · 773 · 0 · 03 Sep 2019

Attention is not not Explanation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Sarah Wiegreffe, Yuval Pinter
XAI, AAML, FAtt · 473 · 1,028 · 0 · 13 Aug 2019

ANODEV2: A Coupled Neural ODE Evolution Framework
Tianjun Zhang, Z. Yao, A. Gholami, Kurt Keutzer, Joseph E. Gonzalez, George Biros, Michael W. Mahoney
176 · 41 · 0 · 10 Jun 2019

Neural Stochastic Differential Equations: Deep Latent Gaussian Models in the Diffusion Limit
Belinda Tzen, Maxim Raginsky
DiffM · 426 · 239 · 0 · 23 May 2019

Attention is not Explanation
North American Chapter of the Association for Computational Linguistics (NAACL), 2019
Sarthak Jain, Byron C. Wallace
FAtt · 1.1K · 1,534 · 0 · 26 Feb 2019

Neural Ordinary Differential Equations
T. Chen, Yulia Rubanova, J. Bettencourt, David Duvenaud
AI4CE · 1.2K · 6,219 · 0 · 19 Jun 2018

Measuring the Intrinsic Dimension of Objective Landscapes
International Conference on Learning Representations (ICLR), 2018
Chunyuan Li, Heerad Farkhoor, Rosanne Liu, J. Yosinski
305 · 480 · 0 · 24 Apr 2018

Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
3DV · 4.2K · 162,388 · 0 · 12 Jun 2017

Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, R. Socher
RALM · 1.1K · 3,505 · 0 · 26 Sep 2016

Memory-Efficient Backpropagation Through Time
A. Gruslys, Rémi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves
200 · 259 · 0 · 10 Jun 2016

Training Deep Nets with Sublinear Memory Cost
Tianqi Chen, Bing Xu, Chiyuan Zhang, Carlos Guestrin
494 · 1,352 · 0 · 21 Apr 2016