Birth of a Transformer: A Memory Viewpoint

1 June 2023

Papers citing "Birth of a Transformer: A Memory Viewpoint"

50 / 67 papers shown

Title
Understanding In-context Learning of Addition via Activation Subspaces Xinyan Hu Kayo Yin Michael I. Jordan Jacob Steinhardt Lijie Chen 53 0 0 08 May 2025
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism Aviv Bick Eric P. Xing Albert Gu RALM 88 0 0 22 Apr 2025
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization Ali Behrouz Meisam Razaviyayn Peilin Zhong Vahab Mirrokni 41 0 0 17 Apr 2025
Approximation Bounds for Transformer Networks with Application to Regression Yuling Jiao Yanming Lai Defeng Sun Yang Wang Bokai Yan 29 0 0 16 Apr 2025
Taming Knowledge Conflicts in Language Models Gaotang Li Yuzhong Chen Hanghang Tong KELM 49 1 0 14 Mar 2025
Real-Time Personalization with Simple Transformers Lin An Andrew A. Li Vaisnavi Nemala Gabriel Visotsky 34 0 0 01 Mar 2025
Hyperspherical Energy Transformer with Recurrent Depth Yunzhe Hu Difan Zou Dong Xu 48 0 0 17 Feb 2025
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data? Yutong Yin Zhaoran Wang LRM ReLM 143 0 0 27 Jan 2025
Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing Keltin Grimes Marco Christiani David Shriver Marissa Connor KELM 80 1 0 17 Dec 2024
Understanding Knowledge Hijack Mechanism in In-context Learning through Associative Memory Shuo Wang Issei Sato 76 0 0 16 Dec 2024
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models Yunzhe Hu Difan Zou Dong Xu 74 1 0 26 Nov 2024
Leveraging Large Language Models for Enhancing Public Transit Services Jiahao Wang Amer Shalaby 29 0 0 18 Oct 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs Tianyu Guo Druv Pai Yu Bai Jiantao Jiao Michael I. Jordan Song Mei 29 10 0 17 Oct 2024
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent Bo Chen Xiaoyu Li Yingyu Liang Zhenmei Shi Zhao-quan Song 96 19 0 15 Oct 2024
Learning Linear Attention in Polynomial Time Morris Yau Ekin Akyürek Jiayuan Mao Joshua B. Tenenbaum Stefanie Jegelka Jacob Andreas 19 2 0 14 Oct 2024
Zero-Shot Generalization of Vision-Based RL Without Data Augmentation S. Batra Gaurav Sukhatme OffRL DRL 31 1 0 09 Oct 2024
Transformers learn variable-order Markov chains in-context Ruida Zhou C. Tian Suhas Diggavi 26 0 0 07 Oct 2024
DAPE V2: Process Attention Score as Feature Map for Length Extrapolation Chuanyang Zheng Yihang Gao Han Shi Jing Xiong Jiankai Sun ... Xiaozhe Ren Michael Ng Xin Jiang Zhenguo Li Yu Li 31 2 0 07 Oct 2024
Density estimation with LLMs: a geometric investigation of in-context learning trajectories Toni J. B. Liu Nicolas Boullé Raphael Sarfati Christopher Earls 28 0 0 07 Oct 2024
Large Language Models as Markov Chains Oussama Zekri Ambroise Odonnat Abdelhakim Benechehab Linus Bleistein Nicolas Boullé I. Redko 42 10 0 03 Oct 2024
Attention layers provably solve single-location regression P. Marion Raphael Berthier Gérard Biau Claire Boyer 140 2 0 02 Oct 2024
Attention Heads of Large Language Models: A Survey Zifan Zheng Yezhaohui Wang Yuxin Huang Shichao Song Mingchuan Yang Bo Tang Feiyu Xiong Zhiyu Li LRM 58 22 0 05 Sep 2024
One-layer transformers fail to solve the induction heads task Clayton Sanford Daniel J. Hsu Matus Telgarsky 35 8 0 26 Aug 2024
Spin glass model of in-context learning Yuhao Li Ruoran Bai Haiping Huang LRM 42 0 0 05 Aug 2024
MCGMark: An Encodable and Robust Online Watermark for Tracing LLM-Generated Malicious Code Peng Ding Jingyu Wu Qingyuan Zhong Dan Ma Xunliang Cai ... Shi Chen Weizhe Zhang Zibin Zheng 48 0 0 02 Aug 2024
Transformers on Markov Data: Constant Depth Suffices Nived Rajaraman Marco Bondaschi Kannan Ramchandran Michael C. Gastpar Ashok Vardhan Makkuva 45 4 0 25 Jul 2024
Empirical Capacity Model for Self-Attention Neural Networks Aki Härmä M. Pietrasik Anna Wilbik 36 1 0 22 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models Daking Rai Yilun Zhou Shi Feng Abulhair Saparov Ziyu Yao 82 19 0 02 Jul 2024
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers Yibo Jiang Goutham Rajendran Pradeep Ravikumar Bryon Aragam CLL KELM 37 6 0 26 Jun 2024
Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers Brian K Chen Tianyang Hu Hui Jin Hwee Kuan Lee Kenji Kawaguchi 50 0 0 05 Jun 2024
Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task Siavash Golkar Alberto Bietti Mariel Pettee Michael Eickenberg M. Cranmer ... Ruben Ohana Liam Parker Bruno Régaldo-Saint Blancard Kyunghyun Cho Shirley Ho 47 1 0 30 May 2024
TAIA: Large Language Models are Out-of-Distribution Data Learners Shuyang Jiang Yusheng Liao Ya-Qin Zhang Yu Wang Yanfeng Wang 29 3 0 30 May 2024
Why Larger Language Models Do In-context Learning Differently? Zhenmei Shi Junyi Wei Zhuoyan Xu Yingyu Liang 37 18 0 30 May 2024
CHANI: Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration Sophie Jaffard Samuel Vaiter Patricia Reynaud-Bouret 73 0 0 29 May 2024
IM-Context: In-Context Learning for Imbalanced Regression Tasks Ismail Nejjar Faez Ahmed Olga Fink 29 1 0 28 May 2024
Asymptotic theory of in-context learning by linear attention Yue M. Lu Mary I. Letey Jacob A. Zavatone-Veth Anindita Maiti C. Pehlevan 29 10 0 20 May 2024
Memory Mosaics Jianyu Zhang Niklas Nolte Ranajoy Sadhukhan Beidi Chen Léon Bottou VLM 70 3 0 10 May 2024
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics Hanlin Zhu Baihe Huang Shaolun Zhang Michael I. Jordan Jiantao Jiao Yuandong Tian Stuart Russell LRM AI4CE 49 13 0 07 May 2024
BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model Chenwei Xu Yu-Chao Huang Jerry Yao-Chieh Hu Weijian Li Ammar Gilani H. Goan Han Liu 52 19 0 04 Apr 2024
Outlier-Efficient Hopfield Layers for Large Transformer-Based Models Jerry Yao-Chieh Hu Pei-Hsuan Chang Haozheng Luo Hong-Yu Chen Weijian Li Wei-Po Wang Han Liu 39 26 0 04 Apr 2024
The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models Carlo Nicolini Jacopo Staiano Bruno Lepri Raffaele Marino MoE 28 1 0 13 Mar 2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models Frederik Kunstner Robin Yadav Alan Milligan Mark Schmidt Alberto Bietti 39 26 0 29 Feb 2024
Learning Associative Memories with Gradient Descent Vivien A. Cabannes Berfin Simsek A. Bietti 38 6 0 28 Feb 2024
Prospector Heads: Generalized Feature Attribution for Large Models & Data Gautam Machiraju Alexander Derry Arjun D Desai Neel Guha Amir-Hossein Karimi James Zou Russ Altman Christopher Ré Parag Mallick AI4TS MedIm 48 0 0 18 Feb 2024
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains Benjamin L. Edelman Ezra Edelman Surbhi Goel Eran Malach Nikolaos Tsilivis BDL 26 42 0 16 Feb 2024
Understanding In-Context Learning with a Pelican Soup Framework Ting-Rui Chiang Dani Yogatama 16 1 0 16 Feb 2024
Secret Collusion among Generative AI Agents: Multi-Agent Deception via Steganography S. Motwani Mikhail Baranchuk Martin Strohmeier Vijay Bolina Philip H. S. Torr Lewis Hammond Christian Schroeder de Witt 42 4 0 12 Feb 2024
Implicit Bias and Fast Convergence Rates for Self-attention Bhavya Vasudeva Puneesh Deora Christos Thrampoulidis 34 13 0 08 Feb 2024
Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains Ashok Vardhan Makkuva Marco Bondaschi Adway Girish Alliot Nagle Martin Jaggi Hyeji Kim Michael C. Gastpar OffRL 18 25 0 06 Feb 2024
A phase transition between positional and semantic learning in a solvable model of dot-product attention Hugo Cui Freya Behrens Florent Krzakala Lenka Zdeborová MLT 33 11 0 06 Feb 2024