How do Language Models Bind Entities in Context?

26 October 2023

Papers citing "How do Language Models Bind Entities in Context?"


Understanding In-context Learning of Addition via Activation Subspaces Xinyan Hu Kayo Yin Michael I. Jordan Jacob Steinhardt Lijie Chen 51 0 0 08 May 2025
Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure Boshi Wang Huan Sun 34 2 0 02 Apr 2025
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models Guy Kaplan Michael Toker Yuval Reif Yonatan Belinkov Roy Schwartz DiffM 48 0 0 01 Apr 2025
Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models Yukang Yang Declan Campbell Kaixuan Huang Mengdi Wang Jonathan D. Cohen Taylor W. Webb LRM 63 2 0 27 Feb 2025
Language Models' Factuality Depends on the Language of Inquiry Tushar Aggarwal Kumar Tanmay Ayush Agrawal Kumar Ayush Hamid Palangi Paul Pu Liang HILM KELM 71 0 0 25 Feb 2025
Mind the Gap: Bridging the Divide Between AI Aspirations and the Reality of Autonomous Characterization Grace Guinan Addison Salvador Michelle A. Smeaton Andrew Glaws Hilary Egan Brian C. Wyatt Babak Anasori K. Fiedler M. Olszta Steven Spurgeon 63 0 0 25 Feb 2025
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis Guan Zhe Hong Nishanth Dikkala Enming Luo Cyrus Rashtchian Xin Wang Rina Panigrahy OffRL LRM NAI 29 0 0 06 Nov 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs Tianyu Guo Druv Pai Yu Bai Jiantao Jiao Michael I. Jordan Song Mei 29 9 0 17 Oct 2024
Unlearning-based Neural Interpretations Ching Lam Choi Alexandre Duplessis Serge Belongie FAtt 42 0 0 10 Oct 2024
Learning Semantic Structure through First-Order-Logic Translation Akshay Chaturvedi Nicholas Asher LRM 18 0 0 04 Oct 2024
Racing Thoughts: Explaining Contextualization Errors in Large Language Models Michael A. Lepori Michael Mozer Asma Ghandeharioun LRM 80 1 0 02 Oct 2024
Representational Analysis of Binding in Language Models Qin Dai Benjamin Heinzerling Kentaro Inui 29 0 0 09 Sep 2024
Can Transformers Do Enumerative Geometry? Baran Hashemi Roderic G. Corominas Alessandro Giacchetto 32 2 0 27 Aug 2024
Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience Zhonghao He Jascha Achterberg Katie Collins Kevin K. Nejad Danyal Akarca ... Chole Li Kai J. Sandbrink Stephen Casper Anna Ivanova Grace W. Lindsay AI4CE 28 1 0 22 Aug 2024
Relational Composition in Neural Networks: A Survey and Call to Action Martin Wattenberg Fernanda Viégas CoGe 36 9 0 19 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models Daking Rai Yilun Zhou Shi Feng Abulhair Saparov Ziyu Yao 75 19 0 02 Jul 2024
Monitoring Latent World States in Language Models with Propositional Probes Jiahai Feng Stuart Russell Jacob Steinhardt HILM 32 6 0 27 Jun 2024
The Remarkable Robustness of LLMs: Stages of Inference? Vedang Lad Wes Gurnee Max Tegmark 33 33 0 27 Jun 2024
Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects Michael A. Lepori Alexa R. Tartaglini Wai Keen Vong Thomas Serre Brenden Lake Ellie Pavlick 34 2 0 22 Jun 2024
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models Jack Merullo Carsten Eickhoff Ellie Pavlick 56 13 0 13 Jun 2024
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization Boshi Wang Xiang Yue Yu-Chuan Su Huan Sun LRM 29 41 0 23 May 2024
How to use and interpret activation patching Stefan Heimersheim Neel Nanda 25 37 0 23 Apr 2024
Mechanistic Interpretability for AI Safety -- A Review Leonard Bereska E. Gavves AI4CE 38 111 0 22 Apr 2024
Evidence from counterfactual tasks supports emergent analogical reasoning in large language models Taylor W. Webb K. Holyoak Hongjing Lu LRM ELM 33 4 0 14 Apr 2024
AtP*: An efficient and scalable method for localizing LLM behaviour to components János Kramár Tom Lieberum Rohin Shah Neel Nanda KELM 43 42 0 01 Mar 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations Jing-ling Huang Zhengxuan Wu Christopher Potts Mor Geva Atticus Geiger 55 26 0 27 Feb 2024
Rethinking Interpretability in the Era of Large Language Models Chandan Singh J. Inala Michel Galley Rich Caruana Jianfeng Gao LRM AI4CE 75 61 0 30 Jan 2024
Universal Neurons in GPT2 Language Models Wes Gurnee Theo Horsley Zifan Carl Guo Tara Rezaei Kheirkhah Qinyi Sun Will Hathaway Neel Nanda Dimitris Bertsimas MILM 92 37 0 22 Jan 2024
Entity Tracking in Language Models Najoung Kim Sebastian Schuster 50 16 0 03 May 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models Mor Geva Jasmijn Bastings Katja Filippova Amir Globerson KELM 189 261 0 28 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 210 494 0 01 Nov 2022
Natural Language Descriptions of Deep Visual Features Evan Hernandez Sarah Schwettmann David Bau Teona Bagashvili Antonio Torralba Jacob Andreas MILM 196 116 0 26 Jan 2022
Unsolved Problems in ML Safety Dan Hendrycks Nicholas Carlini John Schulman Jacob Steinhardt 173 273 0 28 Sep 2021