Language Models Represent Space and Time

3 October 2023

Papers citing "Language Models Represent Space and Time"

44 / 44 papers shown

Title
Revealing economic facts: LLMs know more than they say Marcus Buckmann Quynh Anh Nguyen Edward Hill 21 0 0 13 May 2025
Geospatial Mechanistic Interpretability of Large Language Models Stef De Sabbata Stefano Mizzaro Kevin Roitero AI4CE 28 0 0 06 May 2025
Do Large Language Models know who did what to whom? Joseph M. Denning Xiaohan Bryor Snefjella Idan A. Blank 50 1 0 23 Apr 2025
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful Iván Arcuschin Jett Janiak Robert Krzyzanowski Senthooran Rajamanoharan Neel Nanda Arthur Conmy LRM ReLM 62 6 0 11 Mar 2025
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems Richard Ren Arunim Agarwal Mantas Mazeika Cristina Menghini Robert Vacareanu ... Matias Geralnik Adam Khoja Dean Lee Summer Yue Dan Hendrycks HILM ALM 88 0 0 05 Mar 2025
Towards Understanding Distilled Reasoning Models: A Representational Approach David D. Baek Max Tegmark LRM 75 2 0 05 Mar 2025
Linear Representations of Political Perspective Emerge in Large Language Models Junsol Kim James Evans Aaron Schein 75 2 0 03 Mar 2025
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer Marthe Ballon Andres Algaba Vincent Ginis LRM ReLM 36 4 0 24 Feb 2025
Lines of Thought in Large Language Models Raphael Sarfati Toni J. B. Liu Nicolas Boullé Christopher Earls LRM VLM LM&Ro 58 1 0 17 Feb 2025
The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis Ge Lei Samuel J. Cooper KELM 47 0 0 15 Feb 2025
Human-like conceptual representations emerge from language prediction Ningyu Xu Qi Zhang Chao Du Qiang Luo Xipeng Qiu Xuanjing Huang Menghan Zhang 70 0 0 21 Jan 2025
Revisiting Rogers' Paradox in the Context of Human-AI Interaction K. M. Collins Umang Bhatt Ilia Sucholutsky 42 1 0 16 Jan 2025
ICLR: In-Context Learning of Representations Core Francisco Park Andrew Lee Ekdeep Singh Lubana Yongyi Yang Maya Okawa Kento Nishi Martin Wattenberg Hidenori Tanaka AIFin 114 3 0 29 Dec 2024
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit Zeqing He Zhibo Wang Zhixuan Chu Huiyu Xu Rui Zheng Kui Ren Chun Chen 52 3 0 17 Nov 2024
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling Emanuele Marconato Sébastien Lachapelle Sebastian Weichwald Luigi Gresele 64 3 0 30 Oct 2024
Improving Instruction-Following in Language Models through Activation Steering Alessandro Stolfo Vidhisha Balachandran Safoora Yousefi Eric Horvitz Besmira Nushi LLMSV 52 14 0 15 Oct 2024
The Geometry of Concepts: Sparse Autoencoder Feature Structure Yuxiao Li Eric J. Michaud David D. Baek Joshua Engels Xiaoqing Sun Max Tegmark 50 7 0 10 Oct 2024
Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax I. Butakov Alexander Sememenko Alexander Tolmachev Andrey Gladkov Marina Munkhoeva Alexey Frolov 32 0 0 09 Oct 2024
Programming Refusal with Conditional Activation Steering Bruce W. Lee Inkit Padhi K. Ramamurthy Erik Miehling Pierre L. Dognin Manish Nagireddy Amit Dhurandhar LLMSV 91 13 0 06 Sep 2024
Understanding Generative AI Content with Embedding Models Max Vargas Reilly Cannon A. Engel Anand D. Sarwate Tony Chiang 47 3 0 19 Aug 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models Daking Rai Yilun Zhou Shi Feng Abulhair Saparov Ziyu Yao 75 19 0 02 Jul 2024
Monitoring Latent World States in Language Models with Propositional Probes Jiahai Feng Stuart Russell Jacob Steinhardt HILM 32 6 0 27 Jun 2024
AI Sandbagging: Language Models can Strategically Underperform on Evaluations Teun van der Weij Felix Hofstätter Ollie Jaffe Samuel F. Brown Francis Rhys Ward ELM 37 23 0 11 Jun 2024
Feature contamination: Neural networks learn uncorrelated features and fail to generalize Tianren Zhang Chujie Zhao Guanyu Chen Yizhou Jiang Feng Chen OOD MLT OODD 69 3 0 05 Jun 2024
The Geometry of Categorical and Hierarchical Concepts in Large Language Models Kiho Park Yo Joong Choe Yibo Jiang Victor Veitch 50 25 0 03 Jun 2024
Survival of the Fittest Representation: A Case Study with Modular Addition Xiaoman Delores Ding Zifan Carl Guo Eric J. Michaud Ziming Liu Max Tegmark 34 3 0 27 May 2024
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability Shenyuan Gao Jiazhi Yang Li Chen Kashyap Chitta Yihang Qiu Andreas Geiger Jun Zhang Hongyang Li 60 75 0 27 May 2024
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models Bernal Jiménez Gutiérrez Yiheng Shu Yu Gu Michihiro Yasunaga Yu-Chuan Su RALM CLL 60 33 0 23 May 2024
The Unreasonable Ineffectiveness of the Deeper Layers Andrey Gromov Kushal Tirumala Hassan Shapourian Paolo Glorioso Daniel A. Roberts 41 79 0 26 Mar 2024
Language Models Represent Beliefs of Self and Others Wentao Zhu Zhining Zhang Yizhou Wang MILM LRM 38 7 0 28 Feb 2024
On the Challenges and Opportunities in Generative AI Laura Manduchi Kushagra Pandey Robert Bamler Ryan Cotterell Sina Daubener ... F. Wenzel Frank Wood Stephan Mandt Vincent Fortuin Vincent Fortuin 56 17 0 28 Feb 2024
Opening the AI black box: program synthesis via mechanistic interpretability Eric J. Michaud Isaac Liao Vedang Lad Ziming Liu Anish Mudide Chloe Loughridge Zifan Carl Guo Tara Rezaei Kheirkhah Mateja Vukelić Max Tegmark 23 12 0 07 Feb 2024
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models Weijiao Zhang Jindong Han Zhao Xu Hang Ni Hao Liu Hui Xiong Hui Xiong AI4CE 77 15 0 30 Jan 2024
Black-Box Access is Insufficient for Rigorous AI Audits Stephen Casper Carson Ezell Charlotte Siegmann Noam Kolt Taylor Lynn Curtis ... Michael Gerovitch David Bau Max Tegmark David M. Krueger Dylan Hadfield-Menell AAML 13 76 0 25 Jan 2024
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B Simon Lermen Charlie Rogers-Smith Jeffrey Ladish ALM 15 80 0 31 Oct 2023
On Generative Agents in Recommendation An Zhang Yuxin Chen Leheng Sheng Xiang Wang Tat-Seng Chua 30 43 0 16 Oct 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing Wes Gurnee Neel Nanda Matthew Pauly Katherine Harvey Dmitrii Troitskii Dimitris Bertsimas MILM 153 186 0 02 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model Michael Hanna Ollie Liu Alexandre Variengien LRM 189 119 0 30 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models Mor Geva Jasmijn Bastings Katja Filippova Amir Globerson KELM 189 261 0 28 Apr 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4 Sébastien Bubeck Varun Chandrasekaran Ronen Eldan J. Gehrke Eric Horvitz ... Scott M. Lundberg Harsha Nori Hamid Palangi Marco Tulio Ribeiro Yi Zhang ELM AI4MH AI4CE ALM 251 2,232 0 22 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 210 494 0 01 Nov 2022
Toy Models of Superposition Nelson Elhage Tristan Hume Catherine Olsson Nicholas Schiefer T. Henighan ... Sam McCandlish Jared Kaplan Dario Amodei Martin Wattenberg C. Olah AAML MILM 120 316 0 21 Sep 2022
Do Language Models Know the Way to Rome? Bastien Liétard Mostafa Abdou Anders Søgaard 46 15 0 16 Sep 2021
Probing Classifiers: Promises, Shortcomings, and Advances Yonatan Belinkov 224 404 0 24 Feb 2021