Uncovering hidden geometry in Transformers via disentangling position and context
Jiajun Song, Yiqiao Zhong
7 October 2023 · arXiv:2310.04861
Papers citing "Uncovering hidden geometry in Transformers via disentangling position and context" (13 of 13 papers shown)
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries
Neil He, Jiahong Liu, Buze Zhang, N. Bui, Ali Maatouk, Menglin Yang, Irwin King, Melanie Weber, Rex Ying
11 Apr 2025
Context-aware Biases for Length Extrapolation
Ali Veisi, Amir Mansourian
11 Mar 2025
Lines of Thought in Large Language Models
Raphael Sarfati, Toni J. B. Liu, Nicolas Boullé, Christopher Earls
17 Feb 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song, Zhuoyan Xu, Yiqiao Zhong
31 Dec 2024
Reasoning in Large Language Models: A Geometric Perspective
Romain Cosentino, Sarath Shekkizhar
02 Jul 2024
Transformer Normalisation Layers and the Independence of Semantic Subspaces
S. Menary, Samuel Kaski, Andre Freitas
25 Jun 2024
An Information-Theoretic Analysis of In-Context Learning
Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy
28 Jan 2024
Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation
Randall Balestriero, Romain Cosentino, Sarath Shekkizhar
04 Dec 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, J. Gehrke, Eric Horvitz, ..., Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang
22 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
01 Nov 2022
In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah
24 Sep 2022
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
27 Aug 2021
Topic Modeling with Contextualized Word Representation Clusters
Laure Thompson, David M. Mimno
23 Oct 2020