Uncovering hidden geometry in Transformers via disentangling position and context
Jiajun Song, Yiqiao Zhong
7 October 2023 · arXiv:2310.04861
Papers citing "Uncovering hidden geometry in Transformers via disentangling position and context" (13 of 13 papers shown)
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries
Neil He, Jiahong Liu, Buze Zhang, N. Bui, Ali Maatouk, Menglin Yang, Irwin King, Melanie Weber, Rex Ying
11 Apr 2025
Context-aware Biases for Length Extrapolation
Ali Veisi, Amir Mansourian
11 Mar 2025
Lines of Thought in Large Language Models
Raphael Sarfati, Toni J. B. Liu, Nicolas Boullé, Christopher Earls
17 Feb 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song, Zhuoyan Xu, Yiqiao Zhong
31 Dec 2024
Reasoning in Large Language Models: A Geometric Perspective
Romain Cosentino, Sarath Shekkizhar
02 Jul 2024
Transformer Normalisation Layers and the Independence of Semantic Subspaces
S. Menary, Samuel Kaski, Andre Freitas
25 Jun 2024
An Information-Theoretic Analysis of In-Context Learning
Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy
28 Jan 2024
Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation
Randall Balestriero, Romain Cosentino, Sarath Shekkizhar
04 Dec 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, J. Gehrke, Eric Horvitz, ..., Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang
22 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
01 Nov 2022
In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah
24 Sep 2022
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
27 Aug 2021
Topic Modeling with Contextualized Word Representation Clusters
Laure Thompson, David M. Mimno
23 Oct 2020