How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding

7 March 2023

Andrej Risteski

Papers citing "How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding"

When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers. Hongkang Li, Yihua Zhang, Shuai Zhang, M. Wang, Sijia Liu, Pin-Yu Chen. 15 April 2025.
Tracking the Feature Dynamics in LLM Training: A Mechanistic Study. Yang Xu, Y. Wang, Hao Wang. 23 December 2024.
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency. Kaiyue Wen, Huaqing Zhang, Hongzhou Lin, Jingzhao Zhang. 7 October 2024.
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization. Xinhao Yao, Hongjin Qian, Xiaolin Hu, Gengze Xu, Yong Liu, Wei Liu, Jian Luan, Bin Wang. 3 October 2024.
Attention layers provably solve single-location regression. P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer. 2 October 2024.
An Information-Theoretic Analysis of In-Context Learning. Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy. 28 January 2024.
Learning to forecast diagnostic parameters using pre-trained weather embedding. Peetak Mitra, Vivek Ramavajjala. 1 December 2023.
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods. Fred Zhang, Neel Nanda. 27 September 2023.
Learning threshold neurons via the "edge of stability". Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang. 14 December 2022.
Probing Classifiers: Promises, Shortcomings, and Advances. Yonatan Belinkov. 24 February 2021.
Topic Modeling with Contextualized Word Representation Clusters. Laure Thompson, David M. Mimno. 23 October 2020.