ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.04245
  4. Cited By
How Do Transformers Learn Topic Structure: Towards a Mechanistic
  Understanding

How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding

7 March 2023
Yuchen Li
Yuan-Fang Li
Andrej Risteski
ArXivPDFHTML

Papers citing "How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding"

11 / 11 papers shown
Title
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li
Yihua Zhang
Shuai Zhang
M. Wang
Sijia Liu
Pin-Yu Chen
MoMe
43
2
0
15 Apr 2025
Tracking the Feature Dynamics in LLM Training: A Mechanistic Study
Tracking the Feature Dynamics in LLM Training: A Mechanistic Study
Yang Xu
Y. Wang
Hao Wang
51
1
0
23 Dec 2024
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
Kaiyue Wen
Huaqing Zhang
Hongzhou Lin
Jingzhao Zhang
MoE
LRM
45
2
0
07 Oct 2024
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization
Xinhao Yao
Hongjin Qian
Xiaolin Hu
Gengze Xu
Yong Liu
Wei Liu
Jian Luan
Bin Wang
31
0
0
03 Oct 2024
Attention layers provably solve single-location regression
Attention layers provably solve single-location regression
P. Marion
Raphael Berthier
Gérard Biau
Claire Boyer
33
2
0
02 Oct 2024
An Information-Theoretic Analysis of In-Context Learning
An Information-Theoretic Analysis of In-Context Learning
Hong Jun Jeon
Jason D. Lee
Qi Lei
Benjamin Van Roy
8
18
0
28 Jan 2024
Learning to forecast diagnostic parameters using pre-trained weather
  embedding
Learning to forecast diagnostic parameters using pre-trained weather embedding
Peetak Mitra
Vivek Ramavajjala
12
1
0
01 Dec 2023
Towards Best Practices of Activation Patching in Language Models:
  Metrics and Methods
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang
Neel Nanda
LLMSV
10
95
0
27 Sep 2023
Learning threshold neurons via the "edge of stability"
Learning threshold neurons via the "edge of stability"
Kwangjun Ahn
Sébastien Bubeck
Sinho Chewi
Y. Lee
Felipe Suarez
Yi Zhang
MLT
15
36
0
14 Dec 2022
Probing Classifiers: Promises, Shortcomings, and Advances
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
219
291
0
24 Feb 2021
Topic Modeling with Contextualized Word Representation Clusters
Topic Modeling with Contextualized Word Representation Clusters
Laure Thompson
David M. Mimno
75
69
0
23 Oct 2020
1