Reverse-Engineering the Retrieval Process in GenIR Models

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025

Anja Reusch

Yonatan Belinkov

RALM

211

25 Mar 2025

Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models

233

15 Feb 2025

Reversed Attention: On The Gradient Descent Of Attention Layers In GPT

Shahar Katz

Lior Wolf

144

22 Dec 2024

Revealing the Barriers of Language Agents in Planning

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Kai Zhang

258

16 Oct 2024

Optimal ablation for interpretability

Neural Information Processing Systems (NeurIPS), 2024

Maximilian Li

Lucas Janson

FAtt

343

16 Sep 2024

Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models

AAAI Conference on Artificial Intelligence (AAAI), 2024

Weiping Wang

428

27 Aug 2024

On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs

Nitay Calderon

Roi Reichart

358

27 Jul 2024

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions

Oyvind Tafjord

221

21 Jul 2024

Confidence Regulation Neurons in Language Models

242

24 Jun 2024

Finding Transformer Circuits with Edge Pruning

Adithya Bhaskar

Alexander Wettig

Dan Friedman

Danqi Chen

468

24 Jun 2024

Knowledge Circuits in Pretrained Transformers

Ningyu Zhang

Shumin Deng

Huajun Chen

KELM

436

28 May 2024

InversionView: A General-Purpose Method for Reading Information from Neural Activations

354

27 May 2024

LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models

Igor Tufanov

232

10 Apr 2024

Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

284

09 Mar 2024

Understanding and Patching Compositional Reasoning in LLMs

Defu Lian

247

22 Feb 2024

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space

269

20 Feb 2024

A Comprehensive Study of Knowledge Editing for Large Language Models

Ningyu Zhang

Yunzhi Yao

Bo Tian

Peng Wang

Shumin Deng

...

Lei Liang

Huajun Chen

493

126

02 Jan 2024

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

2.5K

19,805

16 Feb 2016