Designing and Interpreting Probes with Control Tasks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019

8 September 2019

John Hewitt

Abigail Z. Jacobs

Papers citing "Designing and Interpreting Probes with Control Tasks"

50 / 381 papers shown

Towards Open-Ended Visual Scientific Discovery with Sparse Autoencoders

21 Nov 2025

Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks

Éloïse Benito-Rodriguez

Einar Urdshals

Jasmina Nasufi

Nicky Pochinkov

20 Nov 2025

Spectral Identifiability for Interpretable Probe Geometry

William Hao-Cheng Huang

20 Nov 2025

CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment

...

121

21 Oct 2025

When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and Ambiguity

117

20 Oct 2025

Inverse-Free Wilson Loops for Transformers: A Practical Diagnostic for Invariance and Order Sensitivity

Edward Y. Chang

Ethan Chang

09 Oct 2025

Type and Complexity Signals in Multilingual Question Representations

Robin Kokot

Wessel Poelman

104

07 Oct 2025

Controllable Stylistic Text Generation with Train-Time Attribute-Regularized Diffusion

113

07 Oct 2025

Probing the Difficulty Perception Mechanism of Large Language Models

207

07 Oct 2025

Modeling Student Learning with 3.8 Million Program Traces

06 Oct 2025

Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

145

02 Oct 2025

From Behavioral Performance to Internal Competence: Interpreting Vision-Language Models with VLM-Lens

...

106

02 Oct 2025

Shape Happens: Automatic Feature Manifold Discovery in LLMs via Supervised Multi-Dimensional Scaling

144

01 Oct 2025

Beyond Linear Probes: Dynamic Safety Monitoring for Language Models

132

30 Sep 2025

Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT

119

30 Sep 2025

Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures

108

29 Sep 2025

Language Model Planning from an Information Theoretic Perspective

Muhammed Ustaomeroglu

125

28 Sep 2025

Towards Transparent AI: A Survey on Explainable Language Models

Avash Palikhe

Sribala Vidyadhari Chinta

174

25 Sep 2025

A Pipeline to Assess Merging Methods via Behavior and Internals

Yutaro Sigris

Andreas Waldis

MoMe

283

23 Sep 2025

Do Natural Language Descriptions of Model Activations Convey Privileged Information?

Millicent Li

Alberto Mario Ceballos Arroyo

Giordano Rogers

Naomi Saphra

Byron C. Wallace

160

16 Sep 2025

Not All Splits Are Equal: Rethinking Attribute Generalization Across Unrelated Categories

179

04 Sep 2025

Tracking World States with Language Models: State-Based Evaluation Using Chess

27 Aug 2025

What Does it Mean for a Neural Network to Learn a "World Model"?

116

29 Jul 2025

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

188

11 Jul 2025

What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models

621

09 Jul 2025

Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations

Ananth Agarwal

Jasper Jian

Christopher D. Manning

Shikhar Murty

239

20 Jun 2025

Enhancing Accuracy and Maintainability in Nuclear Plant Data Retrieval: A Function-Calling LLM Approach Over NL-to-SQL

131

10 Jun 2025

Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Dan Oneaţă

Desmond Elliott

Stella Frank

187

04 Jun 2025

Echoes of BERT: Do Modern Language Models Rediscover the Classical NLP Pipeline?

Michael Li

Nishant Subramani

KELM

228

02 Jun 2025

Different Speech Translation Models Encode and Translate Speaker Gender Differently

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

225

02 Jun 2025

Understanding the learned look-ahead behavior of chess neural networks

Diogo Cruz

312

26 May 2025

Large Language Models Do Multi-Label Classification Differently

Marcus Ma

Georgios Chochlakis

Niyantha Maruthu Pandiyan

Jesse Thomason

Zengyi Qin

313

23 May 2025

Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization

Vera Neplenbroek

Arianna Bisazza

Raquel Fernández

308

22 May 2025

Probing Subphonemes in Morphology Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Gal Astrach

Yuval Pinter

279

16 May 2025

Designing and Contextualising Probes for African Languages

Wisdom Aduah

Francois Meyer

371

15 May 2025

Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations

Yize Zhao

Christos Thrampoulidis

277

13 May 2025

Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models

183

17 Apr 2025

Probing then Editing Response Personality of Large Language Models

381

14 Apr 2025

Linguistic Interpretability of Transformer-based Language Models: a systematic review

Lucía Pitarch-Ballesteros

Emma Anglés-Herrero

VLM

357

09 Apr 2025

Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models

425

28 Mar 2025

Construction Identification and Disambiguation Using BERT: A Case Study of NPN

Wesley Scivetti

Nathan Schneider

294

24 Mar 2025

Beyond Next Token Probabilities: Learnable, Fast Detection of Hallucinations and Data Contamination on LLM Output Distributions

414

18 Mar 2025

Aligned Probing: Relating Toxic Behavior and Model Internals

282

17 Mar 2025

Queueing, Predictions, and LLMs: Challenges and Open Problems

Michael Mitzenmacher

Rana Shahout

AI4TS LRM

211

10 Mar 2025

Constructions are Revealed in Word Distributions

346

08 Mar 2025

Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models

410

03 Mar 2025

A Close Look at Decomposition-based XAI-Methods for Transformer Language Models

292

21 Feb 2025

Language Models Can Predict Their Own Behavior

Dhananjay Ashok

Jonathan May

AI4TS ReLM LRM

422

18 Feb 2025

We Can't Understand AI Using our Existing Vocabulary

John Hewitt

Robert Geirhos

Been Kim

307

11 Feb 2025

Mechanistic Interpretability of Emotion Inference in Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

296

08 Feb 2025
