Designing and Interpreting Probes with Control Tasks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
8 September 2019
John Hewitt
Abigail Z. Jacobs

Papers citing "Designing and Interpreting Probes with Control Tasks"

50 / 381 papers shown
Towards Open-Ended Visual Scientific Discovery with Sparse Autoencoders
Samuel Stevens
Jacob Beattie
T. Berger-Wolf
Yu-Chuan Su
88
0
0
21 Nov 2025
Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks
Éloïse Benito-Rodriguez
Einar Urdshals
Jasmina Nasufi
Nicky Pochinkov
88
0
0
20 Nov 2025
Spectral Identifiability for Interpretable Probe Geometry
William Hao-Cheng Huang
97
0
0
20 Nov 2025
CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment
Xue Jiang
Yihong Dong
Mengyang Liu
Hongyi Deng
Tian Wang
...
Zhi Jin
Wenpin Jiao
Fei Huang
Yongbin Li
Ge Li
109
1
0
21 Oct 2025
When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and Ambiguity
Nisrine Rair
Alban Goupil
Valeriu Vrabie
Emmanuel Chochoy
117
0
0
20 Oct 2025
Inverse-Free Wilson Loops for Transformers: A Practical Diagnostic for Invariance and Order Sensitivity
Edward Y. Chang
Ethan Chang
92
1
0
09 Oct 2025
Type and Complexity Signals in Multilingual Question Representations
Robin Kokot
Wessel Poelman
104
0
0
07 Oct 2025
Controllable Stylistic Text Generation with Train-Time Attribute-Regularized Diffusion
Fan Zhou
Chang Tian
Tim Van de Cruys
DiffM
105
1
0
07 Oct 2025
Probing the Difficulty Perception Mechanism of Large Language Models
Sunbowen Lee
Qingyu Yin
Chak Tou Leong
Jialiang Zhang
Yicheng Gong
Shiwen Ni
Min Yang
Xiaoyu Shen
LRM
203
0
0
07 Oct 2025
Modeling Student Learning with 3.8 Million Program Traces
Alexis Ross
Megha Srivastava
Jeremiah Blanchard
Jacob Andreas
92
3
0
06 Oct 2025
Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhaoxin Feng
Jianfei Ma
Emmanuele Chersoni
Xiaojing Zhao
Xiaoyi Bao
141
3
0
02 Oct 2025
From Behavioral Performance to Internal Competence: Interpreting Vision-Language Models with VLM-Lens
Hala Sheta
Eric Huang
Shuyu Wu
Ilia Alenabi
Jiajun Hong
...
D. Wei
Jialin Yang
Jiawei Zhou
Ziqiao Ma
Freda Shi
VLM
106
3
0
02 Oct 2025
Shape Happens: Automatic Feature Manifold Discovery in LLMs via Supervised Multi-Dimensional Scaling
Federico Tiblias
Irina Bigoulaeva
Jingcheng Niu
Simone Balloccu
Iryna Gurevych
140
0
0
01 Oct 2025
Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
James Oldfield
Juil Sock
Ioannis Patras
Adel Bibi
Fazl Barez
124
0
0
30 Sep 2025
Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT
Guy Bar-Shalom
Fabrizio Frasca
Yaniv Galron
Yftah Ziser
Haggai Maron
MLLM
115
0
0
30 Sep 2025
Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures
Marco Bronzini
Carlo Nicolini
Bruno Lepri
Jacopo Staiano
Andrea Passerini
LLMSV
104
0
0
29 Sep 2025
Language Model Planning from an Information Theoretic Perspective
Muhammed Ustaomeroglu
Baris Askin
Gauri Joshi
Carlee Joe-Wong
Guannan Qu
117
0
0
28 Sep 2025
Towards Transparent AI: A Survey on Explainable Language Models
Avash Palikhe
Sribala Vidyadhari Chinta
Zhipeng Yin
Rui Guo
Qiang Duan
Jie Yang
Wenbin Zhang
170
2
0
25 Sep 2025
A Pipeline to Assess Merging Methods via Behavior and Internals
Yutaro Sigris
Andreas Waldis
MoMe
275
0
0
23 Sep 2025
Do Natural Language Descriptions of Model Activations Convey Privileged Information?
Millicent Li
Alberto Mario Ceballos Arroyo
Giordano Rogers
Naomi Saphra
Byron C. Wallace
148
2
0
16 Sep 2025
Not All Splits Are Equal: Rethinking Attribute Generalization Across Unrelated Categories
Liviu Nicolae Fircă
Antonio Bărbălău
Dan Oneata
Elena Burceanu
OOD, VLM, LRM
167
0
0
04 Sep 2025
Tracking World States with Language Models: State-Based Evaluation Using Chess
Romain Harang
Jason Naradowsky
Yaswitha Gujju
Yusuke Miyao
52
0
0
27 Aug 2025
What Does it Mean for a Neural Network to Learn a "World Model"?
Kenneth Li
F. Viégas
Martin Wattenberg
NAI
116
1
0
29 Jul 2025
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter
Julian Minder
Thomas Hofmann
Tiago Pimentel
184
9
0
11 Jul 2025
What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
Keyon Vafa
Peter G. Chang
Ashesh Rambachan
S. Mullainathan
601
14
0
09 Jul 2025
Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations
Ananth Agarwal
Jasper Jian
Christopher D. Manning
Shikhar Murty
235
1
0
20 Jun 2025
Enhancing Accuracy and Maintainability in Nuclear Plant Data Retrieval: A Function-Calling LLM Approach Over NL-to-SQL
Mishca de Costa
Muhammad Anwar
Dave Mercier
Mark Randall
Issam Hammad
131
1
0
10 Jun 2025
Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Dan Oneaţă
Desmond Elliott
Stella Frank
183
2
0
04 Jun 2025
Echoes of BERT: Do Modern Language Models Rediscover the Classical NLP Pipeline?
Michael Li
Nishant Subramani
KELM
224
1
0
02 Jun 2025
Different Speech Translation Models Encode and Translate Speaker Gender Differently
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Dennis Fucci
Marco Gaido
Matteo Negri
L. Bentivogli
Marcely Zanon Boito
Giuseppe Attanasio
217
1
0
02 Jun 2025
Understanding the learned look-ahead behavior of chess neural networks
Diogo Cruz
304
0
0
26 May 2025
Large Language Models Do Multi-Label Classification Differently
Marcus Ma
Georgios Chochlakis
Niyantha Maruthu Pandiyan
Jesse Thomason
Zengyi Qin
305
3
0
23 May 2025
Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization
Vera Neplenbroek
Arianna Bisazza
Raquel Fernández
308
1
0
22 May 2025
Probing Subphonemes in Morphology Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Gal Astrach
Yuval Pinter
275
0
0
16 May 2025
Designing and Contextualising Probes for African Languages
Wisdom Aduah
Francois Meyer
339
0
0
15 May 2025
Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations
Yize Zhao
Christos Thrampoulidis
277
0
0
13 May 2025
Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models
Liyi Zhang
Veniamin Veselovsky
R. Thomas McCoy
Thomas Griffiths
179
1
0
17 Apr 2025
Probing then Editing Response Personality of Large Language Models
Tianjie Ju
Zhenyu Shao
Binghai Wang
Yulin Chen
Zhuosheng Zhang
Hao Fei
Yang Deng
Wynne Hsu
Sufeng Duan
Gongshen Liu
KELM
381
3
0
14 Apr 2025
Linguistic Interpretability of Transformer-based Language Models: a systematic review
Miguel López-Otal
Jorge Gracia
Jordi Bernad
Carlos Bobed
Lucía Pitarch-Ballesteros
Emma Anglés-Herrero
VLM
345
7
0
09 Apr 2025
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Zhanke Zhou
Zhaocheng Zhu
Xuan Li
Mikhail Galkin
Xiao Feng
Sanmi Koyejo
Jian Tang
Bo Han
LRM
421
11
0
28 Mar 2025
Construction Identification and Disambiguation Using BERT: A Case Study of NPN
Wesley Scivetti
Nathan Schneider
290
1
0
24 Mar 2025
Beyond Next Token Probabilities: Learnable, Fast Detection of Hallucinations and Data Contamination on LLM Output Distributions
Guy Bar-Shalom
Fabrizio Frasca
Derek Lim
Yoav Gelberg
Yftah Ziser
Ran El-Yaniv
Gal Chechik
Haggai Maron
402
2
0
18 Mar 2025
Aligned Probing: Relating Toxic Behavior and Model Internals
Andreas Waldis
Vagrant Gautam
Anne Lauscher
Dietrich Klakow
Iryna Gurevych
282
2
0
17 Mar 2025
Queueing, Predictions, and LLMs: Challenges and Open Problems
Michael Mitzenmacher
Rana Shahout
AI4TS, LRM
207
4
0
10 Mar 2025
Constructions are Revealed in Word Distributions
J. Rozner
Leonie Weissweiler
Kyle Mahowald
Cory Shain
338
4
0
08 Mar 2025
Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models
Tianjie Ju
Yi Hua
Hao Fei
Zhenyu Shao
Yubin Zheng
Haodong Zhao
Yang Deng
Wynne Hsu
Zhuosheng Zhang
Gongshen Liu
398
2
0
03 Mar 2025
A Close Look at Decomposition-based XAI-Methods for Transformer Language Models
L. Arras
Bruno Puri
Patrick Kahardipraja
Sebastian Lapuschkin
Wojciech Samek
292
4
0
21 Feb 2025
Language Models Can Predict Their Own Behavior
Dhananjay Ashok
Jonathan May
AI4TS, ReLM, LRM
414
5
0
18 Feb 2025
We Can't Understand AI Using our Existing Vocabulary
John Hewitt
Robert Geirhos
Been Kim
307
13
0
11 Feb 2025
Mechanistic Interpretability of Emotion Inference in Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ala Nekouvaght Tak
Amin Banayeeanzade
Anahita Bolourani
Mina Kian
Robin Jia
Jonathan Gratch
292
5
0
08 Feb 2025