arXiv:1808.08079 (v3, latest)
Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information

24 August 2018
Mario Giulianelli, J. Harding, Florian Mohnert, Dieuwke Hupkes, Willem H. Zuidema

Papers citing "Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information"

Showing 50 of 121 citing papers.
Does higher interpretability imply better utility? A Pairwise Analysis on Sparse Autoencoders
Xu Wang, Yan Hu, Benyou Wang, Difan Zou
04 Oct 2025

Line of Sight: On Linear Representations in VLLMs
Achyuta Rajaram, Sarah Schwettmann, Jacob Andreas, Arthur Conmy
05 Jun 2025

Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models
Liyi Zhang, Veniamin Veselovsky, R. Thomas McCoy, Thomas Griffiths
17 Apr 2025

Mechanistic Interpretability of Emotion Inference in Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ala Nekouvaght Tak, Amin Banayeeanzade, Anahita Bolourani, Mina Kian, Robin Jia, Jonathan Gratch
08 Feb 2025

StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training
Kaustubh Ponkshe, Venkatapathy Subramanian, Natwar Modani, Ganesh Ramakrishnan
25 Nov 2024

Gumbel Counterfactual Generation From Language Models
International Conference on Learning Representations (ICLR), 2024
Shauli Ravfogel, Anej Svete, Vésteinn Snæbjarnarson, Robert Bamler
11 Nov 2024

Layer by Layer: Uncovering Where Multi-Task Learning Happens in Instruction-Tuned Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zheng Zhao, Yftah Ziser, Shay B. Cohen
25 Oct 2024

Can Language Models Induce Grammatical Knowledge from Indirect Evidence?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Miyu Oba, Yohei Oseki, Akiyo Fukatsu, Akari Haga, Hiroki Ouchi, Taro Watanabe, Saku Sugawara
08 Oct 2024

Mechanistic?
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2024
Naomi Saphra, Sarah Wiegreffe
07 Oct 2024

How Language Models Prioritize Contextual Grammatical Cues?
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2024
Hamidreza Amirzadeh, Afra Alishahi, Hosein Mohebbi
04 Oct 2024

Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models
International Conference on Computational Linguistics (COLING), 2024
Xinyu Zhou, Delong Chen, Samuel Cahyawijaya, Xufeng Duan, Zhenguang G. Cai
19 Sep 2024

Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2024
Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger
20 Aug 2024

The Quest for the Right Mediator: Surveying Mechanistic Interpretability Through the Lens of Causal Mediation Analysis
Computational Linguistics (CL), 2024
Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, ..., Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov
02 Aug 2024

On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon, Roi Reichart
27 Jul 2024

Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks
Aaron Mueller
05 Jul 2024

Does ChatGPT Have a Mind?
Simon Goldstein, B. Levinstein
27 Jun 2024

What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky, William Rudman, Vedant Palit, Ritambhara Singh, Carsten Eickhoff
24 Jun 2024

Perception of Phonological Assimilation by Neural Speech Recognition Models
Charlotte Pouw, Marianne de Heer Kloots, Afra Alishahi, Willem H. Zuidema
21 Jun 2024

What Should Embeddings Embed? Autoregressive Models Represent Latent Generating Distributions
Liyi Zhang, Michael Y. Li, Thomas Griffiths, Theodore R. Sumers, Jian-Qiao Zhu, Thomas L. Griffiths
06 Jun 2024

Probing the Category of Verbal Aspect in Transformer Language Models
Anisia Katinskaia, R. Yangarber
04 Jun 2024

A Philosophical Introduction to Language Models - Part II: The Way Forward
Raphael Milliere, Cameron Buckner
06 May 2024

From Form(s) to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency
Xenia Ohmer, Elia Bruni, Dieuwke Hupkes
18 Apr 2024

More than Correlation: Do Large Language Models Learn Causal Representations of Space?
Yida Chen, Yixian Gan, Sijia Li, Li Yao, Xiaohan Zhao
26 Dec 2023

Deep de Finetti: Recovering Topic Distributions from Large Language Models
Liyi Zhang, R. Thomas McCoy, T. Sumers, Jian-Qiao Zhu, Thomas Griffiths
21 Dec 2023

Grammatical information in BERT sentence embeddings as two-dimensional arrays
Workshop on Representation Learning for NLP (RepL4NLP), 2023
Vivi Nastase, Paola Merlo
15 Dec 2023

Codebook Features: Sparse and Discrete Interpretability for Neural Networks
International Conference on Machine Learning (ICML), 2023
Alex Tamkin, Mohammad Taufeeque, Noah D. Goodman
26 Oct 2023

Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Max Müller-Eberstein, Rob van der Goot, Barbara Plank, Ivan Titov
25 Oct 2023

Unnatural language processing: How do language models handle machine-generated prompts?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Corentin Kervadec, Francesca Franzon, Marco Baroni
24 Oct 2023

Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Sophie Hao, Tal Linzen
23 Oct 2023

Investigating semantic subspaces of Transformer sentence embeddings through linear structural probing
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Dmitry Nikolaev, Sebastian Padó
18 Oct 2023

Emergent Linear Representations in World Models of Self-Supervised Sequence Models
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Neel Nanda, Andrew Lee, Martin Wattenberg
02 Sep 2023

Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
Vedant Palit, Rohan Pandey, Aryaman Arora, Paul Pu Liang
27 Aug 2023

Operationalising Representation in Natural Language Processing
British Journal for the Philosophy of Science (BJPS), 2023
J. Harding
14 Jun 2023

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Neural Information Processing Systems (NeurIPS), 2023
Michael Hanna, Ollie Liu, Alexandre Variengien
30 Apr 2023

Interventional Probing in High Dimensions: An NLI Case Study
Findings, 2023
Julia Rozanova, Marco Valentino, Lucas C. Cordeiro, André Freitas
20 Apr 2023

Exposing the Functionalities of Neurons for Gated Recurrent Unit Based Sequence-to-Sequence Model
Yi-Ting Lee, Da-Yi Wu, Chih-Chun Yang, Shou-De Lin
27 Mar 2023

An Overview on Language Models: Recent Developments and Outlook
APSIPA Transactions on Signal and Information Processing (TASIP), 2023
Chengwei Wei, Yun Cheng Wang, Bin Wang, C.-C. Jay Kuo
10 Mar 2023

Dissociating language and thought in large language models
Kyle Mahowald, Anna A. Ivanova, I. Blank, Nancy Kanwisher, J. Tenenbaum, Evelina Fedorenko
16 Jan 2023

Reconstruction Probing
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Najoung Kim, Jatin Khilnani, Alex Warstadt, Abed Qaddoumi
21 Dec 2022

Assessing the Capacity of Transformer to Abstract Syntactic Representations: A Contrastive Analysis Based on Long-distance Agreement
Transactions of the Association for Computational Linguistics (TACL), 2022
Bingzhi Li, Guillaume Wisniewski, Benoît Crabbé
08 Dec 2022

Do LSTMs See Gender? Probing the Ability of LSTMs to Learn Abstract Syntactic Rules
Priyanka Sukumaran, Conor J. Houghton, N. Kazanina
31 Oct 2022

Understanding Domain Learning in Language Models Through Subpopulation Analysis
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2022
Zheng Zhao, Yftah Ziser, Shay B. Cohen
22 Oct 2022

Probing with Noise: Unpicking the Warp and Weft of Embeddings
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2022
Filip Klubicka, John D. Kelleher
21 Oct 2022

Assessing Neural Referential Form Selectors on a Realistic Multilingual Dataset
Guanyi Chen, F. Same, Kees van Deemter
10 Oct 2022

State-of-the-art generalisation research in NLP: A taxonomy and review
Nature Machine Intelligence (Nat. Mach. Intell.), 2022
Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, ..., Leila Khalatbari, Maria Ryskina, Rita Frieske, Robert Bamler, Zhijing Jin
06 Oct 2022

Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions
Yanai Elazar, Nora Kassner, Haiqin Yang, Amir Feder, Abhilasha Ravichander, Marius Mosbach, Yonatan Belinkov, Hinrich Schütze, Yoav Goldberg
28 Jul 2022

Probing via Prompting
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Jiaoda Li, Robert Bamler, Mrinmaya Sachan
04 Jul 2022

Assessing the Limits of the Distributional Hypothesis in Semantic Spaces: Trait-based Relational Knowledge and the Impact of Co-occurrences
Mark Anderson, Jose Camacho-Collados
16 May 2022

Naturalistic Causal Probing for Morpho-Syntax
Transactions of the Association for Computational Linguistics (TACL), 2022
Afra Amini, Tiago Pimentel, Clara Meister, Robert Bamler
14 May 2022

When Does Syntax Mediate Neural Language Model Performance? Evidence from Dropout Probes
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Mycal Tucker, Tiwalayo Eisape, Peng Qian, R. Levy, J. Shah
20 Apr 2022