Designing and Interpreting Probes with Control Tasks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
8 September 2019
John Hewitt
Abigail Z. Jacobs

Papers citing "Designing and Interpreting Probes with Control Tasks"

50 / 381 papers shown
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
Aryaman Arora
Daniel Jurafsky
Christopher Potts
189
33
0
19 Feb 2024
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn
David Dobre
Sophie Xhonneux
Gauthier Gidel
Stephan Günnemann
AAML
475
81
0
14 Feb 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei
Kaixuan Huang
Yangsibo Huang
Tinghao Xie
Xiangyu Qi
Mengzhou Xia
Prateek Mittal
Mengdi Wang
Peter Henderson
AAML
331
174
0
07 Feb 2024
Breaking Symmetry When Training Transformers
Chunsheng Zuo
Michael Guerzhoy
112
0
0
06 Feb 2024
Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in Multilingual Language Models
Sara Rajaee
Christof Monz
251
10
0
03 Feb 2024
Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization
Andreas Waldis
Yufang Hou
Iryna Gurevych
ELM
229
9
0
02 Feb 2024
Document Structure in Long Document Transformers
Jan Buchmann
Max Eichler
Jan-Micha Bodensohn
Ilia Kuznetsov
Iryna Gurevych
194
5
0
31 Jan 2024
Understanding Probe Behaviors through Variational Bounds of Mutual Information
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Kwanghee Choi
Jee-weon Jung
Shinji Watanabe
SSL
375
7
0
15 Dec 2023
INSPECT: Intrinsic and Systematic Probing Evaluation for Code Transformers
IEEE Transactions on Software Engineering (TSE), 2023
Anjan Karmakar
Romain Robbes
233
5
0
08 Dec 2023
Revisiting Topic-Guided Language Models
Carolina Zheng
Keyon Vafa
David M. Blei
BDL
153
2
0
04 Dec 2023
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars
Neural Information Processing Systems (NeurIPS), 2023
Kaiyue Wen
Yuchen Li
Bing Liu
Andrej Risteski
287
27
0
03 Dec 2023
Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals
Neural Information Processing Systems (NeurIPS), 2023
Tam Nguyen
Tan-Minh Nguyen
Richard G. Baraniuk
195
26
0
01 Dec 2023
What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations
Raphael Tang
Xinyu Crystina Zhang
Jimmy J. Lin
Ferhan Ture
326
11
0
30 Nov 2023
Bit Cipher -- A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models
Haoran Zhao
Jake Ryland Williams
206
1
0
18 Nov 2023
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori
Thomas Serre
Ellie Pavlick
399
11
0
07 Nov 2023
Emergence of Abstract State Representations in Embodied Sequence Modeling
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Tian Yun
Zilai Zeng
Kunal Handa
Ashish V. Thapliyal
Bo Pang
Ellie Pavlick
Chen Sun
LM&Ro
189
9
0
03 Nov 2023
Counterfactually Probing Language Identity in Multilingual Models
Anirudh Srinivasan
Venkata S Govindarajan
Kyle Mahowald
265
1
0
29 Oct 2023
Probing LLMs for Joint Encoding of Linguistic Categories
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Giulio Starace
Konstantinos Papakostas
Rochelle Choenni
Apostolos Panagiotopoulos
Matteo Rosati
Alina Leidinger
Ekaterina Shutova
262
13
0
28 Oct 2023
How do Language Models Bind Entities in Context?
International Conference on Learning Representations (ICLR), 2023
Jiahai Feng
Jacob Steinhardt
322
64
0
26 Oct 2023
Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Max Müller-Eberstein
Rob van der Goot
Barbara Plank
Ivan Titov
277
15
0
25 Oct 2023
Is Probing All You Need? Indicator Tasks as an Alternative to Probing Embedding Spaces
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Tal Levy
Omer Goldman
Reut Tsarfaty
238
6
0
24 Oct 2023
Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Lina Conti
Guillaume Wisniewski
204
3
0
24 Oct 2023
Understanding the Inner Workings of Language Models Through Representation Dissimilarity
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Davis Brown
Charles Godfrey
Nicholas Konz
Jonathan Tu
Henry Kvinge
226
13
0
23 Oct 2023
Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jaap Jumelet
Willem H. Zuidema
280
9
0
23 Oct 2023
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Buse Giledereli
Jiaoda Li
Yu Fei
Alessandro Stolfo
Wangchunshu Zhou
Guangtao Zeng
Antoine Bosselut
Mrinmaya Sachan
LRM
406
60
0
23 Oct 2023
Implications of Annotation Artifacts in Edge Probing Test Datasets
Conference on Computational Natural Language Learning (CoNLL), 2023
Sagnik Ray Choudhury
Jushaan Kalra
153
1
0
20 Oct 2023
Rethinking the Construction of Effective Metrics for Understanding the Mechanisms of Pretrained Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
You Li
Jinhui Yin
Yuming Lin
189
0
0
19 Oct 2023
The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Aviv Slobodkin
Omer Goldman
Avi Caciularu
Ido Dagan
Haiqin Yang
HILM, LRM
286
46
0
18 Oct 2023
Disentangling the Linguistic Competence of Privacy-Preserving BERT
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Stefan Arnold
Nils Kemmerzell
Annika Schreiner
253
0
0
17 Oct 2023
A State-Vector Framework for Dataset Effects
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
E. Sahak
Zining Zhu
Frank Rudzicz
224
1
0
17 Oct 2023
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks
Max Tegmark
HILM
486
360
0
10 Oct 2023
Assessment of Pre-Trained Models Across Languages and Grammars
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Alberto Muñoz-Ortiz
David Vilares
Carlos Gómez-Rodríguez
197
4
0
20 Sep 2023
Do PLMs Know and Understand Ontological Knowledge?
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Weiqi Wu
Chengyue Jiang
Yong Jiang
Pengjun Xie
Kewei Tu
269
34
0
12 Sep 2023
Explainability for Large Language Models: A Survey
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
D. Yin
Jundong Li
LRM
500
710
0
02 Sep 2023
Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Shunjie Wang
Shane Steinert-Threlkeld
305
4
0
02 Sep 2023
Linearity of Relation Decoding in Transformer Language Models
International Conference on Learning Representations (ICLR), 2023
Evan Hernandez
Arnab Sen Sharma
Tal Haklay
Kevin Meng
Martin Wattenberg
Jacob Andreas
Yonatan Belinkov
David Bau
KELM
335
140
0
17 Aug 2023
Overthinking the Truth: Understanding how Language Models Process False Demonstrations
International Conference on Learning Representations (ICLR), 2023
Danny Halawi
Jean-Stanislas Denain
Jacob Steinhardt
315
72
0
18 Jul 2023
Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Gengyuan Zhang
Yurui Zhang
Kerui Zhang
Volker Tresp
LRM
318
28
0
12 Jul 2023
Pluggable Neural Machine Translation Models via Memory-augmented Adapters
International Conference on Language Resources and Evaluation (LREC), 2023
Yuzhuang Xu
Shuo Wang
Peng Li
Xuebo Liu
Xiaolong Wang
Weidong Liu
Yang Liu
346
1
0
12 Jul 2023
Substance or Style: What Does Your Image Embedding Know?
Cyrus Rashtchian
Charles Herrmann
Chun-Sung Ferng
Ayan Chakrabarti
Dilip Krishnan
Deqing Sun
Da-Cheng Juan
Andrew Tomkins
170
7
0
10 Jul 2023
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Zhaofeng Wu
Linlu Qiu
Alexis Ross
Ekin Akyürek
Boyuan Chen
Bailin Wang
Najoung Kim
Jacob Andreas
Yoon Kim
LRM, ReLM
436
302
0
05 Jul 2023
What Do Self-Supervised Speech Models Know About Words?
Transactions of the Association for Computational Linguistics (TACL), 2023
Ankita Pasad
C. Chien
Shane Settle
Karen Livescu
SSL
482
56
0
30 Jun 2023
Operationalising Representation in Natural Language Processing
British Journal for the Philosophy of Science (BJPS), 2023
J. Harding
351
17
0
14 Jun 2023
Morphosyntactic probing of multilingual BERT models
Natural Language Engineering (NLE), 2023
Judit Ács
Endre Hamerlik
Roy Schwartz
Noah A. Smith
András Kornai
201
18
0
09 Jun 2023
A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
Ritwik Sinha
Zhao Song
Wanrong Zhu
271
29
0
04 Jun 2023
Empirical Sufficiency Lower Bounds for Language Modeling with Locally-Bootstrapped Semantic Structures
Jakob Prange
Emmanuele Chersoni
211
0
0
30 May 2023
Representation Of Lexical Stylistic Features In Language Models' Embedding Space
Qing Lyu
Marianna Apidianaki
Chris Callison-Burch
241
11
0
29 May 2023
Diagnosing Transformers: Illuminating Feature Spaces for Clinical Decision-Making
International Conference on Learning Representations (ICLR), 2023
Aliyah R. Hsu
Yeshwanth Cherapanamjeri
Briton Park
Tristan Naumann
A. Odisho
Bin Yu
MedIm
287
1
0
27 May 2023
NeuroX Library for Neuron Analysis of Deep NLP Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Fahim Dalvi
Hassan Sajjad
Nadir Durrani
239
14
0
26 May 2023
On convex decision regions in deep network representations
Nature Communications (Nat. Commun.), 2023
Lenka Tětková
Thea Brusch
Teresa Scheidt
Fabian Martin Mager
R. Aagaard
Jonathan Foldager
T. S. Alstrøm
Lars Kai Hansen
313
4
0
26 May 2023