Designing and Interpreting Probes with Control Tasks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019

8 September 2019

John Hewitt

Percy Liang

Papers citing "Designing and Interpreting Probes with Control Tasks"

50 / 381 papers shown

CausalGym: Benchmarking causal interpretability methods on linguistic tasks

Aryaman Arora

Daniel Jurafsky

Christopher Potts

19 Feb 2024

Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space

Stephan Günnemann

14 Feb 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Kaixuan Huang

Mengdi Wang

07 Feb 2024

Breaking Symmetry When Training Transformers

Chunsheng Zuo

Michael Guerzhoy

06 Feb 2024

Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in Multilingual Language Models

Sara Rajaee

Christof Monz

03 Feb 2024

Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization

02 Feb 2024

Document Structure in Long Document Transformers

Ilia Kuznetsov

31 Jan 2024

Understanding Probe Behaviors through Variational Bounds of Mutual Information

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Kwanghee Choi

Jee-weon Jung

Shinji Watanabe

15 Dec 2023

INSPECT: Intrinsic and Systematic Probing Evaluation for Code Transformers

IEEE Transactions on Software Engineering (TSE), 2023

Anjan Karmakar

Romain Robbes

08 Dec 2023

Revisiting Topic-Guided Language Models

04 Dec 2023

Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars

Neural Information Processing Systems (NeurIPS), 2023

Kaiyue Wen

Yuchen Li

Bing Liu

Andrej Risteski

03 Dec 2023

Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals

Neural Information Processing Systems (NeurIPS), 2023

Tam Nguyen

Tan-Minh Nguyen

Richard G. Baraniuk

01 Dec 2023

What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations

30 Nov 2023

Bit Cipher -- A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models

Haoran Zhao

Jake Ryland Williams

18 Nov 2023

Uncovering Intermediate Variables in Transformers using Circuit Probing

Michael A. Lepori

Thomas Serre

Ellie Pavlick

07 Nov 2023

Emergence of Abstract State Representations in Embodied Sequence Modeling

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

03 Nov 2023

Counterfactually Probing Language Identity in Multilingual Models

Anirudh Srinivasan

Venkata S Govindarajan

Kyle Mahowald

29 Oct 2023

Probing LLMs for Joint Encoding of Linguistic Categories

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Giulio Starace

Konstantinos Papakostas

Rochelle Choenni

Apostolos Panagiotopoulos

Matteo Rosati

Alina Leidinger

Ekaterina Shutova

28 Oct 2023

How do Language Models Bind Entities in Context?

International Conference on Learning Representations (ICLR), 2023

Jiahai Feng

Jacob Steinhardt

26 Oct 2023

Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

25 Oct 2023

Is Probing All You Need? Indicator Tasks as an Alternative to Probing Embedding Spaces

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Tal Levy

Omer Goldman

Reut Tsarfaty

24 Oct 2023

Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Lina Conti

Guillaume Wisniewski

24 Oct 2023

Understanding the Inner Workings of Language Models Through Representation Dissimilarity

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

23 Oct 2023

Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Jaap Jumelet

Willem H. Zuidema

23 Oct 2023

Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

23 Oct 2023

Implications of Annotation Artifacts in Edge Probing Test Datasets

Conference on Computational Natural Language Learning (CoNLL), 2023

Sagnik Ray Choudhury

Jushaan Kalra

20 Oct 2023

Rethinking the Construction of Effective Metrics for Understanding the Mechanisms of Pretrained Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

You Li

Jinhui Yin

Yuming Lin

19 Oct 2023

The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Omer Goldman

18 Oct 2023

Disentangling the Linguistic Competence of Privacy-Preserving BERT

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023

Stefan Arnold

Nils Kemmerzell

Annika Schreiner

17 Oct 2023

A State-Vector Framework for Dataset Effects

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

E. Sahak

Zining Zhu

Frank Rudzicz

17 Oct 2023

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

Samuel Marks

Max Tegmark

10 Oct 2023

Assessment of Pre-Trained Models Across Languages and Grammars

International Joint Conference on Natural Language Processing (IJCNLP), 2023

Alberto Muñoz-Ortiz

David Vilares

Carlos Gómez-Rodríguez

20 Sep 2023

Do PLMs Know and Understand Ontological Knowledge?

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

12 Sep 2023

Explainability for Large Language Models: A Survey

ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023

Haiyan Zhao

Hanjie Chen

Fan Yang

Ninghao Liu

02 Sep 2023

Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023

Shunjie Wang

Shane Steinert-Threlkeld

02 Sep 2023

Linearity of Relation Decoding in Transformer Language Models

International Conference on Learning Representations (ICLR), 2023

17 Aug 2023

Overthinking the Truth: Understanding how Language Models Process False Demonstrations

International Conference on Learning Representations (ICLR), 2023

Danny Halawi

Jean-Stanislas Denain

Jacob Steinhardt

18 Jul 2023

Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

12 Jul 2023

Pluggable Neural Machine Translation Models via Memory-augmented Adapters

International Conference on Language Resources and Evaluation (LREC), 2023

Yuzhuang Xu

Shuo Wang

Peng Li

Xuebo Liu

Xiaolong Wang

Weidong Liu

Yang Liu

12 Jul 2023

Substance or Style: What Does Your Image Embedding Know?

Charles Herrmann

10 Jul 2023

Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Bailin Wang

05 Jul 2023

What Do Self-Supervised Speech Models Know About Words?

Transactions of the Association for Computational Linguistics (TACL), 2023

30 Jun 2023

Operationalising Representation in Natural Language Processing

British Journal for the Philosophy of Science (BJPS), 2023

J. Harding

14 Jun 2023

Morphosyntactic probing of multilingual BERT models

Natural Language Engineering (NLE), 2023

09 Jun 2023

A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models

Ritwik Sinha

Zhao Song

Wanrong Zhu

04 Jun 2023

Empirical Sufficiency Lower Bounds for Language Modeling with Locally-Bootstrapped Semantic Structures

Jakob Prange

Emmanuele Chersoni

30 May 2023

Representation Of Lexical Stylistic Features In Language Models' Embedding Space

Qing Lyu

Marianna Apidianaki

Chris Callison-Burch

29 May 2023

Diagnosing Transformers: Illuminating Feature Spaces for Clinical Decision-Making

International Conference on Learning Representations (ICLR), 2023

Aliyah R. Hsu

Yeshwanth Cherapanamjeri

27 May 2023

NeuroX Library for Neuron Analysis of Deep NLP Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Fahim Dalvi

Hassan Sajjad

Nadir Durrani

26 May 2023

On convex decision regions in deep network representations

Nature Communications (Nat. Commun.), 2023

26 May 2023