Probing Classifiers are Unreliable for Concept Removal and Detection

8 July 2022

Papers citing "Probing Classifiers are Unreliable for Concept Removal and Detection"

Title
A Geometric Notion of Causal Probing | Clément Guerner, Anej Svete, Tianyu Liu, Alex Warstadt, Ryan Cotterell | LLMSV | 34 12 0 | 27 Jul 2023
LEACE: Perfect linear concept erasure in closed form | Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman | KELM MU | 41 102 0 | 06 Jun 2023
Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information | I. Nejadgholi, Esma Balkir, Kathleen C. Fraser, S. Kiritchenko | - | 23 3 0 | 19 Oct 2022
Linear Adversarial Concept Erasure | Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan Cotterell | KELM | 71 57 0 | 28 Jan 2022
Probing Classifiers: Promises, Shortcomings, and Advances | Yonatan Belinkov | - | 224 404 0 | 24 Feb 2021
An Investigation of Why Overparameterization Exacerbates Spurious Correlations | Shiori Sagawa, Aditi Raghunathan, Pang Wei Koh, Percy Liang | - | 144 369 0 | 09 May 2020
A Survey on Bias and Fairness in Machine Learning | Ninareh Mehrabi, Fred Morstatter, N. Saxena, Kristina Lerman, Aram Galstyan | SyDa FaML | 294 4,187 0 | 23 Aug 2019
What you can cram into a single vector: Probing sentence embeddings for linguistic properties | Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni | - | 199 882 0 | 03 May 2018