Linear Adversarial Concept Erasure

28 January 2022

Papers citing "Linear Adversarial Concept Erasure"

Title
Collapsed Language Models Promote Fairness | Jingxuan Xu, Wuyang Chen, Linyi Li, Yao Zhao, Yunchao Wei | | 34 0 0 | 06 Oct 2024
Machine Unlearning Fails to Remove Data Poisoning Attacks | Martin Pawelczyk, Jimmy Z. Di, Yiwei Lu, Gautam Kamath, Ayush Sekhari, Seth Neel | AAML, MU | 31 7 0 | 25 Jun 2024
Uncovering Intermediate Variables in Transformers using Circuit Probing | Michael A. Lepori, Thomas Serre, Ellie Pavlick | | 40 7 0 | 07 Nov 2023
A Geometric Notion of Causal Probing | Clément Guerner, Anej Svete, Tianyu Liu, Alex Warstadt, Ryan Cotterell | LLMSV | 18 12 0 | 27 Jul 2023
LEACE: Perfect linear concept erasure in closed form | Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman | KELM, MU | 28 102 0 | 06 Jun 2023
Debiasing Pre-trained Contextualised Embeddings | Masahiro Kaneko, Danushka Bollegala | | 196 121 0 | 23 Jan 2021
On the Global Optima of Kernelized Adversarial Representation Learning | Bashir Sadeghi, Runyi Yu, Vishnu Naresh Boddeti | AAML | 51 29 0 | 16 Oct 2019
A Style-Based Generator Architecture for Generative Adversarial Networks | Tero Karras, S. Laine, Timo Aila | | 259 10,183 0 | 12 Dec 2018
Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification | Xilun Chen, Yu Sun, Ben Athiwaratkun, Claire Cardie, Kilian Q. Weinberger | | 193 302 0 | 06 Jun 2016