Kernelized Concept Erasure
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
28 January 2022
Shauli Ravfogel
Francisco Vargas
Yoav Goldberg
Robert Bamler
arXiv:2201.12191 (v5, latest)
Papers citing "Kernelized Concept Erasure" (20 papers)
Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge Cutoffs
Xin Gao
Ruiyi Zhang
Daniel Du
Saurabh Mahindre
Sai Ashish Somayajula
Pengtao Xie
KELM
MU
26 Sep 2025
Memory in Large Language Models: Mechanisms, Evaluation and Evolution
D. Zhang
Wendong Li
Kani Song
Jiaye Lu
Gang Li
Liuchun Yang
Sheng Li
KELM
23 Sep 2025
Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
Helena Casademunt
Caden Juang
Adam Karvonen
Samuel Marks
Senthooran Rajamanoharan
Neel Nanda
OODD
LLMSV
22 Jul 2025
Nonlinear Concept Erasure: a Density Matching Approach
Antoine Saillenfest
Pirmin Lemberger
16 Jul 2025
Improving Causal Interventions in Amnesic Probing with Mean Projection or LEACE
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Alicja Dobrzeniecka
Antske Fokkens
Pia Sommerauer
13 Jun 2025
Focus On This, Not That! Steering LLMs with Adaptive Feature Specification
Tom A. Lamb
Adam Davies
Alasdair Paren
Juil Sock
Francesco Pinto
30 Oct 2024
Machine Unlearning Fails to Remove Data Poisoning Attacks
Martin Pawelczyk
Jimmy Z. Di
Yiwei Lu
Gautam Kamath
Ayush Sekhari
Seth Neel
AAML
MU
25 Jun 2024
Exploring Safety-Utility Trade-Offs in Personalized Language Models
Anvesh Rao Vijjini
Somnath Basu Roy Chowdhury
Snigdha Chaturvedi
17 Jun 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
28 Mar 2024
The Ethics of Automating Legal Actors
Transactions of the Association for Computational Linguistics (TACL), 2023
Josef Valvoda
Alec Thompson
Robert Bamler
Simone Teufel
AILaw
ELM
01 Dec 2023
Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions
International Conference on Learning Representations (ICLR), 2023
Sachin Kumar
Chan Young Park
Yulia Tsvetkov
VLM
13 Nov 2023
Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation
International Conference on Machine Learning (ICML), 2023
Floris Holstege
Bram Wouters
Noud van Giersbergen
C. Diks
18 Oct 2023
LEACE: Perfect linear concept erasure in closed form
Neural Information Processing Systems (NeurIPS), 2023
Nora Belrose
David Schneider-Joseph
Shauli Ravfogel
Robert Bamler
Edward Raff
Stella Biderman
KELM
MU
06 Jun 2023
Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shadi Iskander
Kira Radinsky
Yonatan Belinkov
17 May 2023
Emergent and Predictable Memorization in Large Language Models
Neural Information Processing Systems (NeurIPS), 2023
Stella Biderman
USVSN Sai Prashanth
Lintang Sutawika
Hailey Schoelkopf
Quentin G. Anthony
Shivanshu Purohit
Edward Raff
21 Apr 2023
Competence-Based Analysis of Language Models
Adam Davies
Jize Jiang
Chengxiang Zhai
ELM
01 Mar 2023
Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022
Peter Henderson
E. Mitchell
Christopher D. Manning
Dan Jurafsky
Chelsea Finn
27 Nov 2022
Probing Classifiers are Unreliable for Concept Removal and Detection
Neural Information Processing Systems (NeurIPS), 2022
Abhinav Kumar
Chenhao Tan
Amit Sharma
AAML
08 Jul 2022
Naturalistic Causal Probing for Morpho-Syntax
Transactions of the Association for Computational Linguistics (TACL), 2022
Afra Amini
Tiago Pimentel
Clara Meister
Robert Bamler
MILM
14 May 2022
Probing for the Usage of Grammatical Number
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Karim Lasri
Tiago Pimentel
Alessandro Lenci
Thierry Poibeau
Robert Bamler
19 Apr 2022