Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

Annual Meeting of the Association for Computational Linguistics (ACL), 2020
16 April 2020
Shauli Ravfogel
Yanai Elazar
Hila Gonen
Michael Twiton
Yoav Goldberg

Papers citing "Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection"

50 / 310 papers shown
An Empirical Survey of Model Merging Algorithms for Social Bias Mitigation
Daiki Shirafuji
Tatsuhiko Saito
Yasutomo Kimura
MoMe, KELM
165
0
0
02 Dec 2025
Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
Dachuan Zhao
Weiyue Li
Zhenda Shen
Yushu Qiu
Bowen Xu
Haoyu Chen
Yongchao Chen
162
3
0
22 Nov 2025
Spectral Identifiability for Interpretable Probe Geometry
William Hao-Cheng Huang
167
0
0
20 Nov 2025
HSKBenchmark: Modeling and Benchmarking Chinese Second Language Acquisition in Large Language Models through Curriculum Tuning
Qihao Yang
Xuelin Wang
Jiale Chen
Xuelian Dong
Yuxin Hao
Tianyong Hao
220
0
0
19 Nov 2025
Extending Fair Null-Space Projections for Continuous Attributes to Kernel Methods
Felix Störck
Fabian Hinder
Barbara Hammer
126
0
0
05 Nov 2025
TriCon-Fair: Triplet Contrastive Learning for Mitigating Social Bias in Pre-trained Language Models
Chong Lyu
Lin Li
Shiqing Wu
Jingling Yuan
182
0
0
02 Nov 2025
Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
Hiba Ahsan
Byron C. Wallace
LLMSV
246
1
0
31 Oct 2025
Understanding Fairness and Prediction Error through Subspace Decomposition and Influence Analysis
Enze Shi
Pankaj Bhagwat
Zhixian Yang
Linglong Kong
Bei Jiang
161
0
0
27 Oct 2025
The Social Cost of Intelligence: Emergence, Propagation, and Amplification of Stereotypical Bias in Multi-Agent Systems
T. Nguyen
Linhao Luo
Thuy-Trang Vu
Dinh Q. Phung
LLMAG
154
1
0
13 Oct 2025
Language steering in latent space to mitigate unintended code-switching
Andrey Goncharov
Nikolai Kondusov
Alexey Zaytsev
LLMSV
254
0
0
11 Oct 2025
Counterfactually Fair Conformal Prediction
Ozgur Guldogan
Neeraj Sarna
Yuanyuan Li
Michael Berger
193
1
0
09 Oct 2025
Mitigating Biases in Language Models via Bias Unlearning
Dianqing Liu
Yi Liu
Guoqing Jin
Zhendong Mao
MU
250
3
0
30 Sep 2025
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
Xin Xu
Xunzhi He
Churan Zhi
Ruizhe Chen
Julian McAuley
Zexue He
142
2
0
30 Sep 2025
Causally-Enhanced Reinforcement Policy Optimization
Xiangqi Wang
Yue Huang
Yujun Zhou
Xiaonan Luo
Kehan Guo
Xiangliang Zhang
OffRL, LRM
236
1
0
27 Sep 2025
Diagnosing the Performance Trade-off in Moral Alignment: A Case Study on Gender Stereotypes
Guangliang Liu
Bocheng Chen
Xitong Zhang
K. Johnson
223
0
0
25 Sep 2025
Memory in Large Language Models: Mechanisms, Evaluation and Evolution
D. Zhang
Wendong Li
Kani Song
Jiaye Lu
Gang Li
Liuchun Yang
Sheng Li
KELM
260
3
0
23 Sep 2025
Fair-GPTQ: Bias-Aware Quantization for Large Language Models
Irina Proskurina
Guillaume Metzler
Julien Velcin
MQ
257
0
0
18 Sep 2025
RepIt: Steering Language Models with Concept-Specific Refusal Vectors
Vincent Siu
Nathan W. Henry
Nicholas Crispino
Yang Liu
Dawn Song
Chenguang Wang
LLMSV
349
0
0
16 Sep 2025
SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs
Vincent Siu
Nicholas Crispino
David Park
Nathan W. Henry
Yu Yang
Yang Liu
Kurt Thomas
Chenguang Wang
LLMSV
408
1
0
16 Sep 2025
Rethinking Sparse Autoencoders: Select-and-Project for Fairness and Control from Encoder Features Alone
Antonio Bărbălău
Cristian Daniel Păduraru
Teodor Poncu
Alexandru Tifrea
Elena Burceanu
239
1
0
13 Sep 2025
Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection
Harethah Shairah
Hasan Hammoud
G. Turkiyyah
Bernard Ghanem
LLMSV
181
4
0
28 Aug 2025
Caught in the Act: a mechanistic approach to detecting deception
Gerard Boxo
Ryan Socha
Daniel Yoo
Shivam Raval
154
2
0
27 Aug 2025
CausalSent: Interpretable Sentiment Classification with RieszNet
Daniel Frees
Martin Pollack
CML
209
0
0
25 Aug 2025
Debiasing Multilingual LLMs in Cross-lingual Latent Space
Qiwei Peng
Guimin Hu
Yekun Chai
Anders Søgaard
176
1
0
25 Aug 2025
VideoEraser: Concept Erasure in Text-to-Video Diffusion Models
Naen Xu
Jinghuai Zhang
Changjiang Li
Zhi Chen
Chunyi Zhou
Qingming Li
Xuhong Zhang
Shouling Ji
DiffMVGen
357
4
0
21 Aug 2025
Group Fairness Meets the Black Box: Enabling Fair Algorithms on Closed LLMs via Post-Processing
Ruicheng Xian
Yuxuan Wan
Han Zhao
FaML
213
0
0
15 Aug 2025
NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection
Jiazhen Yan
Fan Wang
Weiwei Jiang
Wandi Qiao
Zhangjie Fu
DiffM
297
7
0
02 Aug 2025
Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
Helena Casademunt
Caden Juang
Adam Karvonen
Samuel Marks
Senthooran Rajamanoharan
Neel Nanda
OODD, LLMSV
513
16
0
22 Jul 2025
Distributional Machine Unlearning via Selective Data Removal
Youssef Allouah
R. Guerraoui
Sanmi Koyejo
MU
307
0
0
20 Jul 2025
Nonlinear Concept Erasure: a Density Matching Approach
Antoine Saillenfest
Pirmin Lemberger
260
0
0
16 Jul 2025
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter
Julian Minder
Thomas Hofmann
Tiago Pimentel
240
12
0
11 Jul 2025
Reason to Rote: Rethinking Memorization in Reasoning
Yupei Du
Philipp Mondorf
Silvia Casola
Yuekun Yao
Robert Litschko
Barbara Plank
252
3
0
07 Jul 2025
The Medium Is Not the Message: Deconfounding Document Embeddings via Linear Concept Erasure
Yu Fan
Yoan Hermstrüwer
Shauli Ravfogel
Mrinmaya Sachan
Elliott Ash
Alexander Miserlis Hoyle
372
0
0
01 Jul 2025
Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs
Sayed Mohammad Vakilzadeh Hatefi
Maximilian Dreyer
Reduan Achtibat
Patrick Kahardipraja
Thomas Wiegand
Wojciech Samek
Sebastian Lapuschkin
301
2
0
16 Jun 2025
Improving Causal Interventions in Amnesic Probing with Mean Projection or LEACE
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Alicja Dobrzeniecka
Antske Fokkens
Pia Sommerauer
157
1
0
13 Jun 2025
Convergent Linear Representations of Emergent Misalignment
Anna Soligo
Edward Turner
Senthooran Rajamanoharan
Neel Nanda
MoMe
288
29
0
13 Jun 2025
Robustly Improving LLM Fairness in Realistic Settings via Interpretability
Adam Karvonen
Samuel Marks
387
11
0
12 Jun 2025
Preserving Task-Relevant Information Under Linear Concept Removal
Floris Holstege
Shauli Ravfogel
Bram Wouters
KELM
420
0
0
12 Jun 2025
Iterative Multilingual Spectral Attribute Erasure
Shun Shao
Yftah Ziser
Zheng Zhao
Yifu Qiu
Shay B. Cohen
Anna Korhonen
267
0
0
12 Jun 2025
MANBench: Is Your Multimodal Model Smarter than Human?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Han Zhou
Qitong Xu
Yiheng Dong
Xin Yang
272
1
0
04 Jun 2025
COSMIC: Generalized Refusal Direction Identification in LLM Activations
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Vincent Siu
Nicholas Crispino
Zihao Yu
Sam Pan
Yu Yang
Yang Liu
Dawn Song
Chenguang Wang
LLMSV
469
8
0
30 May 2025
Precise In-Parameter Concept Erasure in Large Language Models
Yoav Gur-Arieh
Clara Suslik
Yihuai Hong
Fazl Barez
Mor Geva
KELM, MU
452
7
0
28 May 2025
Paying Alignment Tax with Contrastive Learning
Buse Sibel Korkmaz
Rahul Nair
Elizabeth M. Daly
Antonio del Rio Chanona
354
2
0
25 May 2025
Advertising in AI systems: Society must be vigilant
Menghua Wu
Yujia Bao
377
1
0
23 May 2025
Sparse Activation Editing for Reliable Instruction Following in Narratives
Runcong Zhao
Chengyu Cao
Qinglin Zhu
Xiucheng Lv
Shun Shao
Lin Gui
Ruifeng Xu
Yulan He
235
3
0
22 May 2025
Do Language Models Use Their Depth Efficiently?
Róbert Csordás
Christopher D. Manning
Christopher Potts
666
32
0
20 May 2025
Mitigating Group-Level Fairness Disparities in Federated Visual Language Models
Chaomeng Chen
Zitong Yu
Jin Song Dong
Sen Su
Linlin Shen
Shutao Xia
Simeng Qin
FedML, VLM
940
0
0
03 May 2025
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
Aviv Slobodkin
Hagai Taitelbaum
Yonatan Bitton
Brian Gordon
Michal Sokolik
Nitzan Bitton-Guetta
Almog Gueta
Royi Rassin
Itay Laish
Dani Lischinski
EGVMVGen
508
2
0
24 Apr 2025
FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yongbin Li
Zhiting Fan
Ruizhe Chen
Xiaotang Gai
Luqi Gong
Yan Zhang
Zuozhu Liu
LLMSV
406
26
0
20 Apr 2025
On Linear Representations and Pretraining Data Frequency in Language Models
International Conference on Learning Representations (ICLR), 2025
Jack Merullo
Noah A. Smith
Sarah Wiegreffe
Yanai Elazar
554
14
0
16 Apr 2025
Page 1 of 7