2009.09435
Cited By
Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation
20 September 2020
Francisco Vargas
Ryan Cotterell
Papers citing
"Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation"
18 / 18 papers shown
Title
Do Large Language Models know who did what to whom?
Joseph M. Denning
Xiaohan
Bryor Snefjella
Idan A. Blank
62
1
0
23 Apr 2025
Robustly identifying concepts introduced during chat fine-tuning using crosscoders
Julian Minder
Clement Dumas
Caden Juang
Bilal Chughtai
Neel Nanda
29
0
0
03 Apr 2025
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
58
3
0
11 Nov 2024
Gumbel Counterfactual Generation From Language Models
Shauli Ravfogel
Anej Svete
Vésteinn Snæbjarnarson
Ryan Cotterell
LRM
CML
33
0
0
11 Nov 2024
Towards a theory of model distillation
Enric Boix-Adserà
FedML
VLM
44
6
0
14 Mar 2024
This Reads Like That: Deep Learning for Interpretable Natural Language Processing
Claudio Fanconi
Moritz Vandenhirtz
Severin Husmann
Julia E. Vogt
FAtt
14
2
0
25 Oct 2023
Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation
Floris Holstege
Bram Wouters
Noud van Giersbergen
C. Diks
34
1
0
18 Oct 2023
Competence-Based Analysis of Language Models
Adam Davies
Jize Jiang
Chengxiang Zhai
ELM
29
4
0
01 Mar 2023
Unsupervised Detection of Contextualized Embedding Bias with Application to Ideology
Valentin Hofmann
J. Pierrehumbert
Hinrich Schütze
25
0
0
14 Dec 2022
Better Hit the Nail on the Head than Beat around the Bush: Removing Protected Attributes with a Single Projection
P. Haghighatkhah
Antske Fokkens
Pia Sommerauer
Bettina Speckmann
Kevin Verbeek
32
10
0
08 Dec 2022
Debiasing Methods for Fairer Neural Models in Vision and Language Research: A Survey
Otávio Parraga
Martin D. Móre
C. M. Oliveira
Nathan Gavenski
L. S. Kupssinskü
Adilson Medronha
L. V. Moura
Gabriel S. Simões
Rodrigo C. Barros
45
11
0
10 Nov 2022
Kernelized Concept Erasure
Shauli Ravfogel
Francisco Vargas
Yoav Goldberg
Ryan Cotterell
24
32
0
28 Jan 2022
Linear Adversarial Concept Erasure
Shauli Ravfogel
Michael Twiton
Yoav Goldberg
Ryan Cotterell
KELM
81
57
0
28 Jan 2022
A Word on Machine Ethics: A Response to Jiang et al. (2021)
Zeerak Talat
Hagen Blix
Josef Valvoda
M. I. Ganesh
Ryan Cotterell
Adina Williams
SyDa
FaML
96
38
0
07 Nov 2021
On a Benefit of Mask Language Modeling: Robustness to Simplicity Bias
Ting-Rui Chiang
32
3
0
11 Oct 2021
Assessing the Reliability of Word Embedding Gender Bias Measures
Yupei Du
Qixiang Fang
D. Nguyen
46
21
0
10 Sep 2021
The Low-Dimensional Linear Geometry of Contextualized Word Representations
Evan Hernandez
Jacob Andreas
MILM
28
40
0
15 May 2021
WordBias: An Interactive Visual Tool for Discovering Intersectional Biases Encoded in Word Embeddings
Bhavya Ghai
Md. Naimul Hoque
Klaus Mueller
29
26
0
05 Mar 2021