Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.12611
Cited By
Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model
19 October 2023
Abhijith Chintam
Rahel Beloch
Willem H. Zuidema
Michael Hanna
Oskar van der Wal
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model"
7 / 7 papers shown
Title
Gender Encoding Patterns in Pretrained Language Model Representations
Mahdi Zakizadeh
Mohammad Taher Pilehvar
36
0
0
09 Mar 2025
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
49
18
0
02 Jul 2024
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
173
116
0
30 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
180
152
0
28 Apr 2023
Quantifying Context Mixing in Transformers
Hosein Mohebbi
Willem H. Zuidema
Grzegorz Chrupała
A. Alishahi
156
24
0
30 Jan 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
205
486
0
01 Nov 2022
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
248
374
0
28 Feb 2021
1