The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models

14 June 2024

Yan Liu

Pin-Yu Chen

Tsung-Yi Ho

Papers citing "The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models"

Title | Authors | Topic tags | Site counters | Date
Gender Bias in Meta-Embeddings | Masahiro Kaneko, Danushka Bollegala, Naoaki Okazaki | - | 24 6 0 | 19 May 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering | Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, Sam Bowman | - | 205 364 0 | 15 Oct 2021
Toward Annotator Group Bias in Crowdsourcing | Haochen Liu, J. Thekinen, Sinem Mollaoglu, Da Tang, Ji Yang, Youlong Cheng, Hui Liu, Jiliang Tang | - | 31 16 0 | 08 Oct 2021
Trustworthy AI: A Computational Perspective | Haochen Liu, Yiqi Wang, Wenqi Fan, Xiaorui Liu, Yaxin Li, Shaili Jain, Yunhao Liu, Anil K. Jain, Jiliang Tang | FaML | 90 193 0 | 12 Jul 2021
Measuring and Improving Consistency in Pretrained Language Models | Yanai Elazar, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Eduard H. Hovy, Hinrich Schütze, Yoav Goldberg | HILM | 252 273 0 | 01 Feb 2021
Debiasing Pre-trained Contextualised Embeddings | Masahiro Kaneko, Danushka Bollegala | - | 205 121 0 | 23 Jan 2021
The Woman Worked as a Babysitter: On Biases in Language Generation | Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng | - | 195 607 0 | 03 Sep 2019
A Survey on Bias and Fairness in Machine Learning | Ninareh Mehrabi, Fred Morstatter, N. Saxena, Kristina Lerman, Aram Galstyan | SyDa, FaML | 283 4,143 0 | 23 Aug 2019