Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models

17 October 2023

Papers citing "Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models"


Title
'Since Lawyers are Males..': Examining Implicit Gender Bias in Hindi Language Generation by LLMs
Ishika Joshi, Ishita Gupta, Adrita Dey, Tapan Parikh
AI4CE | 30 2 0 | 20 Sep 2024

Bias in Text Embedding Models
Vasyl Rakivnenko, Nestor Maslej, Jessica Cervi, Volodymyr Zhukov
29 0 0 | 17 Jun 2024

Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
ReLM, LRM | 328 4,077 0 | 24 May 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM | 313 11,953 0 | 04 Mar 2022

Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, ..., T. Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush
LRM | 213 1,657 0 | 15 Oct 2021

The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng
217 616 0 | 03 Sep 2019