arXiv: 2312.03689
Evaluating and Mitigating Discrimination in Language Model Decisions
6 December 2023
Alex Tamkin
Amanda Askell
Liane Lovitt
Esin Durmus
Nicholas Joseph
Shauna Kravec
Karina Nguyen
Jared Kaplan
Deep Ganguli
Papers citing
"Evaluating and Mitigating Discrimination in Language Model Decisions"
47 / 47 papers shown
Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers
Jared Moore
Declan Grabb
William Agnew
Kevin Klyman
Stevie Chancellor
Desmond C. Ong
Nick Haber
AI4MH
37
0
0
25 Apr 2025
Societal Impacts Research Requires Benchmarks for Creative Composition Tasks
Judy Hanwen Shen
Carlos Guestrin
31
0
0
09 Apr 2025
On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions
Dang Nguyen
Chenhao Tan
27
0
0
07 Apr 2025
Evaluating Bias in LLMs for Job-Resume Matching: Gender, Race, and Education
Hayate Iso
Pouya Pezeshkpour
Nikita Bhutani
Estevam R. Hruschka
56
0
0
24 Mar 2025
Rethinking Prompt-based Debiasing in Large Language Models
Xinyi Yang
Runzhe Zhan
Derek F. Wong
Shu Yang
Junchao Wu
Lidia S. Chao
ALM
52
1
0
12 Mar 2025
Shifting Perspectives: Steering Vector Ensembles for Robust Bias Mitigation in LLMs
Zara Siddique
Irtaza Khalid
Liam D. Turner
Luis Espinosa-Anke
LLMSV
56
0
0
07 Mar 2025
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura
Sahil Wadhwa
Jesse Zymet
Akshay Gupta
Andy Luo
Melissa Kazemi Rad
Swapnil Shinde
Mohammad Sorower
AAML
66
0
0
03 Mar 2025
Can LLMs Explain Themselves Counterfactually?
Zahra Dehghanighobadi
Asja Fischer
Muhammad Bilal Zafar
LRM
35
0
0
25 Feb 2025
The Impact of Inference Acceleration on Bias of LLMs
Elisabeth Kirsten
Ivan Habernal
Vedant Nanda
Muhammad Bilal Zafar
33
0
0
20 Feb 2025
Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs
Angelina Wang
Michelle Phan
Daniel E. Ho
Sanmi Koyejo
43
2
0
04 Feb 2025
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Chaoqi Wang
Zhuokai Zhao
Yibo Jiang
Zhaorun Chen
Chen Zhu
...
Jiayi Liu
Lizhu Zhang
Xiangjun Fan
Hao Ma
Sinong Wang
70
3
0
17 Jan 2025
Who Does the Giant Number Pile Like Best: Analyzing Fairness in Hiring Contexts
Preethi Seshadri
Seraphina Goldfarb-Tarrant
35
0
0
08 Jan 2025
Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models
Roberto-Rafael Maura-Rivero
Chirag Nagpal
Roma Patel
Francesco Visin
42
1
0
08 Jan 2025
OpenAI o1 System Card
OpenAI
Aaron Jaech
Adam Tauman Kalai
Adam Lerer
...
Yuchen He
Yuchen Zhang
Yunyun Wang
Zheng Shao
Zhuohan Li
ELM
LRM
AI4CE
77
1
0
21 Dec 2024
Improving LLM Group Fairness on Tabular Data via In-Context Learning
Valeriia Cherepanova
Chia-Jung Lee
Nil-Jana Akpinar
Riccardo Fogliato
Martín Bertrán
Michael Kearns
James Zou
LMTD
63
0
0
05 Dec 2024
GPT-4o System Card
OpenAI
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
48
13
0
25 Oct 2024
Large Language Models Still Exhibit Bias in Long Text
Wonje Jeung
Dongjae Jeon
Ashkan Yousefpour
Jonghyun Choi
ALM
29
2
0
23 Oct 2024
Collapsed Language Models Promote Fairness
Jingxuan Xu
Wuyang Chen
Linyi Li
Yao Zhao
Yunchao Wei
39
0
0
06 Oct 2024
Evaluating Gender, Racial, and Age Biases in Large Language Models: A Comparative Analysis of Occupational and Crime Scenarios
Vishal Mirza
Rahul Kulkarni
Aakanksha Jadhav
47
2
0
22 Sep 2024
SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration
Xin Guan
Nathaniel Demchak
Saloni Gupta
Ze Wang
Ediz Ertekin Jr.
Adriano Soares Koshiyama
Emre Kazim
Zekun Wu
32
2
0
17 Sep 2024
AGR: Age Group fairness Reward for Bias Mitigation in LLMs
Shuirong Cao
Ruoxi Cheng
Zhiqiang Wang
24
4
0
06 Sep 2024
Acceptable Use Policies for Foundation Models
Kevin Klyman
20
14
0
29 Aug 2024
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Richard Ren
Steven Basart
Adam Khoja
Alice Gatti
Long Phan
...
Alexander Pan
Gabriel Mukobi
Ryan H. Kim
Stephen Fitz
Dan Hendrycks
ELM
26
19
0
31 Jul 2024
Machine Unlearning in Generative AI: A Survey
Zheyuan Liu
Guangyao Dou
Zhaoxuan Tan
Yijun Tian
Meng-Long Jiang
MU
29
13
0
30 Jul 2024
Legal Minds, Algorithmic Decisions: How LLMs Apply Constitutional Principles in Complex Scenarios
Camilla Bignotti
C. Camassa
AILaw
ELM
34
1
0
29 Jul 2024
Fairness Definitions in Language Models Explained
Thang Viet Doan
Zhibo Chu
Zichong Wang
Wenbin Zhang
ALM
50
10
0
26 Jul 2024
Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey)
K. Kenthapadi
M. Sameki
Ankur Taly
HILM
ELM
AILaw
28
12
0
10 Jul 2024
Towards Compositionality in Concept Learning
Adam Stein
Aaditya Naik
Yinjun Wu
Mayur Naik
Eric Wong
CoGe
37
2
0
26 Jun 2024
A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners
Bowen Jiang
Yangxinyu Xie
Zhuoqun Hao
Xiaomeng Wang
Tanwi Mallick
Weijie J. Su
Camillo J. Taylor
Dan Roth
LRM
37
28
0
16 Jun 2024
Do Large Language Models Discriminate in Hiring Decisions on the Basis of Race, Ethnicity, and Gender?
Haozhe An
Christabel Acquaye
Colin Wang
Zongxia Li
Rachel Rudinger
28
12
0
15 Jun 2024
A Taxonomy of Challenges to Curating Fair Datasets
Dora Zhao
M. Scheuerman
Pooja Chitre
Jerone T. A. Andrews
Georgia Panagiotidou
Shawn Walker
Kathleen H. Pine
Alice Xiang
28
2
0
10 Jun 2024
Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs
Ruoxi Cheng
Haoxuan Ma
Shuirong Cao
Jiaqi Li
Aihua Pei
Zhiqiang Wang
Pengliang Ji
Haoyu Wang
Jiaqi Huo
AI4CE
24
6
0
15 Apr 2024
Laissez-Faire Harms: Algorithmic Biases in Generative Language Models
Evan Shieh
Faye-Marie Vassel
Cassidy R. Sugimoto
T. Monroe-White
24
3
0
11 Apr 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM
KELM
51
30
0
08 Apr 2024
Auditing the Use of Language Models to Guide Hiring Decisions
Johann D. Gaebler
Sharad Goel
Aziz Huq
Prasanna Tambe
MLAU
18
8
0
03 Apr 2024
Towards Uncovering How Large Language Model Works: An Explainability Perspective
Haiyan Zhao
Fan Yang
Bo Shen
Himabindu Lakkaraju
Mengnan Du
32
10
0
16 Feb 2024
BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains
Yanis Labrak
Adrien Bazoge
Emmanuel Morin
P. Gourraud
Mickael Rouvier
Richard Dufour
91
188
0
15 Feb 2024
Rethinking Machine Unlearning for Large Language Models
Sijia Liu
Yuanshun Yao
Jinghan Jia
Stephen Casper
Nathalie Baracaldo
...
Hang Li
Kush R. Varshney
Mohit Bansal
Sanmi Koyejo
Yang Liu
AILaw
MU
63
79
0
13 Feb 2024
How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?
Ryan Liu
T. Sumers
Ishita Dasgupta
Thomas L. Griffiths
LLMAG
23
13
0
11 Feb 2024
Measuring Implicit Bias in Explicitly Unbiased Large Language Models
Xuechunzi Bai
Angelina Wang
Ilia Sucholutsky
Thomas L. Griffiths
85
27
0
06 Feb 2024
Human-like Category Learning by Injecting Ecological Priors from Large Language Models into Neural Networks
A. Jagadish
Julian Coda-Forno
Mirko Thalmann
Eric Schulz
Marcel Binz
16
3
0
02 Feb 2024
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Bing Wang
Rui Zheng
Luyao Chen
Yan Liu
Shihan Dou
...
Qi Zhang
Xipeng Qiu
Xuanjing Huang
Zuxuan Wu
Yuanyuan Jiang
ALM
25
92
0
11 Jan 2024
A Survey on Fairness in Large Language Models
Yingji Li
Mengnan Du
Rui Song
Xin Wang
Ying Wang
ALM
35
59
0
20 Aug 2023
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
208
364
0
15 Oct 2021
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
274
882
0
18 Apr 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
254
374
0
28 Feb 2021