2404.03471
Cited By
Template-Based Probes Are Imperfect Lenses for Counterfactual Bias Evaluation in LLMs
4 April 2024
Farnaz Kohankhaki
David B. Emerson
Laleh Seyyed-Kalantari
Faiza Khan Khattak
Papers citing "Template-Based Probes Are Imperfect Lenses for Counterfactual Bias Evaluation in LLMs"
35 / 35 papers shown
Large Language Models are Geographically Biased
Rohin Manvi
Samar Khanna
Marshall Burke
David B. Lobell
Stefano Ermon
443
102
0
05 Feb 2024
Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting
Masahiro Kaneko
Danushka Bollegala
Naoaki Okazaki
Timothy Baldwin
LRM
313
58
0
28 Jan 2024
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters
Yixin Wan
George Pu
Jiao Sun
Aparna Garimella
Kai-Wei Chang
Nanyun Peng
583
305
0
13 Oct 2023
Bias and Fairness in Large Language Models: A Survey
Computational Linguistics (CL), 2023
Isabel O. Gallegos
Ryan Rossi
Joe Barrow
Md Mehrab Tanjim
Sungchul Kim
Franck Dernoncourt
Tong Yu
Ruiyi Zhang
Nesreen Ahmed
AILaw
476
1,011
0
02 Sep 2023
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Yang Liu
Yuanshun Yao
Jean-François Ton
Xiaoying Zhang
Ruocheng Guo
Hao Cheng
Yegor Klochkov
Muhammad Faaiz Taufiq
Hanguang Li
ALM
480
520
0
10 Aug 2023
Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting?
O. Dige
Jacob-Junqi Tian
David B. Emerson
Faiza Khan Khattak
ALM
172
8
0
19 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
12.3K
16,310
0
18 Jul 2023
Soft-prompt Tuning for Large Language Models to Evaluate Bias
Jacob-Junqi Tian
David B. Emerson
Sevil Zanjani Miyandoab
D. Pandya
Laleh Seyyed-Kalantari
Faiza Khan Khattak
VLM
285
12
0
07 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
1.1K
7,889
0
29 May 2023
Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Myra Cheng
Esin Durmus
Dan Jurafsky
327
300
0
29 May 2023
Comparing Biases and the Impact of Multilingual Training across Multiple Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Sharon Levy
Neha Ann John
Ling Liu
Yogarshi Vyas
Jie Ma
Yoshinari Fujinuma
Miguel Ballesteros
Vittorio Castelli
Dan Roth
258
42
0
18 May 2023
Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Morris Alper
Michael Fiman
Hadar Averbuch-Elor
VLM
LRM
319
18
0
21 Mar 2023
Auditing large language models: a three-layered approach
AI and Ethics (AE), 2023
Jakob Mökander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
561
293
0
16 Feb 2023
The Capacity for Moral Self-Correction in Large Language Models
Deep Ganguli
Amanda Askell
Nicholas Schiefer
Thomas I. Liao
Kamilė Lukošiūtė
...
Tom B. Brown
C. Olah
Jack Clark
Sam Bowman
Jared Kaplan
LRM
ReLM
377
201
0
15 Feb 2023
Do ever larger octopi still amplify reporting biases? Evidence from judgments of typical colour
Fangyu Liu
Julian Martin Eisenschlos
Jeremy R. Cole
Nigel Collier
296
5
0
26 Sep 2022
VIPHY: Probing "Visible" Physical Commonsense Knowledge
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Shikhar Singh
Ehsan Qasemi
Muhao Chen
323
6
0
15 Sep 2022
American == White in Multimodal Language-and-Image AI
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022
Robert Wolfe
Aylin Caliskan
VLM
280
58
0
01 Jul 2022
What do Models Learn From Training on More Than Text? Measuring Visual Commonsense Knowledge
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Lovisa Hagström
Richard Johansson
VLM
236
3
0
14 May 2022
Using Natural Sentences for Understanding Biases in Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Sarah Alnegheimish
Alicia Guo
Yi Sun
141
26
0
12 May 2022
Visual Commonsense in Pretrained Unimodal and Multimodal Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Chenyu Zhang
Benjamin Van Durme
Zhuowan Li
Elias Stengel-Eskin
VLM
SSL
262
44
0
04 May 2022
OPT: Open Pre-trained Transformer Language Models
Susan Zhang
Stephen Roller
Naman Goyal
Mikel Artetxe
Moya Chen
...
Daniel Simig
Punit Singh Koura
Anjali Sridhar
Tianlu Wang
Luke Zettlemoyer
VLM
OSLM
AI4CE
1.1K
4,614
0
02 May 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
1.2K
3,811
0
12 Apr 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
2.7K
16,812
0
28 Jan 2022
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Jack W. Rae
Sebastian Borgeaud
Trevor Cai
Katie Millican
Jordan Hoffmann
...
Jeff Stanway
L. Bennett
Demis Hassabis
Koray Kavukcuoglu
G. Irving
613
1,572
0
08 Dec 2021
The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color
Cory Paik
Stéphane Aroca-Ouellette
Alessandro Roncone
Katharina Kann
217
39
0
15 Oct 2021
Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2021
Tobias Norlund
Lovisa Hagström
Richard Johansson
307
26
0
23 Sep 2021
Quantifying Social Biases in NLP: A Generalization and Empirical Comparison of Extrinsic Fairness Metrics
Transactions of the Association for Computational Linguistics (TACL), 2021
Paula Czarnowska
Yogarshi Vyas
Kashif Shah
250
135
0
28 Jun 2021
Towards Understanding and Mitigating Social Biases in Language Models
Paul Pu Liang
Chiyu Wu
Louis-Philippe Morency
Ruslan Salakhutdinov
434
487
0
24 Jun 2021
Persistent Anti-Muslim Bias in Large Language Models
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2021
Abubakar Abid
Maheen Farooqi
James Zou
AILaw
520
678
0
14 Jan 2021
Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
2.4K
56,453
0
28 May 2020
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
ELM
640
1,317
0
08 May 2020
CheXclusion: Fairness gaps in deep chest X-ray classifiers
Pacific Symposium on Biocomputing (PSB), 2020
Laleh Seyyed-Kalantari
Guanxiong Liu
Matthew B. A. McDermott
Irene Y. Chen
Marzyeh Ghassemi
OOD
414
361
0
14 Feb 2020
The Woman Worked as a Babysitter: On Biases in Language Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Emily Sheng
Kai-Wei Chang
Premkumar Natarajan
Nanyun Peng
814
794
0
03 Sep 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
6.0K
28,988
0
26 Jul 2019
Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting
Maria De-Arteaga
Alexey Romanov
Hanna M. Wallach
J. Chayes
C. Borgs
Alexandra Chouldechova
S. Geyik
K. Kenthapadi
Adam Tauman Kalai
635
545
0
27 Jan 2019