Template-Based Probes Are Imperfect Lenses for Counterfactual Bias Evaluation in LLMs

4 April 2024

Farnaz Kohankhaki

David B. Emerson

Laleh Seyyed-Kalantari

Faiza Khan Khattak

Papers citing "Template-Based Probes Are Imperfect Lenses for Counterfactual Bias Evaluation in LLMs"

Large Language Models are Geographically Biased

443

102

05 Feb 2024

Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting

313

28 Jan 2024

"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters

583

305

13 Oct 2023

Bias and Fairness in Large Language Models: A Survey

Computational Linguistics (CL), 2023

Isabel O. Gallegos

Ryan Rossi

Joe Barrow

Md Mehrab Tanjim

Sungchul Kim

476

1,011

02 Sep 2023

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

Muhammad Faaiz Taufiq

Hanguang Li

ALM

480

520

10 Aug 2023

Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting?

Jacob-Junqi Tian

172

19 Jul 2023

Llama 2: Open Foundation and Fine-Tuned Chat Models

Louis Martin

...

Sharan Narang

Sergey Edunov

12.3K

16,310

18 Jul 2023

Soft-prompt Tuning for Large Language Models to Evaluate Bias

Jacob-Junqi Tian

David B. Emerson

Sevil Zanjani Miyandoab

D. Pandya

Laleh Seyyed-Kalantari

Faiza Khan Khattak

VLM

285

07 Jun 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Neural Information Processing Systems (NeurIPS), 2023

Christopher D. Manning

Chelsea Finn

ALM

1.1K

7,889

29 May 2023

Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Myra Cheng

Esin Durmus

Dan Jurafsky

327

300

29 May 2023

Comparing Biases and the Impact of Multilingual Training across Multiple Languages

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

258

18 May 2023

Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding

Computer Vision and Pattern Recognition (CVPR), 2023

319

21 Mar 2023

Auditing large language models: a three-layered approach

AI and Ethics (AE), 2023

561

293

16 Feb 2023

The Capacity for Moral Self-Correction in Large Language Models

Deep Ganguli

...

377

201

15 Feb 2023

Do ever larger octopi still amplify reporting biases? Evidence from judgments of typical colour

Fangyu Liu

Julian Martin Eisenschlos

Jeremy R. Cole

Nigel Collier

296

26 Sep 2022

VIPHY: Probing "Visible" Physical Commonsense Knowledge

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Shikhar Singh

Ehsan Qasemi

Muhao Chen

323

15 Sep 2022

American == White in Multimodal Language-and-Image AI

AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022

Robert Wolfe

Aylin Caliskan

VLM

280

01 Jul 2022

What do Models Learn From Training on More Than Text? Measuring Visual Commonsense Knowledge

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Lovisa Hagström

Richard Johansson

VLM

236

14 May 2022

Using Natural Sentences for Understanding Biases in Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

Sarah Alnegheimish

Alicia Guo

Yi Sun

141

12 May 2022

Visual Commonsense in Pretrained Unimodal and Multimodal Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

Elias Stengel-Eskin

262

04 May 2022

OPT: Open Pre-trained Transformer Language Models

...

Luke Zettlemoyer

1.1K

4,614

02 May 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

...

1.2K

3,811

12 Apr 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Neural Information Processing Systems (NeurIPS), 2022

2.7K

16,812

28 Jan 2022

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

...

613

1,572

08 Dec 2021

The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color

Cory Paik

Stéphane Aroca-Ouellette

Alessandro Roncone

Katharina Kann

217

15 Oct 2021

Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2021

Tobias Norlund

Lovisa Hagström

Richard Johansson

307

23 Sep 2021

Quantifying Social Biases in NLP: A Generalization and Empirical Comparison of Extrinsic Fairness Metrics

Transactions of the Association for Computational Linguistics (TACL), 2021

Paula Czarnowska

Yogarshi Vyas

Kashif Shah

250

135

28 Jun 2021

Towards Understanding and Mitigating Social Biases in Language Models

Paul Pu Liang

Chiyu Wu

Louis-Philippe Morency

Ruslan Salakhutdinov

434

487

24 Jun 2021

Persistent Anti-Muslim Bias in Large Language Models

AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2021

520

678

14 Jan 2021

Language Models are Few-Shot Learners

Neural Information Processing Systems (NeurIPS), 2020

...

2.4K

56,453

28 May 2020

Beyond Accuracy: Behavioral Testing of NLP models with CheckList

Tongshuang Wu

640

1,317

08 May 2020

CheXclusion: Fairness gaps in deep chest X-ray classifiers

Pacific Symposium on Biocomputing (PSB), 2020

Laleh Seyyed-Kalantari

Guanxiong Liu

Matthew B. A. McDermott

Irene Y. Chen

Marzyeh Ghassemi

OOD

414

361

14 Feb 2020

The Woman Worked as a Babysitter: On Biases in Language Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019

814

794

03 Sep 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Luke Zettlemoyer

6.0K

28,988

26 Jul 2019

Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting

Alexandra Chouldechova

S. Geyik

K. Kenthapadi

Adam Tauman Kalai

635

545

27 Jan 2019