
Social Bias Probing: Fairness Benchmarking for Language Models

15 November 2023
Marta Marchiori Manerba, Karolina Stańczak, Riccardo Guidotti, Isabelle Augenstein
arXiv: 2311.09090

Papers citing "Social Bias Probing: Fairness Benchmarking for Language Models"

16 papers shown
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
Riccardo Cantini, A. Orsino, Massimo Ruggiero, Domenico Talia
AAML, ELM
10 Apr 2025

Splits! A Flexible Dataset for Evaluating a Model's Demographic Social Inference
Eylon Caplan, Tania Chakraborty, Dan Goldwasser
06 Apr 2025

FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models
Dahyun Jung, Seungyoon Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim
AAML, ALM, ELM
25 Mar 2025

Rethinking LLM Bias Probing Using Lessons from the Social Sciences
Kirsten N. Morehouse, S. Swaroop, Weiwei Pan
28 Feb 2025

Beneath the Surface: How Large Language Models Reflect Hidden Bias
Jinhao Pan, Chahat Raj, Ziyu Yao, Ziwei Zhu
27 Feb 2025

What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
Nathalie Maria Kirch, Constantin Weisser, Severin Field, Helen Yannakoudakis, Stephen Casper
02 Nov 2024

How Are LLMs Mitigating Stereotyping Harms? Learning from Search Engine Studies
Alina Leidinger, Richard Rogers
16 Jul 2024

Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini, Giada Cosenza, A. Orsino, Domenico Talia
AAML
11 Jul 2024

BiasDora: Exploring Hidden Biased Associations in Vision-Language Models
Chahat Raj, A. Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu
VLM
02 Jul 2024

Revealing Fine-Grained Values and Opinions in Large Language Models
Dustin Wright, Arnav Arora, Nadav Borenstein, Srishti Yadav, Serge J. Belongie, Isabelle Augenstein
27 Jun 2024

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy
ELM, KELM
08 Apr 2024

Building Guardrails for Large Language Models
Yizhen Dong, Ronghui Mu, Gao Jin, Yi Qi, Jinwei Hu, Xingyu Zhao, Jie Meng, Wenjie Ruan, Xiaowei Huang
OffRL
02 Feb 2024

Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, ..., Junwu Xiong, Xinyu Kong, Zujie Wen, Ke Xu, Qi Li
11 Jan 2024

Having Beer after Prayer? Measuring Cultural Bias in Large Language Models
Tarek Naous, Michael Joseph Ryan, Alan Ritter, Wei-ping Xu
23 May 2023

"I'm sorry to hear that": Finding New Biases in Language Models with a
  Holistic Descriptor Dataset
"I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset
Eric Michael Smith
Melissa Hall
Melanie Kambadur
Eleonora Presani
Adina Williams
65
128
0
18 May 2022
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech
Mai Elsherief, Caleb Ziems, D. Muchlinski, Vaishnavi Anupindi, Jordyn Seybolt, M. D. Choudhury, Diyi Yang
11 Sep 2021