arXiv:2407.02408
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models
2 July 2024
Song Wang
Peng Wang
Tong Zhou
Yushun Dong
Zhen Tan
Jundong Li
CoGe
ArXiv (abs) · PDF · HTML · GitHub (13★)
Papers citing "CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models"
50 / 63 papers shown
Evaluating LLMs for Demographic-Targeted Social Bias Detection: A Comprehensive Benchmark Study
Ayan Majumdar
Feihao Chen
Jinghui Li
Xiaozhen Wang
244
1
0
10 Apr 2026
A Hierarchical Imprecise Probability Approach to Reliability Assessment of Large Language Models
Robab Aghazadeh-Chakherlou
Qing Guo
Siddartha Khastgir
Peter Popov
Xiaoge Zhang
Xingyu Zhao
208
1
0
01 Nov 2025
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
Xin Xu
Xunzhi He
Churan Zhi
Ruizhe Chen
Julian McAuley
Zexue He
149
2
0
30 Sep 2025
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs
Yinong Oliver Wang
N. Sivakumar
Falaah Arif Khan
Rin Metcalf Susa
Adam Goliński
Natalie Mackraz
B. Theobald
Luca Zappella
N. Apostoloff
392
1
0
29 May 2025
The Other Side of the Coin: Exploring Fairness in Retrieval-Augmented Generation
Zhenru Zhang
Ning Li
Qi Liu
Rui Li
W. Gao
Qingyang Mao
Zhenya Huang
Baosheng Yu
Dacheng Tao
RALM
350
0
0
11 Apr 2025
Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models
Kefan Song
Jin Yao
Runnan Jiang
Rohan Chandra
Shangtong Zhang
ALM
405
1
0
10 Mar 2025
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs
International Conference on Learning Representations (ICLR), 2024
Zhiting Fan
Ruizhe Chen
Tianxiang Hu
Zuozhu Liu
326
42
0
25 Oct 2024
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
Yujun Zhou
Jingdong Yang
Yue Huang
Kehan Guo
Zoe Emory
...
Tian Gao
Werner Geyer
Nuno Moniz
Nitesh Chawla
Xiangliang Zhang
567
13
0
18 Oct 2024
MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models
Social Science Research Network (SSRN), 2025
Vibhor Agarwal
Yiqiao Jin
Mohit Chandra
M. D. Choudhury
Srijan Kumar
Nishanth R. Sastry
HILM
LM&MA
464
24
0
29 Sep 2024
Uncertainty Aware Learning for Language Model Alignment
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yikun Wang
Rui Zheng
Liang Ding
Qi Zhang
Dahua Lin
Dacheng Tao
348
11
0
07 Jun 2024
Safety in Graph Machine Learning: Threats and Safeguards
Song Wang
Yushun Dong
Binchi Zhang
Zihan Chen
Xingbo Fu
Yinhan He
Cong Shen
Chuxu Zhang
Nitesh Chawla
Wenlin Yao
410
11
0
17 May 2024
Fairness in Large Language Models: A Taxonomic Survey
Zhibo Chu
Sribala Vidyadhari Chinta
Wenbin Zhang
AILaw
296
102
0
31 Mar 2024
Large Language Models for Data Annotation: A Survey
Zhen Tan
Dawei Li
Song Wang
Alimohammad Beigi
Bohan Jiang
Amrita Bhattacharjee
Mansooreh Karami
Wenlin Yao
Lu Cheng
Huan Liu
SyDa
478
87
0
21 Feb 2024
Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting
Masahiro Kaneko
Danushka Bollegala
Naoaki Okazaki
Timothy Baldwin
LRM
320
58
0
28 Jan 2024
In-context Learning with Retrieved Demonstrations for Language Models: A Survey
Man Luo
Xin Xu
Yue Liu
Panupong Pasupat
Mehran Kazemi
RALM
852
83
0
21 Jan 2024
Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention
Zhen Tan
Tianlong Chen
Zhenyu Zhang
Huan Liu
254
27
0
22 Dec 2023
Interpreting Pretrained Language Models via Concept Bottlenecks
Zhen Tan
Lu Cheng
Song Wang
Yuan Bo
Wenlin Yao
Huan Liu
LRM
267
41
0
08 Nov 2023
Noise-Robust Fine-Tuning of Pretrained Language Models via External Guidance
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Song Wang
Zhen Tan
Ruocheng Guo
Wenlin Yao
NoLa
247
30
0
02 Nov 2023
Knowledge Editing for Large Language Models: A Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
Song Wang
Yaochen Zhu
Haochen Liu
Zaiyi Zheng
Chen Chen
Wenlin Yao
KELM
540
235
0
24 Oct 2023
Mistral 7B
Albert Q. Jiang
Alexandre Sablayrolles
A. Mensch
Chris Bamford
Devendra Singh Chaplot
...
Teven Le Scao
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LRM
519
3,278
0
10 Oct 2023
Bias and Fairness in Large Language Models: A Survey
Computational Linguistics (CL), 2023
Isabel O. Gallegos
Ryan Rossi
Joe Barrow
Md Mehrab Tanjim
Sungchul Kim
Franck Dernoncourt
Tong Yu
Ruiyi Zhang
Nesreen Ahmed
AILaw
482
1,011
0
02 Sep 2023
Fair Few-shot Learning with Auxiliary Sets
European Conference on Artificial Intelligence (ECAI), 2023
Song Wang
Jing Ma
Lu Cheng
Jundong Li
192
4
0
28 Aug 2023
A Survey on Fairness in Large Language Models
Yingji Li
Mengnan Du
Rui Song
Xin Wang
Ying Wang
ALM
474
110
0
20 Aug 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
12.3K
16,448
0
18 Jul 2023
A Survey on Evaluation of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Yu-Chu Chang
Xu Wang
Yongfeng Zhang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
897
3,210
0
06 Jul 2023
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Neural Information Processing Systems (NeurIPS), 2023
Boxin Wang
Weixin Chen
Hengzhi Pei
Chulin Xie
Mintong Kang
...
Zinan Lin
Yu Cheng
Sanmi Koyejo
Basel Alomair
Yue Liu
581
599
0
20 Jun 2023
TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models
Yue Huang
Qihui Zhang
Philip S. Yu
Lichao Sun
308
74
0
20 Jun 2023
Fairness of ChatGPT
Yunqi Li
Lanjing Zhang
Zelong Li
483
26
0
22 May 2023
Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models
First Monday (FM), 2023
Emilio Ferrara
SILM
551
360
0
07 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
5.3K
23,506
0
15 Mar 2023
Interpreting Unfairness in Graph Neural Networks via Training Node Attribution
AAAI Conference on Artificial Intelligence (AAAI), 2022
Yushun Dong
Song Wang
Jing Ma
Ninghao Liu
Jundong Li
251
31
0
25 Nov 2022
Scaling Instruction-Finetuned Language Models
Journal of machine learning research (JMLR), 2022
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
1.8K
4,038
0
20 Oct 2022
In conversation with Artificial Intelligence: aligning language models with human values
Philosophy & Technology (PT), 2022
Atoosa Kasirzadeh
Iason Gabriel
448
142
0
01 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
679
717
0
23 Aug 2022
On Structural Explanation of Bias in Graph Neural Networks
Knowledge Discovery and Data Mining (KDD), 2022
Yushun Dong
Song Wang
Yu Wang
Hanyu Wang
Jundong Li
238
38
0
24 Jun 2022
"I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Eric Michael Smith
Melissa Hall
Melanie Kambadur
Eleonora Presani
Adina Williams
375
191
0
18 May 2022
Fairness in Graph Mining: A Survey
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2022
Yushun Dong
Jing Ma
Song Wang
Chen Chen
Jundong Li
FaML
422
166
0
21 Apr 2022
Red Teaming Language Models with Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
623
976
0
07 Feb 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
669
674
0
15 Oct 2021
Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Shahar Levy
Koren Lazar
Gabriel Stanovsky
421
80
0
08 Sep 2021
RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Soumya Barikeri
Anne Lauscher
Ivan Vulić
Goran Glavaš
317
219
0
07 Jun 2021
Unmasking the Mask -- Evaluating Social Biases in Masked Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2021
Masahiro Kaneko
Danushka Bollegala
316
89
0
15 Apr 2021
Fair Mixup: Fairness via Interpolation
International Conference on Learning Representations (ICLR), 2021
Ching-Yao Chuang
Youssef Mroueh
242
159
0
11 Mar 2021
Towards a Unified Framework for Fair and Stable Graph Representation Learning
Conference on Uncertainty in Artificial Intelligence (UAI), 2021
Chirag Agarwal
Himabindu Lakkaraju
Marinka Zitnik
421
194
0
25 Feb 2021
BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation
Conference on Fairness, Accountability and Transparency (FAccT), 2021
Jwala Dhamala
Tony Sun
Varun Kumar
Satyapriya Krishna
Yada Pruksachatkun
Kai-Wei Chang
Rahul Gupta
402
537
0
27 Jan 2021
Persistent Anti-Muslim Bias in Large Language Models
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2021
Abubakar Abid
Maheen Farooqi
James Zou
AILaw
523
678
0
14 Jan 2021
Measuring and Reducing Gendered Correlations in Pre-trained Models
Kellie Webster
Xuezhi Wang
Ian Tenney
Alex Beutel
Emily Pitler
Ellie Pavlick
Jilin Chen
Ed Chi
Slav Petrov
FaML
722
303
0
12 Oct 2020
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Nikita Nangia
Clara Vania
Rasika Bhalerao
Samuel R. Bowman
924
902
0
30 Sep 2020
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Findings (Findings), 2020
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
1.3K
1,599
0
24 Sep 2020
Unfairness Discovery and Prevention For Few-Shot Regression
Chengli Zhao
Feng Chen
187
23
0
23 Sep 2020
Page 1 of 2