BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation
arXiv:2210.07626
14 October 2022
Tianxiang Sun
Junliang He
Xipeng Qiu
Xuanjing Huang
Papers citing "BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation" (32 papers)
Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning
Sanchit Kabra
Akshita Jha
Chandan K. Reddy
LRM
21
0
0
08 Apr 2025
DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering
Sher Badshah
Hassan Sajjad
60
1
0
11 Mar 2025
BatchGEMBA: Token-Efficient Machine Translation Evaluation with Batched Prompting and Prompt Compression
Daniil Larionov
Steffen Eger
VLM
MQ
74
0
0
04 Mar 2025
Reference-Guided Verdict: LLMs-as-Judges in Automatic Evaluation of Free-Form Text
Sher Badshah
Hassan Sajjad
ELM
36
9
0
17 Aug 2024
A Comparative Study of Quality Evaluation Methods for Text Summarization
Huyen Nguyen
Haihua Chen
Lavanya Pobbathi
Junhua Ding
ELM
24
5
0
30 Jun 2024
Measuring Retrieval Complexity in Question Answering Systems
Matteo Gabburo
Nicolaas Paul Jedema
Siddhant Garg
Leonardo F. R. Ribeiro
Alessandro Moschitti
21
0
0
05 Jun 2024
Expert-Guided Extinction of Toxic Tokens for Debiased Generation
Xueyao Sun
Kaize Shi
Haoran Tang
Guandong Xu
Qing Li
MU
35
1
0
29 May 2024
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
Yuemei Xu
Ling Hu
Jiayi Zhao
Zihan Qiu
Yuqi Ye
Hanwen Gu
LRM
19
36
0
01 Apr 2024
Fairness in Large Language Models: A Taxonomic Survey
Zhibo Chu
Zichong Wang
Wenbin Zhang
AILaw
33
31
0
31 Mar 2024
Measuring Political Bias in Large Language Models: What Is Said and How It Is Said
Yejin Bang
Delong Chen
Nayeon Lee
Pascale Fung
21
25
0
27 Mar 2024
Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction
Ziyang Xu
Keqin Peng
Liang Ding
Dacheng Tao
Xiliang Lu
32
9
0
15 Mar 2024
Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains
Vilém Zouhar
Shuoyang Ding
Anna Currey
Tatyana Badeka
Jenyuan Wang
Brian Thompson
25
14
0
28 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
53
28
0
02 Feb 2024
AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation
Haoyi Qiu
Kung-Hsiang Huang
Jingnong Qu
Nanyun Peng
HILM
14
6
0
16 Nov 2023
ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models
Jierui Li
Vipul Raheja
Dhruv Kumar
SyDa
13
3
0
15 Nov 2023
Defining a New NLP Playground
Sha Li
Chi Han
Pengfei Yu
Carl N. Edwards
Manling Li
...
Yi Ren Fung
Charles Yu
Joel R. Tetreault
Eduard H. Hovy
Heng Ji
31
5
0
31 Oct 2023
Reference Free Domain Adaptation for Translation of Noisy Questions with Question Specific Rewards
Baban Gain
Ramakrishna Appicharla
Soumya Chennabasavaraj
Nikesh Garera
Asif Ekbal
M. Chelliah
14
0
0
23 Oct 2023
That was the last straw, we need more: Are Translation Systems Sensitive to Disambiguating Context?
Jaechan Lee
Alisa Liu
Orevaoghene Ahia
Hila Gonen
Noah A. Smith
13
3
0
23 Oct 2023
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration
Qiushi Sun
Zhangyue Yin
Xiang Li
Zhiyong Wu
Xipeng Qiu
Lingpeng Kong
LRM
LLMAG
15
43
0
30 Sep 2023
Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models
M. Kamruzzaman
M. M. I. Shovon
Gene Louis Kim
38
12
0
16 Sep 2023
An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents
Maximilian Croissant
Madeleine Frister
Guy Schofield
Cade McCall
LLMAG
21
14
0
10 Sep 2023
BLEURT Has Universal Translations: An Analysis of Automatic Metrics by Minimum Risk Training
Yiming Yan
Tao Wang
Chengqi Zhao
Shujian Huang
Jiajun Chen
Mingxuan Wang
14
22
0
06 Jul 2023
Towards Explainable Evaluation Metrics for Machine Translation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
ELM
12
11
0
22 Jun 2023
Overview of Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems at DSTC 11 Track 4
Mario Rodríguez-Cantelar
Chen Zhang
Chengguang Tang
Ke Shi
Sarik Ghazarian
João Sedoc
L. F. D’Haro
Alexander I. Rudnicky
22
8
0
22 Jun 2023
Dior-CVAE: Pre-trained Language Models and Diffusion Priors for Variational Dialog Generation
Tianyu Yang
Thy Thy Tran
Iryna Gurevych
DiffM
13
1
0
24 May 2023
Gender Biases in Automatic Evaluation Metrics for Image Captioning
Haoyi Qiu
Zi-Yi Dou
Tianlu Wang
Asli Celikyilmaz
Nanyun Peng
EGVM
13
8
0
24 May 2023
NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist
Iftitahu Ni'mah
Meng Fang
Vlado Menkovski
Mykola Pechenizkiy
12
8
0
15 May 2023
Elastic Weight Removal for Faithful and Abstractive Dialogue Generation
Nico Daheim
Nouha Dziri
Mrinmaya Sachan
Iryna Gurevych
E. Ponti
MoMe
21
30
0
30 Mar 2023
DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
Varun Nair
Elliot Schumacher
Geoffrey Tso
Anitha Kannan
VLM
17
60
0
30 Mar 2023
MENLI: Robust Evaluation Metrics from Natural Language Inference
Yanran Chen
Steffen Eger
16
15
0
15 Aug 2022
Rethinking embedding coupling in pre-trained language models
Hyung Won Chung
Thibault Févry
Henry Tsai
Melvin Johnson
Sebastian Ruder
90
142
0
24 Oct 2020
Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu
Tianxiang Sun
Yige Xu
Yunfan Shao
Ning Dai
Xuanjing Huang
LM&MA
VLM
229
1,281
0
18 Mar 2020