2402.11436
Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement
18 February 2024
Wenda Xu
Guanglei Zhu
Xuandong Zhao
Liangming Pan
Lei Li
Wenjie Wang
Papers citing
"Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement"
50 / 62 papers shown
Adaptive Multi-Agent Response Refinement in Conversational Systems
Soyeong Jeong
Aparna Elangovan
Emine Yilmaz
Oleg Rokhlenko
LLMAG
11 Nov 2025
RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG
Joshua Gao
Quoc Huy Pham
Subin Varghese
Silwal Saurav
Vedhus Hoskere
06 Nov 2025
A Critical Study of Automatic Evaluation in Sign Language Translation
Shakib Yazdani
Yasser Hamidullah
C. España-Bonet
Eleftherios Avramidis
Josef van Genabith
SLR
29 Oct 2025
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
Zhiheng Xi
Jixuan Huang
Xin Guo
Boyang Hong
Dingwen Yang
...
Jiecao Chen
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
OffRL
LRM
28 Oct 2025
ParsTranslit: Truly Versatile Tajik-Farsi Transliteration
Rayyan Merchant
Kevin Tang
08 Oct 2025
Deconstructing Self-Bias in LLM-generated Translation Benchmarks
Wenda Xu
Sweta Agrawal
Vilém Zouhar
Markus Freitag
Daniel Deutsch
30 Sep 2025
QUARTZ : QA-based Unsupervised Abstractive Refinement for Task-oriented Dialogue Summarization
Mohamed Imed Eddine Ghebriout
Gaël Guibon
Ivan Lerner
Emmanuel Vincent
30 Sep 2025
Model Consistency as a Cheap yet Predictive Proxy for LLM Elo Scores
Ashwin Ramaswamy
Nestor Demeure
Ermal Rrapaj
ALM
ELM
27 Sep 2025
Variation in Verification: Understanding Verification Dynamics in Large Language Models
Yefan Zhou
Austin Xu
Yilun Zhou
Janvijay Singh
Jiang Gui
Shafiq Joty
LRM
22 Sep 2025
From Charts to Fair Narratives: Uncovering and Mitigating Geo-Economic Biases in Chart-to-Text
Ridwan Mahbub
Mohammed Saidul Islam
Mir Tafseer Nayeem
Md Tahmid Rahman Laskar
Mizanur Rahman
Shafiq Joty
Enamul Hoque
13 Aug 2025
Play Favorites: A Statistical Method to Measure Self-Bias in LLM-as-a-Judge
Evangelia Spiliopoulou
Riccardo Fogliato
Hanna Burnsky
Tamer Soliman
Jie Ma
Graham Horwood
Miguel Ballesteros
08 Aug 2025
Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations
Wenhao Wang
Yanyan Li
Long Jiao
Jiawei Yuan
02 Jul 2025
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation
Noy Sternlicht
Ariel Gera
Roy Bar-Haim
Kyle Lo
Noam Slonim
ELM
05 Jun 2025
SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat
Yuru Jiang
Wenxuan Ding
Shangbin Feng
Greg Durrett
Yulia Tsvetkov
05 Jun 2025
SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL
Yue Gong
Chuan Lei
X. Qin
Kapil Vaidya
Balakrishnan Narayanaswamy
Tim Kraska
04 Jun 2025
Beyond the Surface: Measuring Self-Preference in LLM Judgments
Zhi-Yuan Chen
Hao Wang
Xinyu Zhang
Enrui Hu
Yankai Lin
03 Jun 2025
An Empirical Study of Group Conformity in Multi-Agent Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Min Choi
Keonwoo Kim
Sungwon Chae
Sangyeob Baek
LLMAG
AI4CE
02 Jun 2025
Silencer: From Discovery to Mitigation of Self-Bias in LLM-as-Benchmark-Generator
Peiwen Yuan
Yiwei Li
Shaoxiong Feng
Xinglin Wang
Y. Zhang
Jiayi Shi
Chuyi Tan
Boyuan Pan
Yao Hu
Kan Li
27 May 2025
How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark
Minglai Yang
Ethan Huang
Liang Zhang
Mihai Surdeanu
William Yang Wang
Liangming Pan
LRM
24 May 2025
ELSPR: Evaluator LLM Training Data Self-Purification on Non-Transitive Preferences via Tournament Graph Reconstruction
Yan Yu
Yilun Liu
Minggui He
Shimin Tao
Weibin Meng
...
Hongxia Ma
Yan Yu
Hao Yang
Boxing Chen
Fuliang Li
23 May 2025
MAATS: A Multi-Agent Automated Translation System Based on MQM Evaluation
Xi Wang
Jiaqian Hu
Safinah Ali
20 May 2025
LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations
Laura Dietz
Oleg Zendel
P. Bailey
Charles L. A. Clarke
Ellese Cotterill
Jeff Dalton
Faegheh Hasibi
Mark Sanderson
Nick Craswell
ELM
27 Apr 2025
Reflexive Prompt Engineering: A Framework for Responsible Prompt Engineering and Interaction Design
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Christian Djeffal
22 Apr 2025
Societal Impacts Research Requires Benchmarks for Creative Composition Tasks
Judy Hanwen Shen
Carlos Guestrin
09 Apr 2025
Self-Adaptive Cognitive Debiasing for Large Language Models in Decision-Making
Yougang Lyu
Shijie Ren
Yue Feng
Zihan Wang
Zhongfu Chen
Zhaochun Ren
Maarten de Rijke
05 Apr 2025
Do LLM Evaluators Prefer Themselves for a Reason?
Wei-Lin Chen
Zhepei Wei
Xinyu Zhu
Shi Feng
Yu Meng
ELM
LRM
04 Apr 2025
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
Liangjie Huang
Dawei Li
Huan Liu
Lu Cheng
LRM
03 Apr 2025
Evaluating book summaries from internal knowledge in Large Language Models: a cross-model and semantic consistency approach
Javier Coronado-Blázquez
HILM
ELM
27 Mar 2025
Safety Aware Task Planning via Large Language Models in Robotics
A. Khan
Michael Andrev
Muhammad Ali Murtaza
Sergio Aguilera
Rui Zhang
Jie Ding
Seth Hutchinson
Ali Anwar
LLMAG
19 Mar 2025
Grounded Chain-of-Thought for Multimodal Large Language Models
Qiong Wu
Xiangcong Yang
Weihao Ye
Chenxin Fang
Baiyang Song
Xiaoshuai Sun
Rongrong Ji
LRM
17 Mar 2025
Rethinking Prompt-based Debiasing in Large Language Models
Xinyi Yang
Runzhe Zhan
Yang Li
Shu Yang
Junchao Wu
Lidia S. Chao
ALM
12 Mar 2025
Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation
Wenlong Meng
Fan Zhang
Wendao Yao
Zhenyuan Guo
Yongqian Li
Chengkun Wei
Wenzhi Chen
AAML
11 Mar 2025
KSOD: Knowledge Supplement for LLMs On Demand
Haoran Li
Junfeng Hu
10 Mar 2025
Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues
International Conference on Artificial Intelligence in Education (AIED), 2025
Alexander Scarlatos
Naiming Liu
Jaewook Lee
Richard Baraniuk
Andrew Lan
09 Mar 2025
PromptPex: Automatic Test Generation for Language Model Prompts
Reshabh K Sharma
Jonathan De Halleux
Shraddha Barke
Benjamin Zorn
VLM
07 Mar 2025
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
Zhibin Lan
Liqiang Niu
Fandong Meng
Jie Zhou
Jinsong Su
VLM
04 Mar 2025
What do Large Language Models Say About Animals? Investigating Risks of Animal Harm in Generated Text
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Arturs Kanepajs
Aditi Basu
Sankalpa Ghose
Constance Li
Akshat Mehta
Ronak Mehta
Samuel David Tucker-Davis
Eric Zhou
Bob Fischer
Jacy Reese Anthis
ELM
ALM
03 Mar 2025
Reward Shaping to Mitigate Reward Hacking in RLHF
Jiayi Fu
Xuandong Zhao
Chengyuan Yao
Han Wang
Qi Han
Yanghua Xiao
26 Feb 2025
CLIPPER: Compression enables long-context synthetic data generation
Chau Minh Pham
Yapei Chang
Mohit Iyyer
SyDa
20 Feb 2025
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Dawei Li
Renliang Sun
Yue Huang
Ming Zhong
Bohan Jiang
Jiawei Han
Wei Wei
Wei Wang
Huan Liu
03 Feb 2025
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input
Alon Jacovi
Andrew Wang
Chris Alberti
Connie Tao
Jon Lipovetz
...
Rachana Fellinger
Rui Wang
Zizhao Zhang
Sasha Goldshtein
Dipanjan Das
HILM
ALM
06 Jan 2025
Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network
Ritik Mehta
Olha Jurecková
Mark Stamp
25 Dec 2024
Visual Prompting with Iterative Refinement for Design Critique Generation
Peitong Duan
Chin-Yi Cheng
Bjoern Hartmann
Yang Li
22 Dec 2024
Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers
Benedikt Stroebl
Sayash Kapoor
Arvind Narayanan
LRM
26 Nov 2024
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Computer Vision and Pattern Recognition (CVPR), 2024
Lei Li
Y. X. Wei
Zhihui Xie
Xuqing Yang
Yifan Song
...
Tianyu Liu
Sujian Li
Bill Yuchen Lin
Dianbo Sui
Qiang Liu
VLM
CoGe
26 Nov 2024
CoPrompter: User-Centric Evaluation of LLM Instruction Alignment for Improved Prompt Engineering
International Conference on Intelligent User Interfaces (IUI), 2024
Ishika Joshi
Simra Shahid
Shreeya Venneti
Manushree Vasu
Yantao Zheng
Yunyao Li
Balaji Krishnamurthy
Gromit Yeuk-Yin Chan
09 Nov 2024
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Do Xuan Long
Duong Ngoc Yen
Anh Tuan Luu
Kenji Kawaguchi
Min-Yen Kan
Nancy F. Chen
KELM
ELM
LRM
01 Nov 2024
LLMs are Biased Evaluators But Not Biased for Retrieval Augmented Generation
Yen-Shan Chen
Jing Jin
Peng-Ting Kuo
Chao-Wei Huang
Yun-Nung Chen
28 Oct 2024
Improving Model Factuality with Fine-grained Critique-based Evaluator
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yiqing Xie
Wenxuan Zhou
Pradyot Prakash
Di Jin
Yuning Mao
...
Sinong Wang
Han Fang
Carolyn Rose
Daniel Fried
Hejia Zhang
HILM
24 Oct 2024
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Nandan Thakur
Suleman Kazi
Ge Luo
Jimmy J. Lin
Amin Ahmad
VLM
RALM
17 Oct 2024