Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement

18 February 2024

Lei Li

Papers citing "Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement"

50 / 62 papers shown

Adaptive Multi-Agent Response Refinement in Conversational Systems

134

11 Nov 2025

RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG

126

06 Nov 2025

A Critical Study of Automatic Evaluation in Sign Language Translation

Shakib Yazdani

Yasser Hamidullah

C. España-Bonet

Eleftherios Avramidis

Josef van Genabith

SLR

334

29 Oct 2025

Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning

...

172

28 Oct 2025

ParsTranslit: Truly Versatile Tajik-Farsi Transliteration

Rayyan Merchant

Kevin Tang

08 Oct 2025

Deconstructing Self-Bias in LLM-generated Translation Benchmarks

153

30 Sep 2025

QUARTZ: QA-based Unsupervised Abstractive Refinement for Task-oriented Dialogue Summarization

Mohamed Imed Eddine Ghebriout

Gaël Guibon

Ivan Lerner

Emmanuel Vincent

107

30 Sep 2025

Model Consistency as a Cheap yet Predictive Proxy for LLM Elo Scores

118

27 Sep 2025

Variation in Verification: Understanding Verification Dynamics in Large Language Models

186

22 Sep 2025

From Charts to Fair Narratives: Uncovering and Mitigating Geo-Economic Biases in Chart-to-Text

Ridwan Mahbub

Mohammed Saidul Islam

Mir Tafseer Nayeem

Md Tahmid Rahman Laskar

Mizanur Rahman

Shafiq Joty

Enamul Hoque

127

13 Aug 2025

Play Favorites: A Statistical Method to Measure Self-Bias in LLM-as-a-Judge

Evangelia Spiliopoulou

163

08 Aug 2025

Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations

317

02 Jul 2025

Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation

326

05 Jun 2025

SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat

381

05 Jun 2025

SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL

Balakrishnan Narayanaswamy

Tim Kraska

187

04 Jun 2025

Beyond the Surface: Measuring Self-Preference in LLM Judgments

188

03 Jun 2025

An Empirical Study of Group Conformity in Multi-Agent Systems

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

204

02 Jun 2025

Silencer: From Discovery to Mitigation of Self-Bias in LLM-as-Benchmark-Generator

230

27 May 2025

How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark

262

24 May 2025

ELSPR: Evaluator LLM Training Data Self-Purification on Non-Transitive Preferences via Tournament Graph Reconstruction

...

287

23 May 2025

MAATS: A Multi-Agent Automated Translation System Based on MQM Evaluation

Xi Wang

Jiaqian Hu

Safinah Ali

273

20 May 2025

LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations

288

27 Apr 2025

Reflexive Prompt Engineering: A Framework for Responsible Prompt Engineering and Interaction Design

Conference on Fairness, Accountability and Transparency (FAccT), 2025

Christian Djeffal

519

22 Apr 2025

Societal Impacts Research Requires Benchmarks for Creative Composition Tasks

Judy Hanwen Shen

Carlos Guestrin

619

09 Apr 2025

Self-Adaptive Cognitive Debiasing for Large Language Models in Decision-Making

752

05 Apr 2025

Do LLM Evaluators Prefer Themselves for a Reason?

359

04 Apr 2025

Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models

393

03 Apr 2025

Evaluating book summaries from internal knowledge in Large Language Models: a cross-model and semantic consistency approach

Javier Coronado-Blázquez

HILM ELM

283

27 Mar 2025

Safety Aware Task Planning via Large Language Models in Robotics

348

19 Mar 2025

Grounded Chain-of-Thought for Multimodal Large Language Models

466

17 Mar 2025

Rethinking Prompt-based Debiasing in Large Language Models

416

12 Mar 2025

Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation

311

11 Mar 2025

KSOD: Knowledge Supplement for LLMs On Demand

Haoran Li

Junfeng Hu

299

10 Mar 2025

Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues

International Conference on Artificial Intelligence in Education (AIED), 2025

434

09 Mar 2025

PromptPex: Automatic Test Generation for Language Model Prompts

244

07 Mar 2025

LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning

299

04 Mar 2025

What do Large Language Models Say About Animals? Investigating Risks of Animal Harm in Generated Text

Conference on Fairness, Accountability and Transparency (FAccT), 2025

Samuel David Tucker-Davis

438

03 Mar 2025

Reward Shaping to Mitigate Reward Hacking in RLHF

627

26 Feb 2025

CLIPPER: Compression enables long-context synthetic data generation

443

20 Feb 2025

Preference Leakage: A Contamination Problem in LLM-as-a-judge

617

03 Feb 2025

The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input

...

496

06 Jan 2025

Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network

Ritik Mehta

Olha Jurecková

Mark Stamp

315

170

25 Dec 2024

Visual Prompting with Iterative Refinement for Design Critique Generation

341

22 Dec 2024

Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers

565

26 Nov 2024

VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models

Computer Vision and Pattern Recognition (CVPR), 2024

...

539

26 Nov 2024

CoPrompter: User-Centric Evaluation of LLM Instruction Alignment for Improved Prompt Engineering

International Conference on Intelligent User Interfaces (IUI), 2024

304

09 Nov 2024

Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Kenji Kawaguchi

236

01 Nov 2024

LLMs are Biased Evaluators But Not Biased for Retrieval Augmented Generation

131

28 Oct 2024

Improving Model Factuality with Fine-grained Critique-based Evaluator

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

...

542

24 Oct 2024

MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Nandan Thakur

Suleman Kazi

Ge Luo

Jimmy J. Lin

Amin Ahmad

VLM RALM

468

17 Oct 2024