Preference Leakage: A Contamination Problem in LLM-as-a-judge

3 February 2025
Dawei Li
Renliang Sun
Yue Huang
Ming Zhong
Bohan Jiang
Jiawei Han
Wei Wei
Wei Wang
Huan Liu
ArXiv (abs) | PDF | HTML | HuggingFace (41 upvotes)

Papers citing "Preference Leakage: A Contamination Problem in LLM-as-a-judge"

17 / 117 papers shown
Benchmarking Cognitive Biases in Large Language Models as Evaluators
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ryan Koo
Linghe Wang
Vipul Raheja
Jong Inn Park
Min Namgung
Luan Tuyen Chau
ALM
317
124
0
29 Sep 2023
Time Travel in LLMs: Tracing Data Contamination in Large Language Models
International Conference on Learning Representations (ICLR), 2023
Shahriar Golchin
Mihai Surdeanu
446
144
0
16 Aug 2023
AgentBench: Evaluating LLMs as Agents
International Conference on Learning Representations (ICLR), 2023
Xiao-Yang Liu
Hao Yu
Hanchen Zhang
Yifan Xu
Xuanyu Lei
...
Yu-Chuan Su
Huan Sun
Shiyu Huang
Yuxiao Dong
Jie Tang
ELM, LLMAG
527
494
0
07 Aug 2023
Won't Get Fooled Again: Answering Questions with False Premises
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shengding Hu
Yi-Xiao Luo
Huadong Wang
Xingyi Cheng
Zhiyuan Liu
Maosong Sun
224
41
0
05 Jul 2023
Assisting Language Learners: Automated Trans-Lingual Definition Generation via Contrastive Prompt Learning
Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 2023
Hengyuan Zhang
Dawei Li
Yanran Li
Chenming Shang
Chufan Shi
Yong Jiang
294
17
0
09 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Neural Information Processing Systems (NeurIPS), 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM, OSLM, ELM
3.2K
6,557
0
09 Jun 2023
A New Dataset and Empirical Study for Sentence Simplification in Chinese
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shiping Yang
Renliang Sun
Xiao-Yi Wan
256
10
0
07 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
864
6,697
0
29 May 2023
OpenAssistant Conversations -- Democratizing Large Language Model Alignment
Neural Information Processing Systems (NeurIPS), 2023
Andreas Kopf
Yannic Kilcher
Dimitri von Rutte
Sotiris Anagnostidis
Zhi Rui Tam
...
Arnav Dantuluri
Andrew Maguire
Christoph Schuhmann
Huu Nguyen
A. Mattick
ALM, LM&MA
767
783
0
14 Apr 2023
Human-like Summarization Evaluation with ChatGPT
Mingqi Gao
Jie Ruan
Renliang Sun
Xunjian Yin
Shiping Yang
Xiaojun Wan
ALM, AI4MH
201
169
0
05 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
4.6K
20,717
0
15 Mar 2023
Towards a Unified Multi-Dimensional Evaluator for Text Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ming Zhong
Yang Liu
Da Yin
Yuning Mao
Yizhu Jiao
Peng Liu
Chenguang Zhu
Heng Ji
Jiawei Han
ELM
250
327
0
13 Oct 2022
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Stephanie C. Lin
Jacob Hilton
Owain Evans
HILM
1.6K
2,670
0
08 Sep 2021
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Jesse Dodge
Maarten Sap
Ana Marasović
William Agnew
Gabriel Ilharco
Dirk Groeneveld
Margaret Mitchell
Matt Gardner
AILaw
309
562
0
18 Apr 2021
Memorization vs. Generalization: Quantifying Data Leakage in NLP Performance Evaluation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021
Aparna Elangovan
Jiayuan He
Karin Verspoor
TDI, FedML
344
107
0
03 Feb 2021
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
2.4K
7,458
0
21 Apr 2019
How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
Chia-Wei Liu
Ryan J. Lowe
Iulian Serban
Michael Noseworthy
Laurent Charlin
Joelle Pineau
384
1,358
0
25 Mar 2016