Preference Leakage: A Contamination Problem in LLM-as-a-judge

3 February 2025
Dawei Li
Renliang Sun
Yue Huang
Ming Zhong
Bohan Jiang
Jiawei Han
Wei Wei
Wei Wang
Huan Liu

Papers citing "Preference Leakage: A Contamination Problem in LLM-as-a-judge"

50 / 117 papers shown
BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in Alignment
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Sizhe Wang
Yongqi Tong
Hengyuan Zhang
Dawei Li
Xin Zhang
Tianlong Chen
484
15
0
21 Feb 2025
CLIPPER: Compression enables long-context synthetic data generation
Chau Minh Pham
Yapei Chang
Mohit Iyyer
SyDa
443
2
0
20 Feb 2025
Who Taught You That? Tracing Teachers in Model Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Somin Wadhwa
Chantal Shaib
Silvio Amir
Byron C. Wallace
577
4
0
10 Feb 2025
Adversarial ML Problems Are Getting Harder to Solve and to Evaluate
Javier Rando
Jie Zhang
Nicholas Carlini
F. Tramèr
AAML, ELM
368
21
0
04 Feb 2025
Quantification of Large Language Model Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sunbowen Lee
Junting Zhou
Chang Ao
Kaige Li
Xinrun Du
...
Hamid Alinejad-Rokny
Min Yang
Yitao Liang
Zhoufutu Wen
Shiwen Ni
310
0
0
22 Jan 2025
Assessing the Impact of Conspiracy Theories Using Large Language Models
Bohan Jiang
Dawei Li
Zhen Tan
Xinyi Zhou
Ashwin Rao
Kristina Lerman
H. Bernard
Huan Liu
430
4
0
09 Dec 2024
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM, AILaw
1.2K
311
0
25 Nov 2024
ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Contrastive Framework
Hengyuan Zhang
Chenming Shang
Sizhe Wang
Dongdong Zhang
Feng Yao
Renliang Sun
Yiyao Yu
Yujiu Yang
Furu Wei
629
6
0
25 Oct 2024
Agent-as-a-Judge: Evaluate Agents with Agents
Mingchen Zhuge
Changsheng Zhao
Dylan R. Ashley
Wenyi Wang
Dmitrii Khizbullin
...
Raghuraman Krishnamoorthi
Yuandong Tian
Yangyang Shi
Vikas Chandra
Jürgen Schmidhuber
ELM
411
106
0
14 Oct 2024
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
International Conference on Learning Representations (ICLR), 2024
Jiayi Ye
Zixiang Xu
Yue Huang
Dongping Chen
Qihui Zhang
...
Werner Geyer
Chao Huang
Pin-Yu Chen
Nitesh Chawla
Xiangliang Zhang
ELM
368
207
0
03 Oct 2024
Law of the Weakest Link: Cross Capabilities of Large Language Models
Ming Zhong
Aston Zhang
Xuewei Wang
Rui Hou
Wenhan Xiong
...
Melanie Kambadur
Dhruv Mahajan
Sergey Edunov
Jiawei Han
Laurens van der Maaten
ELM
182
10
0
30 Sep 2024
Exploring Large Language Models for Feature Selection: A Data-centric Perspective
SIGKDD Explorations (SIGKDD Explor.), 2024
Dawei Li
Zhen Tan
Huan Liu
LM&MA
248
26
0
21 Aug 2024
Fostering Natural Conversation in Large Language Models with NICO: a Natural Interactive COnversation dataset
Renliang Sun
Mengyuan Liu
Shiping Yang
Rui Wang
Junqing He
Jiaxing Zhang
249
4
0
18 Aug 2024
DataGen: Unified Synthetic Dataset Generation via Large Language Models
IEEE International Joint Conference on Neural Network (IJCNN), 2025
Yue Huang
Siyuan Wu
Chujie Gao
Dongping Chen
Qihui Zhang
...
Tianyi Zhou
Xiangliang Zhang
Jianfeng Gao
Chaowei Xiao
Lichao Sun
SyDa
613
22
0
27 Jun 2024
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
Colin White
Samuel Dooley
Manley Roberts
Arka Pal
Ben Feuer
...
Tom Goldstein
Willie Neiswanger
Micah Goldblum
ELM
389
59
0
27 Jun 2024
Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation
Chunyuan Deng
Yilun Zhao
Yuzhao Heng
Yitong Li
Jiannan Cao
Xiangru Tang
Arman Cohan
270
29
0
20 Jun 2024
Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Frontier AI Models
Sunny Duan
Mikail Khona
Abhiram Iyer
Rylan Schaeffer
Ila R Fiete
414
5
0
20 Jun 2024
Data Contamination Can Cross Language Barriers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Feng Yao
Yufan Zhuang
Zihao Sun
Sunan Xu
Animesh Kumar
Jingbo Shang
209
22
0
19 Jun 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM, ALM
850
140
0
18 Jun 2024
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
Tianle Li
Wei-Lin Chiang
Evan Frick
Lisa Dunlap
Tianhao Wu
Banghua Zhu
Joseph E. Gonzalez
Ion Stoica
ALM
351
331
0
17 Jun 2024
Measuring memorization in RLHF for code completion
Aneesh Pappu
Billy Porter
Ilia Shumailov
Jamie Hayes
338
10
0
17 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM, ALM
287
86
0
06 Jun 2024
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
Jinjie Ni
Fuzhao Xue
Xiang Yue
Yuntian Deng
Mahir Shah
Kabir Jain
Graham Neubig
Yang You
ELM
211
73
0
03 Jun 2024
DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature
Dawei Li
Shu Yang
Zhen Tan
Jae Young Baik
Sunkwon Yun
...
D. Duong-Tran
Ying Ding
Huan Liu
Li Shen
Tianlong Chen
339
63
0
08 May 2024
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Seungone Kim
Juyoung Suk
Shayne Longpre
Bill Yuchen Lin
Jamin Shin
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
MoMe, ALM, ELM
389
331
0
02 May 2024
Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model
Hengyuan Zhang
Yanru Wu
Dawei Li
Zacc Yang
Rui Zhao
Yong Jiang
Fei Tan
ALM
487
1
0
16 Apr 2024
LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery
Samuel R. Bowman
Shi Feng
443
366
0
15 Apr 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Abigail Z. Jacobs
Tatsunori Hashimoto
ALM
464
617
0
06 Apr 2024
Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning
Yongqi Tong
Dawei Li
Sizhe Wang
Yujia Wang
Fei Teng
Jingbo Shang
LRM
415
85
0
29 Mar 2024
Optimization-based Prompt Injection Attack to LLM-as-a-Judge
Jiawen Shi
Zenghui Yuan
Yinuo Liu
Yue Huang
Pan Zhou
Lichao Sun
Neil Zhenqiang Gong
AAML
550
121
0
26 Mar 2024
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Yaowei Zheng
Richong Zhang
Junhao Zhang
Yanhan Ye
Zheyan Luo
Zhangchi Feng
Yongqiang Ma
686
1,212
0
20 Mar 2024
Elephants Never Forget: Testing Language Models for Memorization of Tabular Data
Sebastian Bordt
Harsha Nori
Rich Caruana
LMTD
233
21
0
11 Mar 2024
Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models
Yihong Dong
Xue Jiang
Huanyu Liu
Zhi Jin
Bin Gu
Mengfei Yang
Ge Li
390
123
0
24 Feb 2024
Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement
Wenda Xu
Guanglei Zhu
Xuandong Zhao
Liangming Pan
Lei Li
Wenjie Wang
298
92
0
18 Feb 2024
Humans or LLMs as the Judge? A Study on Judgement Biases
Guiming Hardy Chen
Shunian Chen
Ziche Liu
Feng Jiang
Benyou Wang
568
214
0
16 Feb 2024
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
Ming Li
Lichang Chen
Jiuhai Chen
Shwai He
Jiuxiang Gu
Wanrong Zhu
441
79
0
15 Feb 2024
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
Xiaoying Zhang
Baolin Peng
Ye Tian
Jingyan Zhou
Lifeng Jin
Linfeng Song
Haitao Mi
Chao Yang
HILM
297
97
0
14 Feb 2024
Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024
Simone Balloccu
Patrícia Schmidtová
Mateusz Lango
Ondrej Dusek
SILM, ELM, PILM
452
259
0
06 Feb 2024
Contextualization Distillation from Large Language Model for Knowledge Graph Completion
Findings (Findings), 2024
Dawei Li
Zhen Tan
Tianlong Chen
Huan Liu
KELM
387
23
0
28 Jan 2024
Investigating Data Contamination for Pre-training Language Models
Minhao Jiang
Katja Filippova
Ming Zhong
Rylan Schaeffer
Siru Ouyang
Jiawei Han
Sanmi Koyejo
321
93
0
11 Jan 2024
LLM-as-a-Coauthor: Can Mixed Human-Written and Machine-Generated Text Be Detected?
Qihui Zhang
Chujie Gao
Dongping Chen
Yue Huang
Yixin Huang
...
Shilin Zhang
Weiye Li
Zhengyan Fu
Yao Wan
Lichao Sun
DeLMO
310
46
0
11 Jan 2024
Task Contamination: Language Models May Not Be Few-Shot Anymore
Changmao Li
Jeffrey Flanigan
378
130
0
26 Dec 2023
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning
Wei Liu
Weihao Zeng
Keqing He
Yong Jiang
Junxian He
ALM
430
325
0
25 Dec 2023
AlignBench: Benchmarking Chinese Alignment of Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xiao Liu
Xuanyu Lei
Sheng-Ping Wang
Yue Huang
Zhuoer Feng
...
Hongning Wang
Jing Zhang
Shiyu Huang
Yuxiao Dong
Jie Tang
ELM, LM&MA, ALM
381
69
0
30 Nov 2023
Investigating Data Contamination in Modern Benchmarks for Large Language Models
Chunyuan Deng
Yilun Zhao
Xiangru Tang
Mark B. Gerstein
Arman Cohan
AAML, ELM
391
113
0
16 Nov 2023
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores
Yiqi Liu
N. Moosavi
Chenghua Lin
ELM
441
79
0
16 Nov 2023
Ziya2: Data-centric Learning is All LLMs Need
Ruyi Gan
Ziwei Wu
Renliang Sun
Junyu Lu
Xiaojun Wu
...
Ping Yang
Qi Yang
Hao Wang
Jiaxing Zhang
Yan Song
VLM, ALM
301
26
0
06 Nov 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
International Conference on Learning Representations (ICLR), 2023
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALM, LM&MA, ELM
531
375
0
12 Oct 2023
Mistral 7B
Albert Q. Jiang
Alexandre Sablayrolles
A. Mensch
Chris Bamford
Devendra Singh Chaplot
...
Teven Le Scao
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE, LRM
394
3,000
0
10 Oct 2023
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
International Conference on Learning Representations (ICLR), 2023
Yue Huang
Jiawen Shi
Yuan Li
Chenrui Fan
Siyuan Wu
...
Yixin Liu
Pan Zhou
Yao Wan
Neil Zhenqiang Gong
Lichao Sun
LLMAG
533
151
0
04 Oct 2023