Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.12343
Cited By
LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction
19 December 2023
Yucheng Li
Frank Geurin
Chenghua Lin
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction"
19 / 19 papers shown
Title
The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination
Yifan Sun
Han Wang
Dongbai Li
Gang Wang
Huan Zhang
AAML
55
0
0
20 Mar 2025
LLM as a Broken Telephone: Iterative Generation Distorts Information
Amr Mohamed
Mingmeng Geng
Michalis Vazirgiannis
Guokan Shang
75
1
0
27 Feb 2025
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Simin Chen
Yiming Chen
Zexin Li
Yifan Jiang
Zhongwei Wan
...
Dezhi Ran
Tianle Gu
H. Li
Tao Xie
Baishakhi Ray
48
3
0
23 Feb 2025
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Jacob Haimes
Cenny Wenner
Kunvar Thaman
Vassil Tashev
Clement Neo
Esben Kran
Jason Schreiber
32
5
0
11 Oct 2024
How Much Can We Forget about Data Contamination?
Sebastian Bordt
Suraj Srinivas
Valentyn Boreiko
U. V. Luxburg
45
1
0
04 Oct 2024
Reliable and diverse evaluation of LLM medical knowledge mastery
Yuxuan Zhou
Xien Liu
Chen Ning
Xiao Zhang
Ji Wu
MedIm
31
0
0
22 Sep 2024
LLM Internal States Reveal Hallucination Risk Faced With a Query
Ziwei Ji
Delong Chen
Etsuko Ishii
Samuel Cahyawijaya
Yejin Bang
Bryan Wilie
Pascale Fung
HILM
LRM
36
19
0
03 Jul 2024
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang
Jiaao Chen
Diyi Yang
LRM
37
8
0
25 Jun 2024
Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation
Chunyuan Deng
Yilun Zhao
Yuzhao Heng
Yitong Li
Jiannan Cao
Xiangru Tang
Arman Cohan
27
13
0
20 Jun 2024
Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation
Qin Zhu
Qingyuan Cheng
Runyu Peng
Xiaonan Li
Tengxiao Liu
Ru Peng
Xipeng Qiu
Xuanjing Huang
38
6
0
20 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
38
0
06 Jun 2024
DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning
Shangqing Tu
Kejian Zhu
Yushi Bai
Zijun Yao
Lei Hou
Juanzi Li
42
4
0
06 Jun 2024
UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs
Chaoqun He
Renjie Luo
Shengding Hu
Yuanqian Zhao
Jie Zhou
Hanghao Wu
Jiajie Zhang
Xu Han
Zhiyuan Liu
Maosong Sun
ELM
31
13
0
11 Apr 2024
Data Augmentation using Large Language Models: Data Perspectives, Learning Paradigms and Challenges
Bosheng Ding
Chengwei Qin
Ruochen Zhao
Tianze Luo
Xinze Li
Guizhen Chen
Wenhan Xia
Junjie Hu
A. Luu
Shafiq R. Joty
29
18
0
05 Mar 2024
Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models
Yihong Dong
Xue Jiang
Huanyu Liu
Zhi Jin
Bin Gu
Mengfei Yang
Ge Li
24
44
0
24 Feb 2024
Evading Data Contamination Detection for Language Models is (too) Easy
Jasper Dekoninck
Mark Niklas Muller
Maximilian Baader
Marc Fischer
Martin Vechev
91
18
0
05 Feb 2024
Evaluating Large Language Models for Generalization and Robustness via Data Compression
Yucheng Li
Yunhao Guo
Frank Guerin
Chenghua Lin
ELM
27
5
0
01 Feb 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
290
1,814
0
14 Dec 2020
1