Evading Data Contamination Detection for Language Models is (too) Easy
arXiv: 2402.02823 · 5 February 2024
Jasper Dekoninck, Mark Niklas Müller, Maximilian Baader, Marc Fischer, Martin Vechev
Papers citing "Evading Data Contamination Detection for Language Models is (too) Easy" (9 of 9 papers shown):

| Title | Authors | Tags | Likes | Citations | Comments | Date |
|---|---|---|---|---|---|---|
| LSHBloom: Memory-efficient, Extreme-scale Document Deduplication | A. Khan, Robert Underwood, Carlo Siebenschuh, Y. Babuji, Aswathy Ajith, Kyle Hippe, Ozan Gokdemir, Alexander Brace, Kyle Chard, Ian T. Foster | | 31 | 0 | 0 | 06 Nov 2024 |
| Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions | Yujuan Fu, Özlem Uzuner, Meliha Yetisgen, Fei Xia | | 38 | 3 | 0 | 24 Oct 2024 |
| Ward: Provable RAG Dataset Inference via LLM Watermarks | Nikola Jovanović, Robin Staab, Maximilian Baader, Martin Vechev | | 43 | 1 | 0 | 04 Oct 2024 |
| Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation | Jasper Dekoninck, Maximilian Baader, Martin Vechev | ALM | 85 | 0 | 0 | 01 Sep 2024 |
| Task Contamination: Language Models May Not Be Few-Shot Anymore | Changmao Li, Jeffrey Flanigan | | 79 | 87 | 0 | 26 Dec 2023 |
| Data Contamination Through the Lens of Time | Manley Roberts, Himanshu Thakur, Christine Herlihy, Colin White, Samuel Dooley | | 75 | 30 | 0 | 16 Oct 2023 |
| Deduplicating Training Data Makes Language Models Better | Katherine Lee, Daphne Ippolito, A. Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini | SyDa | 234 | 447 | 0 | 14 Jul 2021 |
| The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy | AIMat | 236 | 1,508 | 0 | 31 Dec 2020 |
| Extracting Training Data from Large Language Models | Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel | MLAU, SILM | 264 | 1,798 | 0 | 14 Dec 2020 |