Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation

arXiv:2406.14644

20 June 2024
Chunyuan Deng
Yilun Zhao
Yuzhao Heng
Yitong Li
Jiannan Cao
Xiangru Tang
Arman Cohan

Papers citing "Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation"

14 / 14 papers shown
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Y. Liu
Chen Zhao
Arman Cohan
45
5
0
21 Jan 2025
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina
Yuan Gao
Dokyun Lee
Gordon Burtch
Sina Fazelpour
LRM
40
7
0
25 Oct 2024
Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
Yujuan Fu
Özlem Uzuner
Meliha Yetisgen
Fei Xia
38
3
0
24 Oct 2024
Data Contamination Calibration for Black-box LLMs
Wen-song Ye
Jiaqi Hu
Liyao Li
Haobo Wang
Gang Chen
Junbo Zhao
26
6
0
20 May 2024
Investigating the Impact of Data Contamination of Large Language Models in Text-to-SQL Translation
Federico Ranaldi
Elena Sofia Ruzzetti
Dario Onorati
Leonardo Ranaldi
Cristina Giannone
Andrea Favalli
Raniero Romagnoli
Fabio Massimo Zanzotto
54
17
0
12 Feb 2024
Evading Data Contamination Detection for Language Models is (too) Easy
Jasper Dekoninck
Mark Niklas Muller
Maximilian Baader
Marc Fischer
Martin Vechev
79
18
0
05 Feb 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
124
349
0
01 Feb 2024
Task Contamination: Language Models May Not Be Few-Shot Anymore
Changmao Li
Jeffrey Flanigan
79
87
0
26 Dec 2023
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
102
136
0
03 Nov 2023
Data Contamination Through the Lens of Time
Manley Roberts
Himanshu Thakur
Christine Herlihy
Colin White
Samuel Dooley
75
30
0
16 Oct 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
189
260
0
28 Apr 2023
Can we trust the evaluation on ChatGPT?
Rachith Aiyappa
Jisun An
Haewoon Kwak
Yong-Yeol Ahn
ELM
ALM
LLMAG
AI4MH
LRM
106
76
0
22 Mar 2023
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
236
1,508
0
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Úlfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
264
1,798
0
14 Dec 2020