Time Travel in LLMs: Tracing Data Contamination in Large Language Models

16 August 2023
Shahriar Golchin
Mihai Surdeanu

Papers citing "Time Travel in LLMs: Tracing Data Contamination in Large Language Models"

50 / 68 papers shown
STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings
Saksham Rastogi
Pratyush Maini
Danish Pruthi
37
0
0
18 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM
ALM
LRM
63
4
0
09 Apr 2025
The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination
Yifan Sun
Han Wang
Dongbai Li
Gang Wang
Huan Zhang
AAML
53
0
0
20 Mar 2025
Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models
Sherzod Hakimov
Lara Pfennigschmidt
David Schlangen
ELM
53
0
0
17 Feb 2025
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Dawei Li
Renliang Sun
Yue Huang
Ming Zhong
Bohan Jiang
J. Han
X. Zhang
Wei Wang
Huan Liu
65
11
0
03 Feb 2025
Using Large Language Models for Automated Grading of Student Writing about Science
Chris Impey
Matthew Wenger
Nikhil Garuda
Shahriar Golchin
Sarah Stamer
ELM
AI4Ed
32
2
0
25 Dec 2024
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
106
61
0
25 Nov 2024
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
Anka Reuel
Amelia F. Hardy
Chandler Smith
Max Lamparth
Malcolm Hardy
Mykel J. Kochenderfer
ELM
62
16
0
20 Nov 2024
CODECLEANER: Elevating Standards with A Robust Data Contamination Mitigation Toolkit
Jialun Cao
Songqiang Chen
Wuqi Zhang
Hau Ching Lo
S. Cheung
28
0
0
16 Nov 2024
Benchmarking LLMs' Judgments with No Gold Standard
Shengwei Xu
Yuxuan Lu
Grant Schoenebeck
Yuqing Kong
29
1
0
11 Nov 2024
Membership Inference Attacks against Large Vision-Language Models
Zhan Li
Yongtao Wu
Yihang Chen
F. Tonin
Elias Abad Rocamora
V. Cevher
39
4
0
05 Nov 2024
Contamination Report for Multilingual Benchmarks
Sanchit Ahuja
Varun Gumma
Sunayana Sitaram
16
0
0
21 Oct 2024
What's New in My Data? Novelty Exploration via Contrastive Generation
Masaru Isonuma
Ivan Titov
16
0
0
18 Oct 2024
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Jacob Haimes
Cenny Wenner
Kunvar Thaman
Vassil Tashev
Clement Neo
Esben Kran
Jason Schreiber
19
5
0
11 Oct 2024
Language model developers should report train-test overlap
Andy K. Zhang
Kevin Klyman
Yifan Mai
Yoav Levine
Yian Zhang
Rishi Bommasani
Percy Liang
VLM
ELM
18
8
0
10 Oct 2024
Fine-tuning can Help Detect Pretraining Data from Large Language Models
H. Zhang
Songxin Zhang
Bingyi Jing
Hongxin Wei
34
0
0
09 Oct 2024
How Much Can We Forget about Data Contamination?
Sebastian Bordt
Suraj Srinivas
Valentyn Boreiko
U. V. Luxburg
41
1
0
04 Oct 2024
Quantifying Generalization Complexity for Large Language Models
Zhenting Qi
Hongyin Luo
Xuliang Huang
Zhuokai Zhao
Yibo Jiang
Xiangjun Fan
Himabindu Lakkaraju
James Glass
LRM
ELM
26
5
0
02 Oct 2024
Not All LLM Reasoners Are Created Equal
Arian Hosseini
Alessandro Sordoni
Daniel Toyama
Aaron C. Courville
Rishabh Agarwal
LRM
33
11
0
02 Oct 2024
Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data
Felix J. Dorfner
Amin Dada
Felix Busch
Marcus R. Makowski
T. Han
...
Jens Kleesiek
Madhumita Sushil
Jacqueline Lammert
Lisa Christine Adams
Keno K. Bressem
ELM
AI4MH
LM&MA
31
4
0
25 Aug 2024
Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens
Anqi Zhang
Chaofeng Wu
22
4
0
30 Jul 2024
Qwen2 Technical Report
An Yang
Baosong Yang
Binyuan Hui
Bo Zheng
Bowen Yu
...
Yuqiong Liu
Zeyu Cui
Zhenru Zhang
Zhifang Guo
Zhi-Wei Fan
OSLM
VLM
MU
47
458
0
15 Jul 2024
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo
Florian E. Dorner
Moritz Hardt
ELM
49
6
1
10 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq R. Joty
Jimmy Huang
ELM
ALM
17
25
0
04 Jul 2024
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models
Siyuan Wu
Yue Huang
Chujie Gao
Dongping Chen
Qihui Zhang
...
Tianyi Zhou
Xiangliang Zhang
Jianfeng Gao
Chaowei Xiao
Lichao Sun
SyDa
28
21
0
27 Jun 2024
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
Colin White
Samuel Dooley
Manley Roberts
Arka Pal
Ben Feuer
...
Tom Goldstein
Willie Neiswanger
Micah Goldblum
ELM
37
6
0
27 Jun 2024
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang
Jiaao Chen
Diyi Yang
LRM
32
7
0
25 Jun 2024
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang
Xiaoyuan Yi
Zhihua Wei
Shu Wang
Xing Xie
ALM
ELM
50
5
0
20 Jun 2024
Data Contamination Can Cross Language Barriers
Feng Yao
Yufan Zhuang
Zihao Sun
Sunan Xu
Animesh Kumar
Jingbo Shang
27
0
0
19 Jun 2024
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
Zhepei Wei
Wei-Lin Chen
Yu Meng
RALM
53
12
0
19 Jun 2024
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Marianna Nezhurina
Lucia Cipolina-Kun
Mehdi Cherti
J. Jitsev
LLMAG
LRM
ELM
ReLM
52
24
0
04 Jun 2024
PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations
Jiatong Li
Renjun Hu
Kunzhe Huang
Zhuang Yan
Qi Liu
Mengxiao Zhu
Xing Shi
Wei Lin
KELM
36
0
0
30 May 2024
ConStat: Performance-Based Contamination Detection in Large Language Models
Jasper Dekoninck
Mark Niklas Muller
Martin Vechev
32
5
0
25 May 2024
Benchmarking Educational Program Repair
Charles Koutcheme
Nicola Dainese
Sami Sarsa
Juho Leinonen
Arto Hellas
Paul Denny
AI4Ed
24
5
0
08 May 2024
Conformal Prediction for Natural Language Processing: A Survey
Margarida M. Campos
António Farinhas
Chrysoula Zerva
Mário A. T. Figueiredo
André F. T. Martins
AI4CE
38
13
0
03 May 2024
NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding
Chunkit Chan
Cheng Jiayang
Yauwai Yim
Zheye Deng
Wei Fan
Haoran Li
Xin Liu
Hongming Zhang
Weiqi Wang
Yangqiu Song
LLMAG
27
19
0
21 Apr 2024
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey
Tula Masterman
Sandi Besen
Mason Sawtell
Alex Chao
LM&Ro
LLMAG
32
42
0
17 Apr 2024
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Robert Vacareanu
Vlad-Andrei Negru
Vasile Suciu
Mihai Surdeanu
23
4
0
11 Apr 2024
LLMs in the Loop: Leveraging Large Language Model Annotations for Active Learning in Low-Resource Languages
Nataliia Kholodna
Sahib Julka
Mohammad Khodadadi
Muhammed Nurullah Gumus
Michael Granitzer
19
9
0
02 Apr 2024
Measuring Taiwanese Mandarin Language Understanding
Po-Heng Chen
Sijia Cheng
Wei-Lin Chen
Yen-Ting Lin
Yun-Nung Chen
ELM
39
2
0
29 Mar 2024
Concerned with Data Contamination? Assessing Countermeasures in Code Language Model
Jialun Cao
Wuqi Zhang
S. Cheung
14
15
0
25 Mar 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
29
260
0
12 Mar 2024
TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs
Tanmay Rajore
Nishanth Chandran
Sunayana Sitaram
Divya Gupta
Rahul Sharma
Kashish Mittal
Manohar Swaminathan
39
13
0
01 Mar 2024
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
Zhenting Qi
Hanlin Zhang
Eric Xing
Sham Kakade
Hima Lakkaraju
SILM
40
16
0
27 Feb 2024
Watermarking Makes Language Models Radioactive
Tom Sander
Pierre Fernandez
Alain Durmus
Matthijs Douze
Teddy Furon
WaLM
29
11
0
22 Feb 2024
Dynamic Evaluation of Large Language Models by Meta Probing Agents
Kaijie Zhu
Jindong Wang
Qinlin Zhao
Ruochen Xu
Xing Xie
32
30
0
21 Feb 2024
Training Table Question Answering via SQL Query Decomposition
Raphael Mouravieff
Benjamin Piwowarski
Sylvain Lamprier
ReLM
LMTD
22
0
0
19 Feb 2024
DE-COP: Detecting Copyrighted Content in Language Models Training Data
André V. Duarte
Xuandong Zhao
Arlindo L. Oliveira
Lei Li
26
32
0
15 Feb 2024
Large Language Models As MOOCs Graders
Shahriar Golchin
Nikhil Garuda
Christopher Impey
Matthew Wenger
AI4Ed
10
4
0
06 Feb 2024
Evading Data Contamination Detection for Language Models is (too) Easy
Jasper Dekoninck
Mark Niklas Muller
Maximilian Baader
Marc Fischer
Martin Vechev
82
18
0
05 Feb 2024