ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.06059
  4. Cited By
Investigating Data Contamination for Pre-training Language Models

Investigating Data Contamination for Pre-training Language Models

11 January 2024
Minhao Jiang
Ken Ziyu Liu
Ming Zhong
Rylan Schaeffer
Siru Ouyang
Jiawei Han
Sanmi Koyejo
ArXivPDFHTML

Papers citing "Investigating Data Contamination for Pre-training Language Models"

47 / 47 papers shown
Title
Large Language Models as Span Annotators
Large Language Models as Span Annotators
Zdeněk Kasner
Vilém Zouhar
Patrícia Schmidtová
Ivan Kartáč
Kristýna Onderková
Ondřej Plátek
Dimitra Gkatzia
Saad Mahamood
Ondrej Dusek
Simone Balloccu
ALM
30
0
0
11 Apr 2025
Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Philip Lippmann
Jie-jin Yang
LRM
46
0
0
02 Apr 2025
Language Models May Verbatim Complete Text They Were Not Explicitly Trained On
Language Models May Verbatim Complete Text They Were Not Explicitly Trained On
Ken Ziyu Liu
Christopher A. Choquette-Choo
Matthew Jagielski
Peter Kairouz
Sanmi Koyejo
Percy Liang
Nicolas Papernot
51
0
0
21 Mar 2025
Position: Ensuring mutual privacy is necessary for effective external evaluation of proprietary AI systems
Ben Bucknall
Robert F. Trager
Michael A. Osborne
80
0
0
03 Mar 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer
Punit Singh Koura
Binh Tang
R. Subramanian
Aaditya K. Singh
...
Vedanuj Goswami
Sergey Edunov
Dieuwke Hupkes
Sanmi Koyejo
Sharan Narang
ALM
69
0
0
24 Feb 2025
None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks
None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks
Eva Sánchez Salido
Julio Gonzalo
Guillermo Marco
ELM
58
2
0
18 Feb 2025
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Dawei Li
Renliang Sun
Yue Huang
Ming Zhong
Bohan Jiang
J. Han
X. Zhang
Wei Wang
Huan Liu
65
11
0
03 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
70
53
0
28 Jan 2025
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
88
11
0
31 Dec 2024
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation
Eunsu Kim
Juyoung Suk
Seungone Kim
Niklas Muennighoff
Dongkwan Kim
Alice H. Oh
ELM
78
1
0
31 Dec 2024
AntiLeak-Bench: Preventing Data Contamination by Automatically
  Constructing Benchmarks with Updated Real-World Knowledge
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
Xiaobao Wu
Liangming Pan
Yuxi Xie
Ruiwen Zhou
Shuai Zhao
Yubo Ma
Mingzhe Du
Rui Mao
Anh Tuan Luu
William Yang Wang
93
9
0
18 Dec 2024
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and
  Establishing Best Practices
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
Anka Reuel
Amelia F. Hardy
Chandler Smith
Max Lamparth
Malcolm Hardy
Mykel J. Kochenderfer
ELM
62
16
0
20 Nov 2024
CODECLEANER: Elevating Standards with A Robust Data Contamination Mitigation Toolkit
Jialun Cao
Songqiang Chen
Wuqi Zhang
Hau Ching Lo
S. Cheung
31
0
0
16 Nov 2024
Evaluation data contamination in LLMs: how do we measure it and (when)
  does it matter?
Evaluation data contamination in LLMs: how do we measure it and (when) does it matter?
Aaditya K. Singh
Muhammed Yusuf Kocyigit
Andrew Poulton
David Esiobu
Maria Lomeli
Gergely Szilvasy
Dieuwke Hupkes
25
7
0
06 Nov 2024
Rethinking the Uncertainty: A Critical Review and Analysis in the Era of
  Large Language Models
Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models
Mohammad Beigi
Sijia Wang
Ying Shen
Zihao Lin
Adithya Kulkarni
...
Ming Jin
Jin-Hee Cho
Dawei Zhou
Chang-Tien Lu
Lifu Huang
21
1
0
26 Oct 2024
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Vipul Gupta
Candace Ross
David Pantoja
R. Passonneau
Megan Ung
Adina Williams
49
1
0
26 Oct 2024
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Jacob Haimes
Cenny Wenner
Kunvar Thaman
Vassil Tashev
Clement Neo
Esben Kran
Jason Schreiber
22
5
0
11 Oct 2024
Language model developers should report train-test overlap
Language model developers should report train-test overlap
Andy K. Zhang
Kevin Klyman
Yifan Mai
Yoav Levine
Yian Zhang
Rishi Bommasani
Percy Liang
VLM
ELM
24
8
0
10 Oct 2024
How Much Can We Forget about Data Contamination?
How Much Can We Forget about Data Contamination?
Sebastian Bordt
Suraj Srinivas
Valentyn Boreiko
U. V. Luxburg
43
1
0
04 Oct 2024
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
Ezra Karger
Houtan Bastani
Chen Yueh-Han
Zachary Jacobs
Danny Halawi
Fred Zhang
P. Tetlock
33
6
0
30 Sep 2024
TabEBM: A Tabular Data Augmentation Method with Distinct Class-Specific
  Energy-Based Models
TabEBM: A Tabular Data Augmentation Method with Distinct Class-Specific Energy-Based Models
Andrei Margeloiu
Xiangjian Jiang
Nikola Simidjievski
M. Jamnik
22
5
0
24 Sep 2024
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination
Eva Sánchez Salido
Roser Morante
Julio Gonzalo
Guillermo Marco
Jorge Carrillo-de-Albornoz
...
Enrique Amigó
Andrés Fernández
Alejandro Benito-Santos
Adrián Ghajari Espinosa
Victor Fresno
ELM
39
0
0
19 Sep 2024
Assessing Contamination in Large Language Models: Introducing the
  LogProber method
Assessing Contamination in Large Language Models: Introducing the LogProber method
Nicolas Yax
Pierre-Yves Oudeyer
Stefano Palminteri
24
3
0
26 Aug 2024
Focused Large Language Models are Stable Many-Shot Learners
Focused Large Language Models are Stable Many-Shot Learners
Peiwen Yuan
Shaoxiong Feng
Yiwei Li
Xinglin Wang
Y. Zhang
Chuyi Tan
Boyuan Pan
Heda Wang
Yao Hu
Kan Li
56
5
0
26 Aug 2024
StructEval: Deepen and Broaden Large Language Model Assessment via
  Structured Evaluation
StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation
Boxi Cao
Mengjie Ren
Hongyu Lin
Xianpei Han
Feng Zhang
Junfeng Zhan
Le Sun
ELM
26
3
0
06 Aug 2024
Questionable practices in machine learning
Questionable practices in machine learning
Gavin Leech
Juan J. Vazquez
Misha Yagudin
Niclas Kupper
Laurence Aitchison
42
2
0
17 Jul 2024
VarBench: Robust Language Model Benchmarking Through Dynamic Variable
  Perturbation
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation
Kun Qian
Shunji Wan
Claudia Tang
Youzhi Wang
Xuanming Zhang
Maximillian Chen
Zhou Yu
AAML
35
8
0
25 Jun 2024
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning
  Graph
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang
Jiaao Chen
Diyi Yang
LRM
32
7
0
25 Jun 2024
Unveiling the Spectrum of Data Contamination in Language Models: A
  Survey from Detection to Remediation
Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation
Chunyuan Deng
Yilun Zhao
Yuzhao Heng
Yitong Li
Jiannan Cao
Xiangru Tang
Arman Cohan
27
13
0
20 Jun 2024
Inference-Time Decontamination: Reusing Leaked Benchmarks for Large
  Language Model Evaluation
Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation
Qin Zhu
Qingyuan Cheng
Runyu Peng
Xiaonan Li
Tengxiao Liu
Ru Peng
Xipeng Qiu
Xuanjing Huang
31
6
0
20 Jun 2024
Data Contamination Can Cross Language Barriers
Data Contamination Can Cross Language Barriers
Feng Yao
Yufan Zhuang
Zihao Sun
Sunan Xu
Animesh Kumar
Jingbo Shang
30
6
0
19 Jun 2024
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
Jian-Qiao Zhu
Haijiang Yan
Thomas L. Griffiths
80
2
0
29 May 2024
On Fairness of Low-Rank Adaptation of Large Models
On Fairness of Low-Rank Adaptation of Large Models
Zhoujie Ding
Ken Ziyu Liu
Pura Peetathawatchai
Berivan Isik
Sanmi Koyejo
38
4
0
27 May 2024
LMD3: Language Model Data Density Dependence
LMD3: Language Model Data Density Dependence
John Kirchenbauer
Garrett Honke
Gowthami Somepalli
Jonas Geiping
Daphne Ippolito
Katherine Lee
Tom Goldstein
David Andre
27
6
0
10 May 2024
Benchmarking Benchmark Leakage in Large Language Models
Benchmarking Benchmark Leakage in Large Language Models
Ruijie Xu
Zengzhi Wang
Run-Ze Fan
Pengfei Liu
53
42
0
29 Apr 2024
Elephants Never Forget: Memorization and Learning of Tabular Data in
  Large Language Models
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Sebastian Bordt
Harsha Nori
Vanessa Rodrigues
Besmira Nushi
Rich Caruana
36
12
0
09 Apr 2024
Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech
Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech
Ghadi Alyahya
Abeer Aldayel
38
2
0
18 Mar 2024
Quantifying Contamination in Evaluating Code Generation Capabilities of
  Language Models
Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models
Martin Riddell
Ansong Ni
Arman Cohan
ELM
29
28
0
06 Mar 2024
Follow My Instruction and Spill the Beans: Scalable Data Extraction from
  Retrieval-Augmented Generation Systems
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
Zhenting Qi
Hanlin Zhang
Eric Xing
Sham Kakade
Hima Lakkaraju
SILM
40
16
0
27 Feb 2024
If in a Crowdsourced Data Annotation Pipeline, a GPT-4
If in a Crowdsourced Data Annotation Pipeline, a GPT-4
Zeyu He
Huang Chieh-Yang
C. C. Ding
Shaurya Rohatgi
Ting-Hao 'Kenneth' Huang
20
30
0
26 Feb 2024
Investigating the Impact of Data Contamination of Large Language Models
  in Text-to-SQL Translation
Investigating the Impact of Data Contamination of Large Language Models in Text-to-SQL Translation
Federico Ranaldi
Elena Sofia Ruzzetti
Dario Onorati
Leonardo Ranaldi
Cristina Giannone
Andrea Favalli
Raniero Romagnoli
Fabio Massimo Zanzotto
57
17
0
12 Feb 2024
On Catastrophic Inheritance of Large Foundation Models
On Catastrophic Inheritance of Large Foundation Models
Hao Chen
Bhiksha Raj
Xing Xie
Jindong Wang
AI4CE
48
12
0
02 Feb 2024
Instructional Fingerprinting of Large Language Models
Instructional Fingerprinting of Large Language Models
Jiashu Xu
Fei Wang
Mingyu Derek Ma
Pang Wei Koh
Chaowei Xiao
Muhao Chen
WaLM
22
29
0
21 Jan 2024
Investigating Data Contamination in Modern Benchmarks for Large Language
  Models
Investigating Data Contamination in Modern Benchmarks for Large Language Models
Chunyuan Deng
Yilun Zhao
Xiangru Tang
Mark B. Gerstein
Arman Cohan
AAML
ELM
19
50
0
16 Nov 2023
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in
  Large Language Models
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models
Shahriar Golchin
Mihai Surdeanu
16
24
0
10 Nov 2023
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
245
1,977
0
31 Dec 2020
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
267
1,798
0
14 Dec 2020
1