2308.08493
Cited By
Time Travel in LLMs: Tracing Data Contamination in Large Language Models
16 August 2023
Shahriar Golchin
Mihai Surdeanu
Papers citing
"Time Travel in LLMs: Tracing Data Contamination in Large Language Models"
18 / 68 papers shown
Title
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
13
75
0
25 Jan 2024
Orion-14B: Open-source Multilingual Large Language Models
Du Chen
Yi Huang
Xiaopu Li
Yongqiang Li
Yongqiang Liu
Haihui Pan
Leichao Xu
Dacheng Zhang
Zhipeng Zhang
Kun Han
16
4
0
20 Jan 2024
Investigating Data Contamination for Pre-training Language Models
Minhao Jiang
Ken Ziyu Liu
Ming Zhong
Rylan Schaeffer
Siru Ouyang
Jiawei Han
Sanmi Koyejo
17
62
0
11 Jan 2024
How should the advent of large language models affect the practice of science?
Marcel Binz
Stephan Alaniz
Adina Roskies
B. Aczel
Carl T. Bergstrom
...
Emily M. Bender
M. Marelli
Matthew M. Botvinick
Zeynep Akata
Eric Schulz
18
8
0
05 Dec 2023
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models
Shahriar Golchin
Mihai Surdeanu
16
24
0
10 Nov 2023
The Behavior of Large Language Models When Prompted to Generate Code Explanations
Priti Oli
Rabin Banjade
Jeevan Chapagain
Vasile Rus
LRM
16
4
0
02 Nov 2023
Proving Test Set Contamination in Black Box Language Models
Yonatan Oren
Nicole Meister
Niladri Chatterji
Faisal Ladhak
Tatsunori B. Hashimoto
HILM
14
128
0
26 Oct 2023
Detecting Pretraining Data from Large Language Models
Weijia Shi
Anirudh Ajith
Mengzhou Xia
Yangsibo Huang
Daogao Liu
Terra Blevins
Danqi Chen
Luke Zettlemoyer
MIALM
10
161
0
25 Oct 2023
Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning
Shitong Duan
Xiaoyuan Yi
Peng Zhang
T. Lu
Xing Xie
Ning Gu
11
9
0
17 Oct 2023
Data Contamination Through the Lens of Time
Manley Roberts
Himanshu Thakur
Christine Herlihy
Colin White
Samuel Dooley
84
30
0
16 Oct 2023
Factuality Challenges in the Era of Large Language Models
Isabelle Augenstein
Timothy Baldwin
Meeyoung Cha
Tanmoy Chakraborty
Giovanni Luca Ciampaglia
...
Rubén Míguez
Preslav Nakov
Dietram A. Scheufele
Shivam Sharma
Giovanni Zagni
HILM
27
31
0
08 Oct 2023
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
Kaijie Zhu
Jiaao Chen
Jindong Wang
Neil Zhenqiang Gong
Diyi Yang
Xing Xie
ELM
LRM
14
42
0
29 Sep 2023
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models
Zican Dong
Tianyi Tang
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
RALM
ALM
20
34
0
23 Sep 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
209
2,232
0
22 Mar 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
203
1,651
0
15 Oct 2021
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Úlfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
267
1,798
0
14 Dec 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018