Pretraining on the Test Set Is All You Need
Rylan Schaeffer
arXiv 2309.08632, 13 September 2023
Papers citing "Pretraining on the Test Set Is All You Need" (17 of 17 papers shown)
Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts
Ellis L Brown, Jihan Yang, Shusheng Yang, Rob Fergus, Saining Xie
VLM · 06 Nov 2025

Efficient Prediction of Pass@k Scaling in Large Language Models
Joshua Kazdan, Rylan Schaeffer, Youssef Allouah, Colin Sullivan, Kyssen Yu, Noam Levi, Sanmi Koyejo
OffRL · 06 Oct 2025

Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing
Yang Tang, Ruijie Liu, Yifan Wang, Shiyu Li, Xi Chen
30 Sep 2025

Evaluating the Robustness of Chinchilla Compute-Optimal Scaling
Rylan Schaeffer, Noam Levi, Andreas Kirsch, Theo Guenais, Brando Miranda, Elyas Obbad, Sanmi Koyejo
LRM · 28 Sep 2025

Putnam-AXIOM: A Functional and Static Benchmark for Measuring Higher Level Mathematical Reasoning in LLMs
Aryan Gulati, Brando Miranda, Eric Chen, Emily Xia, Kai Fronsdal, Bruno Dumont, Elyas Obbad, Sanmi Koyejo
AIMat, ReLM, LRM · 05 Aug 2025

Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer, Punit Singh Koura, Binh Tang, R. Subramanian, Aaditya K. Singh, ..., Vedanuj Goswami, Sergey Edunov, Dieuwke Hupkes, Sanmi Koyejo, Sharan Narang
ALM · 24 Feb 2025

Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mengqiao Liu, Tevin Wang, Cassandra A. Cohen, Sarah Li, Chenyan Xiong
LRM · 21 Feb 2025

Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation
Chunyuan Deng, Yilun Zhao, Yuzhao Heng, Yitong Li, Jiannan Cao, Xiangru Tang, Arman Cohan
20 Jun 2024

AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij, Felix Hofstätter, Ollie Jaffe, Samuel F. Brown, Francis Rhys Ward
ELM · 11 Jun 2024

Kotlin ML Pack: Technical Report
Sergey Titov, Mikhail Evtikhiev, Anton Shapkin, Oleg Smirnov, Sergei Boytsov, ..., Dariia Karaeva, Maksim Sheptyakov, Mikhail Arkhipov, T. Bryksin, Egor Bogomolov
29 May 2024

EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models
Yu Huang, Liang Guo, Wanqian Guo, Zhe Tao, Yang Lv, Zhihao Sun, Dongfang Zhao
ELM · 18 May 2024

Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM · 16 May 2024

Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Kehua Feng, Keyan Ding, Hongzhi Tan, Kede Ma, Zhihua Wang, ..., Yuzhou Cheng, Ge Sun, Guozhou Zheng, Qiang Zhang, H. Chen
10 Apr 2024

Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models
Jiahao Ying, Yixin Cao, Yushi Bai, Qianru Sun, Bo Wang, Wei Tang, Zhaojun Ding, Yizhe Yang, Xuanjing Huang, Shuicheng Yan
KELM · 19 Feb 2024

When Large Language Models Meet Vector Databases: A Survey
Zhi Jing, Yongye Su, Yikun Han, Bo Yuan, Haiyun Xu, Chunjiang Liu, Kehai Chen, Min Zhang
30 Jan 2024

Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
Tian Liang, Zhiwei He, Shu Yang, Wenxuan Wang, Wenxiang Jiao, Rui Wang, Yujiu Yang, Zhaopeng Tu, Shuming Shi, Xing Wang
LLMAG · 31 Oct 2023

LawBench: Benchmarking Legal Knowledge of Large Language Models
Zhiwei Fei, Xiaoyu Shen, D. Zhu, Fengzhe Zhou, Zhuo Han, Songyang Zhang, Kai-xiang Chen, Zongwen Shen, Jidong Ge
ELM, AILaw · 28 Sep 2023