The Ladder: A Reliable Leaderboard for Machine Learning Competitions

16 February 2015

Avrim Blum

Moritz Hardt

Papers citing "The Ladder: A Reliable Leaderboard for Machine Learning Competitions"

Showing 50 of 73 citing papers, newest first.

The Benchmarking Epistemology: Construct Validity for Evaluating Machine Learning Models

Timo Freiesleben

Sebastian Zezulka

27 Oct 2025

Improving AGI Evaluation: A Data Science Perspective

John Hawkins

02 Oct 2025

Benchmark-Driven Selection of AI: Evidence from DeepSeek-R1

Petr Spelda

Vit Stritecky

13 Aug 2025

The Sample Complexity of Parameter-Free Stochastic Convex Optimization

12 Jun 2025

Does Prior Data Matter? Exploring Joint Training in the Context of Few-Shot Class-Incremental Learning

13 Mar 2025

Benchmark Data Repositories for Better Benchmarking

Neural Information Processing Systems (NeurIPS), 2024

31 Oct 2024

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

International Conference on Learning Representations (ICLR), 2024

Qian Liu

09 Oct 2024

Responsible AI in Open Ecosystems: Reconciling Innovation with Risk Assessment and Disclosure

Mahasweta Chakraborti

Bert Joseph Prestoza

Nicholas Vincent

Seth Frey

27 Sep 2024

Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks

International Conference on Machine Learning (ICML), 2024

Guanhua Zhang

Moritz Hardt

02 May 2024

Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress

Christian Schroeder de Witt

29 Feb 2024

Diversified Ensembling: An Experiment in Crowdsourced Machine Learning

16 Feb 2024

Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget

Florian E. Dorner

Moritz Hardt

03 Feb 2024

Challenge design roadmap

15 Jan 2024

Practical, Private Assurance of the Value of Collaboration

Proceedings on Privacy Enhancing Technologies (PoPETs), 2023

04 Oct 2023

Computational modeling of semantic change

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023

Nina Tahmasebi

Haim Dubossarsky

13 Apr 2023

Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text

Yasmen Wahba

N. Madhavji

John Steinbacher

31 Mar 2023

Accounting for multiplicity in machine learning benchmark performance

Kajsa Møllersen

Einar J. Holsbø

10 Mar 2023

Data-Centric Governance

Sean McGregor

Jesse Hostetler

14 Feb 2023

Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks

Findings, 2022

10 Oct 2022

Measuring and signing fairness as performance under multiple stakeholder distributions

20 Jul 2022

Sequential Nature of Recommender Systems Disrupts the Evaluation Process

International Workshop on Algorithmic Bias in Search and Recommendation (ABSR), 2022

Ali Shirali

26 May 2022

SafeNet: The Unreasonable Effectiveness of Ensembles in Private Collaborative Learning

Harsh Chaudhari

Matthew Jagielski

Alina Oprea

20 May 2022

Making Progress Based on False Discoveries

Innovations in Theoretical Computer Science (ITCS), 2022

Roi Livni

19 Apr 2022

Sequential algorithmic modification with test data reuse

Conference on Uncertainty in Artificial Intelligence (UAI), 2022

21 Mar 2022

An Uncommon Task: Participatory Design in Legal AI

Fernando A. Delgado

Solon Barocas

K. Levy

08 Mar 2022

An Algorithmic Framework for Bias Bounties

Conference on Fairness, Accountability and Transparency (FAccT), 2022

25 Jan 2022

The Benchmark Lottery

14 Jul 2021

How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation

AAAI Conference on Artificial Intelligence (AAAI), 2021

Swaroop Mishra

Anjana Arunkumar

10 Jun 2021

Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking

Neural Information Processing Systems (NeurIPS), 2021

Robin Jia

Douwe Kiela

21 May 2021

Label Inference Attacks from Log-loss Scores

International Conference on Machine Learning (ICML), 2021

18 May 2021

RATT: Leveraging Unlabeled Data to Guarantee Generalization

International Conference on Machine Learning (ICML), 2021

Saurabh Garg

Sivaraman Balakrishnan

J. Zico Kolter

Zachary Chase Lipton

01 May 2021

Rip van Winkle's Razor: A Simple Estimate of Overfit to Test Data

Sanjeev Arora

Yi Zhang

25 Feb 2021

A Data Quality-Driven View of MLOps

IEEE Data Engineering Bulletin (DEB), 2021

Nezihe Merve Gürel

15 Feb 2021

Utility is in the Eye of the User: A Critique of NLP Leaderboards

Kawin Ethayarajh

Dan Jurafsky

29 Sep 2020

On Primes, Log-Loss Scores and (No) Privacy

17 Sep 2020

DeepNNK: Explaining deep models and their generalization using polytope interpolation

Sarath Shekkizhar

Antonio Ortega

20 Jul 2020

Identifying Statistical Bias in Dataset Replication

19 May 2020

On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

19 May 2020

The Effect of Natural Distribution Shift on Question Answering Models

International Conference on Machine Learning (ICML), 2020

Benjamin Recht

29 Apr 2020

Approval policies for modifications to Machine Learning-Based Software as a Medical Device: A study of bio-creep

Jean Feng

S. Emerson

N. Simon

28 Dec 2019

Adaptive Statistical Learning with Bayesian Differential Privacy

Jun Zhao

02 Nov 2019

A Rademacher Complexity Based Method for Controlling Power and Confidence Level in Adaptive Statistical Analysis

International Conference on Data Science and Advanced Analytics (DSAA), 2019

L. Stefani

E. Upfal

04 Oct 2019

Optimal multiclass overfitting by sequence reconstruction from Hamming queries

International Conference on Algorithmic Learning Theory (ALT), 2019

Jayadev Acharya

A. Suresh

08 Aug 2019

Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions

Neural Information Processing Systems (NeurIPS), 2019

Matthew Faw

Rajat Sen

Karthikeyan Shanmugam

Constantine Caramanis

Sanjay Shakkottai

23 Jul 2019

Quantitative Overfitting Management for Human-in-the-loop ML Application Development with ease.ml/meter

F. Hubis

Wentao Wu

Ce Zhang

01 Jun 2019

Model Similarity Mitigates Test Set Overuse

Neural Information Processing Systems (NeurIPS), 2019

Benjamin Recht

29 May 2019

The advantages of multiple classes for reducing overfitting from test set reuse

International Conference on Machine Learning (ICML), 2019

Vitaly Feldman

Roy Frostig

Moritz Hardt

24 May 2019

Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment

USENIX workshop on Tackling computer systems problems with machine learning techniques (SysML), 2019

01 Mar 2019

Do ImageNet Classifiers Generalize to ImageNet?

International Conference on Machine Learning (ICML), 2019

Benjamin Recht

13 Feb 2019

Natural Analysts in Adaptive Data Analysis

International Conference on Machine Learning (ICML), 2019

Tijana Zrnic

Moritz Hardt

30 Jan 2019