The Ladder: A Reliable Leaderboard for Machine Learning Competitions


16 February 2015
Avrim Blum
Moritz Hardt

Papers citing "The Ladder: A Reliable Leaderboard for Machine Learning Competitions"

50 / 73 papers shown
The Benchmarking Epistemology: Construct Validity for Evaluating Machine Learning Models
Timo Freiesleben
Sebastian Zezulka
151
5
0
27 Oct 2025
Improving AGI Evaluation: A Data Science Perspective
John Hawkins
ELM
138
0
0
02 Oct 2025
Benchmark-Driven Selection of AI: Evidence from DeepSeek-R1
Petr Spelda
Vit Stritecky
ELM LRM
107
0
0
13 Aug 2025
The Sample Complexity of Parameter-Free Stochastic Convex Optimization
Jared Lawrence
Ari Kalinsky
Hannah Bradfield
Y. Carmon
Oliver Hinder
257
0
0
12 Jun 2025
Does Prior Data Matter? Exploring Joint Training in the Context of Few-Shot Class-Incremental Learning
Shiwon Kim
Dongjun Hwang
Sungwon Woo
Rita Singh
CLL
467
0
0
13 Mar 2025
Benchmark Data Repositories for Better Benchmarking
Neural Information Processing Systems (NeurIPS), 2024
Rachel Longjohn
Markelle Kelly
Sameer Singh
Padhraic Smyth
286
15
0
31 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
International Conference on Learning Representations (ICLR), 2024
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min Lin
323
28
0
09 Oct 2024
Responsible AI in Open Ecosystems: Reconciling Innovation with Risk Assessment and Disclosure
Mahasweta Chakraborti
Bert Joseph Prestoza
Nicholas Vincent
Seth Frey
324
1
0
27 Sep 2024
Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks
International Conference on Machine Learning (ICML), 2024
Guanhua Zhang
Moritz Hardt
351
21
0
02 May 2024
Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress
Christian Schroeder de Witt
Vishaal Udandarao
Juil Sock
Matthias Bethge
Adel Bibi
Samuel Albanie
264
3
0
29 Feb 2024
Diversified Ensembling: An Experiment in Crowdsourced Machine Learning
Ira Globus-Harris
Declan Harrison
Michael Kearns
Pietro Perona
Aaron Roth
FedML
263
2
0
16 Feb 2024
Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
Florian E. Dorner
Moritz Hardt
NoLa
260
9
0
03 Feb 2024
Challenge design roadmap
Hugo Jair Escalante
Isabelle M Guyon
Addison Howard
Walter Reade
Sébastien Treguer
AI4TS
251
0
0
15 Jan 2024
Practical, Private Assurance of the Value of Collaboration
Proceedings on Privacy Enhancing Technologies (PoPETs), 2023
Hassan Jameel Asghar
Zhigang Lu
Zhongrui Zhao
Dali Kaafar
FedML
317
0
0
04 Oct 2023
Computational modeling of semantic change
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Nina Tahmasebi
Haim Dubossarsky
362
8
0
13 Apr 2023
Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text
Yasmen Wahba
N. Madhavji
John Steinbacher
258
4
0
31 Mar 2023
Accounting for multiplicity in machine learning benchmark performance
Kajsa Møllersen
Einar J. Holsbø
226
3
0
10 Mar 2023
Data-Centric Governance
Sean McGregor
Jesse Hostetler
167
2
0
14 Feb 2023
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks
Findings (Findings), 2022
Pedro Rodriguez
Mahmoud Azab
Becka Silvert
Renato Sanchez
Linzy Labson
Hardik Shah
Seungwhan Moon
270
2
0
10 Oct 2022
Measuring and signing fairness as performance under multiple stakeholder distributions
David Lopez-Paz
Diane Bouchacourt
Levent Sagun
Nicolas Usunier
218
8
0
20 Jul 2022
Sequential Nature of Recommender Systems Disrupts the Evaluation Process
International Workshop on Algorithmic Bias in Search and Recommendation (ABSR), 2022
Ali Shirali
235
4
0
26 May 2022
SafeNet: The Unreasonable Effectiveness of Ensembles in Private Collaborative Learning
Harsh Chaudhari
Matthew Jagielski
Alina Oprea
286
7
0
20 May 2022
Making Progress Based on False Discoveries
Information Technology Convergence and Services (ITCS), 2022
Roi Livni
225
0
0
19 Apr 2022
Sequential algorithmic modification with test data reuse
Conference on Uncertainty in Artificial Intelligence (UAI), 2022
Jean Feng
Gene Pennello
N. Petrick
B. Sahiner
Romain Pirracchio
Alexej Gossmann
162
5
0
21 Mar 2022
An Uncommon Task: Participatory Design in Legal AI
Fernando A. Delgado
Solon Barocas
K. Levy
140
47
0
08 Mar 2022
An Algorithmic Framework for Bias Bounties
Conference on Fairness, Accountability and Transparency (FAccT), 2022
Ira Globus-Harris
Michael Kearns
Aaron Roth
FedML
572
32
0
25 Jan 2022
The Benchmark Lottery
Mostafa Dehghani
Yi Tay
A. Gritsenko
Zhe Zhao
N. Houlsby
Fernando Diaz
Donald Metzler
Oriol Vinyals
419
115
0
14 Jul 2021
How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation
AAAI Conference on Artificial Intelligence (AAAI), 2021
Swaroop Mishra
Anjana Arunkumar
230
27
0
10 Jun 2021
Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking
Neural Information Processing Systems (NeurIPS), 2021
Zhiyi Ma
Kawin Ethayarajh
Tristan Thrush
Somya Jain
Ledell Yu Wu
Robin Jia
Christopher Potts
Adina Williams
Douwe Kiela
ELM
276
65
0
21 May 2021
Label Inference Attacks from Log-loss Scores
International Conference on Machine Learning (ICML), 2021
Abhinav Aggarwal
S. Kasiviswanathan
Zekun Xu
Oluwaseyi Feyisetan
Nathanael Teissier
126
12
0
18 May 2021
RATT: Leveraging Unlabeled Data to Guarantee Generalization
International Conference on Machine Learning (ICML), 2021
Saurabh Garg
Sivaraman Balakrishnan
J. Zico Kolter
Zachary Chase Lipton
318
30
0
01 May 2021
Rip van Winkle's Razor: A Simple Estimate of Overfit to Test Data
Sanjeev Arora
Yi Zhang
MLAU
228
11
0
25 Feb 2021
A Data Quality-Driven View of MLOps
IEEE Data Engineering Bulletin (DEB), 2021
Cédric Renggli
Luka Rimanic
Nezihe Merve Gürel
Bojan Karlaš
Wentao Wu
Ce Zhang
AI4TS
207
71
0
15 Feb 2021
Utility is in the Eye of the User: A Critique of NLP Leaderboards
Kawin Ethayarajh
Dan Jurafsky
ELM
447
60
0
29 Sep 2020
On Primes, Log-Loss Scores and (No) Privacy
Abhinav Aggarwal
Zekun Xu
Oluwaseyi Feyisetan
Nathanael Teissier
MIACV
135
0
0
17 Sep 2020
DeepNNK: Explaining deep models and their generalization using polytope interpolation
Sarath Shekkizhar
Antonio Ortega
145
8
0
20 Jul 2020
Identifying Statistical Bias in Dataset Replication
Logan Engstrom
Andrew Ilyas
Shibani Santurkar
Dimitris Tsipras
Jacob Steinhardt
Aleksander Madry
220
55
0
19 May 2020
On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Damien Teney
Kushal Kafle
Robik Shrestha
Ehsan Abbasnejad
Christopher Kanan
Anton Van Den Hengel
OODD OOD
271
155
0
19 May 2020
The Effect of Natural Distribution Shift on Question Answering Models
International Conference on Machine Learning (ICML), 2020
John Miller
K. Krauth
Benjamin Recht
Ludwig Schmidt
OOD
372
157
0
29 Apr 2020
Approval policies for modifications to Machine Learning-Based Software as a Medical Device: A study of bio-creep
Jean Feng
S. Emerson
N. Simon
255
22
0
28 Dec 2019
Adaptive Statistical Learning with Bayesian Differential Privacy
Jun Zhao
229
1
0
02 Nov 2019
A Rademacher Complexity Based Method for Controlling Power and Confidence Level in Adaptive Statistical Analysis
International Conference on Data Science and Advanced Analytics (DSAA), 2019
L. Stefani
E. Upfal
243
9
0
04 Oct 2019
Optimal multiclass overfitting by sequence reconstruction from Hamming queries
International Conference on Algorithmic Learning Theory (ALT), 2019
Jayadev Acharya
A. Suresh
182
4
0
08 Aug 2019
Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions
Neural Information Processing Systems (NeurIPS), 2019
Matthew Faw
Rajat Sen
Karthikeyan Shanmugam
Constantine Caramanis
Sanjay Shakkottai
381
3
0
23 Jul 2019
Quantitative Overfitting Management for Human-in-the-loop ML Application Development with ease.ml/meter
F. Hubis
Wentao Wu
Ce Zhang
237
6
0
01 Jun 2019
Model Similarity Mitigates Test Set Overuse
Neural Information Processing Systems (NeurIPS), 2019
Horia Mania
John Miller
Ludwig Schmidt
Moritz Hardt
Benjamin Recht
273
56
0
29 May 2019
The advantages of multiple classes for reducing overfitting from test set reuse
International Conference on Machine Learning (ICML), 2019
Vitaly Feldman
Roy Frostig
Moritz Hardt
169
33
0
24 May 2019
Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment
USENIX workshop on Tackling computer systems problems with machine learning techniques (SysML), 2019
Cédric Renggli
Bojan Karlas
Bolin Ding
Feng Liu
Kevin Schawinski
Wentao Wu
Ce Zhang
VLM
141
49
0
01 Mar 2019
Do ImageNet Classifiers Generalize to ImageNet?
International Conference on Machine Learning (ICML), 2019
Benjamin Recht
Rebecca Roelofs
Ludwig Schmidt
Vaishaal Shankar
OOD SSeg VLM
737
2,096
0
13 Feb 2019
Natural Analysts in Adaptive Data Analysis
International Conference on Machine Learning (ICML), 2019
Tijana Zrnic
Moritz Hardt
326
17
0
30 Jan 2019