Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

30 September 2022
Leandro von Werra
Lewis Tunstall
A. Thakur
A. Luccioni
Tristan Thrush
Aleksandra Piktus
Felix Marty
Nazneen Rajani
Victor Mustar
Helen Ngo
Omar Sanseviero
Mario Šaško
Albert Villanova
Quentin Lhoest
Julien Chaumond
Margaret Mitchell
Alexander M. Rush
Thomas Wolf
Douwe Kiela
Abstract

Evaluation is a key part of machine learning (ML), yet there is a lack of support and tooling to enable its informed and systematic practice. We introduce Evaluate and Evaluation on the Hub, a set of tools to facilitate the evaluation of models and datasets in ML. Evaluate is a library to support best practices for measurements, metrics, and comparisons of data and models. Its goal is to support reproducibility of evaluation, centralize and document the evaluation process, and broaden evaluation to cover more facets of model performance. It includes over 50 efficient canonical implementations for a variety of domains and scenarios, interactive documentation, and the ability to easily share implementations and outcomes. The library is available at https://github.com/huggingface/evaluate. In addition, we introduce Evaluation on the Hub, a platform that enables the large-scale evaluation of over 75,000 models and 11,000 datasets on the Hugging Face Hub, for free, at the click of a button. Evaluation on the Hub is available at https://huggingface.co/autoevaluate.
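
Since the abstract describes a load-and-compute workflow for metrics, measurements, and comparisons, the sketch below illustrates that pattern with the evaluate library. It assumes the package is installed (pip install evaluate) and uses the widely available "accuracy" and "f1" metrics as illustrative module names; it is an example of usage, not a reproduction of the paper's experiments.

```python
# Minimal sketch of the load-and-compute pattern, assuming the `evaluate`
# package is installed. Metric names ("accuracy", "f1") are illustrative.
import evaluate

# Load a canonical metric implementation and compute it on toy predictions.
accuracy = evaluate.load("accuracy")
print(accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0]))
# e.g. {'accuracy': 0.75}

# Several metrics can be combined and computed in a single call.
clf_metrics = evaluate.combine(["accuracy", "f1"])
print(clf_metrics.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0]))

# For batched or streaming evaluation, results can be accumulated incrementally.
accuracy = evaluate.load("accuracy")
for preds, refs in [([0, 1], [0, 1]), ([1, 0], [0, 0])]:
    accuracy.add_batch(predictions=preds, references=refs)
print(accuracy.compute())
```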
