Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling

21 April 2025
Louis B. Bradshaw
Simon Colton
Abstract

We introduce an extensive new dataset of MIDI files, created by transcribing audio recordings of piano performances into their constituent notes. The data pipeline we use is multi-stage, employing a language model to autonomously crawl and score audio recordings from the internet based on their metadata, followed by a stage of pruning and segmentation using an audio classifier. The resulting dataset contains over one million distinct MIDI files, comprising roughly 100,000 hours of transcribed audio. We provide an in-depth analysis of our techniques, offering statistical insights, and investigate the content by extracting metadata tags, which we also provide. Dataset available at this https URL.

@article{bradshaw2025_2504.15071,
  title={Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling},
  author={Louis Bradshaw and Simon Colton},
  journal={arXiv preprint arXiv:2504.15071},
  year={2025}
}