Cem Mil Podcasts: A Spoken Portuguese Document Corpus
Conference and Labs of the Evaluation Forum (CLEF), 2022
Abstract
This document describes the Portuguese-language podcast dataset released by Spotify for academic research purposes. We give an overview of how the data was sampled, some basic statistics on the collection, and brief information on the distribution across Brazilian and Portuguese dialects.
