arXiv:2309.13876

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

25 September 2023
Yifan Peng
Jinchuan Tian
Brian Yan
Dan Berrebbi
Xuankai Chang
Xinjian Li
Jiatong Shi
Siddhant Arora
William Chen
Roshan S. Sharma
Wangyou Zhang
Yui Sudo
Muhammad Shakeel
Jee-weon Jung
Soumi Maiti
Shinji Watanabe
Abstract

Pre-training speech models on large volumes of data has achieved remarkable success. OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised speech data. It generalizes well to various speech recognition and translation benchmarks even in a zero-shot setup. However, the full pipeline for developing such models (from data collection to training) is not publicly accessible, which makes it difficult for researchers to further improve its performance and address training-related issues such as efficiency, robustness, fairness, and bias. This work presents an Open Whisper-style Speech Model (OWSM), which reproduces Whisper-style training using an open-source toolkit and publicly available data. OWSM even supports more translation directions and can be more efficient to train. We will publicly release all scripts used for data preparation, training, inference, and scoring as well as pre-trained models and training logs to promote open science.
