ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.00463
36
10

Open-Source Conversational AI with SpeechBrain 1.0

29 June 2024
Mirco Ravanelli
Titouan Parcollet
Adel Moumen
Sylvain de Langen
Cem Subakan
Peter William VanHarn Plantinga
Yingzhi Wang
Pooneh Mousavi
Luca Della Libera
Artem Ploujnikov
Francesco Paissan
Davide Borra
Salah Zaiem
Zeyu Zhao
Shucong Zhang
Georgios Karakasidis
Sung-Lin Yeh
Pierre Champion
Aku Rouhe
Rudolf A. Braun
Florian Mai
Juan Zuluaga-Gomez
Seyed Mahed Mousavi
A. Nautsch
Xuechen Liu
Sangeet Sagar
J. Duret
Salima Mdhaffar
G. Laperriere
Mickael Rouvier
Renato De Mori
Yannick Esteve
    VLM
ArXivPDFHTML
Abstract

SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.

View on arXiv
Comments on this paper