ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1912.07875
  4. Cited By
Libri-Light: A Benchmark for ASR with Limited or No Supervision

Libri-Light: A Benchmark for ASR with Limited or No Supervision

17 December 2019
Jacob Kahn
M. Rivière
Weiyi Zheng
Evgeny Kharitonov
Qiantong Xu
Pierre-Emmanuel Mazaré
Julien Karadayi
Vitaliy Liptchinsky
R. Collobert
Christian Fuegen
Tatiana Likhomanenko
Gabriel Synnaeve
Armand Joulin
Abdel-rahman Mohamed
Emmanuel Dupoux
    AuLLM
ArXiv (abs)PDFHTML

Papers citing "Libri-Light: A Benchmark for ASR with Limited or No Supervision"

50 / 475 papers shown
Title
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided
  Sequence Reordering
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
Ya-Zhen Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Xie Chen
AuLLM
111
42
0
14 Jan 2024
Useful Blunders: Can Automated Speech Recognition Errors Improve
  Downstream Dementia Classification?
Useful Blunders: Can Automated Speech Recognition Errors Improve Downstream Dementia Classification?
Changye Li
Weizhe Xu
Trevor Cohen
Serguei V. S. Pakhomov
94
8
0
10 Jan 2024
Hyperbolic Distance-Based Speech Separation
Hyperbolic Distance-Based Speech Separation
Darius Petermann
Minje Kim
71
4
0
07 Jan 2024
Pheme: Efficient and Conversational Speech Generation
Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski
Taras Sereda
Tomasz Cichy
Ivan Vulić
78
7
0
05 Jan 2024
Self-supervised Pretraining for Robust Personalized Voice Activity
  Detection in Adverse Conditions
Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
H. S. Bovbjerg
Jesper Jensen
Jan Østergaard
Zheng-Hua Tan
VLM
59
3
0
27 Dec 2023
Audiobox: Unified Audio Generation with Natural Language Prompts
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
124
94
0
25 Dec 2023
Multimodal Attention Merging for Improved Speech Recognition and Audio
  Event Classification
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification
Anirudh S. Sundar
Chao-Han Huck Yang
David M. Chan
Shalini Ghosh
Venkatesh Ravichandran
P. S. Nidadavolu
MoMe
94
9
0
22 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation
  learning
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
57
1
0
18 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
91
35
0
15 Dec 2023
Generative Context-aware Fine-tuning of Self-supervised Speech Models
Generative Context-aware Fine-tuning of Self-supervised Speech Models
Suwon Shon
Kwangyoun Kim
Prashant Sridhar
Yi-Te Hsu
Shinji Watanabe
Karen Livescu
63
2
0
15 Dec 2023
Fine-Tuned Self-Supervised Speech Representations for Language
  Diarization in Multilingual Code-Switched Speech
Fine-Tuned Self-Supervised Speech Representations for Language Diarization in Multilingual Code-Switched Speech
Geoffrey T. Frost
Emily Morris
Joshua Jansen van Vüren
T. Niesler
60
2
0
15 Dec 2023
Audio-visual fine-tuning of audio-only ASR models
Audio-visual fine-tuning of audio-only ASR models
Avner May
Dmitriy Serdyuk
Ankit Parag Shah
Otavio Braga
Olivier Siohan
70
3
0
14 Dec 2023
PhasePerturbation: Speech Data Augmentation via Phase Perturbation for
  Automatic Speech Recognition
PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition
Chengxi Lei
Satwinder Singh
Feng Hou
Xiaoyun Jia
Ruili Wang
60
1
0
13 Dec 2023
Extending Whisper with prompt tuning to target-speaker ASR
Extending Whisper with prompt tuning to target-speaker ASR
Hao Ma
Zhiyuan Peng
Mingjie Shao
Jing Li
Xuelong Li
VLM
81
16
0
13 Dec 2023
Revisiting the Entropy Semiring for Neural Speech Recognition
Revisiting the Entropy Semiring for Neural Speech Recognition
Oscar Chang
DongSeon Hwang
Olivier Siohan
79
3
0
13 Dec 2023
End-to-End Speech-to-Text Translation: A Survey
End-to-End Speech-to-Text Translation: A Survey
Nivedita Sethiya
Chandresh Kumar Maurya
90
8
0
02 Dec 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic
  Representation of Speech by Hierarchical Variational Inference for Zero-shot
  Speech Synthesis
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
122
37
0
21 Nov 2023
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech
  Synthesis
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
Jungil Kong
Junmo Lee
Jeongmin Kim
Beomjeong Kim
Jihoon Park
Dohee Kong
Changheon Lee
Sangjin Kim
84
1
0
20 Nov 2023
DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker
  Verification Loss for Noise Robustness
DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker Verification Loss for Noise Robustness
Vikentii Pankov
Valeria Pronina
Alexander Kuzmin
Maksim Borisov
Nikita Usoltsev
Xingshan Zeng
Alexander Golubkov
Nikolai Ermolenko
Aleksandra Shirshova
Yulia Matveeva
52
5
0
16 Nov 2023
A comparative analysis between Conformer-Transducer, Whisper, and
  wav2vec2 for improving the child speech recognition
A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2 for improving the child speech recognition
Andrei Barcovschi
Rishabh Jain
Peter Corcoran
47
3
0
07 Nov 2023
Pre-trained Speech Processing Models Contain Human-Like Biases that
  Propagate to Speech Emotion Recognition
Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition
Isaac Slaughter
Craig Greenberg
Reva Schwartz
Aylin Caliskan
93
5
0
29 Oct 2023
Generative Pre-training for Speech with Flow Matching
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
102
36
0
25 Oct 2023
Acoustic BPE for Speech Generation with Discrete Tokens
Acoustic BPE for Speech Generation with Discrete Tokens
Feiyu Shen
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
66
13
0
23 Oct 2023
Unintended Memorization in Large ASR Models, and How to Mitigate It
Unintended Memorization in Large ASR Models, and How to Mitigate It
Lun Wang
Om Thakkar
Rajiv Mathews
83
6
0
18 Oct 2023
Spatial HuBERT: Self-supervised Spatial Speech Representation Learning
  for a Single Talker from Multi-channel Audio
Spatial HuBERT: Self-supervised Spatial Speech Representation Learning for a Single Talker from Multi-channel Audio
Antoni Dimitriadis
Siqi Pan
V. Sethu
Beena Ahmed
SSL
53
3
0
17 Oct 2023
SelfVC: Voice Conversion With Iterative Refinement using Self
  Transformations
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara
Shehzeen Samarah Hussain
Rafael Valle
Boris Ginsburg
Rishabh Ranjan
Shlomo Dubnov
F. Koushanfar
Julian McAuley
50
3
0
14 Oct 2023
XLS-R fine-tuning on noisy word boundaries for unsupervised speech
  segmentation into words
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words
Robin Algayres
Pablo Diego-Simon
Benoît Sagot
Emmanuel Dupoux
96
1
0
08 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio
  tokens
Generative Spoken Language Model based on continuous word-sized audio tokens
Robin Algayres
Yossi Adi
Tu Nguyen
Jade Copet
Gabriel Synnaeve
Benoît Sagot
Emmanuel Dupoux
AuLLM
116
16
0
08 Oct 2023
Acoustic and linguistic representations for speech continuous emotion
  recognition in call center conversations
Acoustic and linguistic representations for speech continuous emotion recognition in call center conversations
Manon Macary
Marie Tahon
Yannick Esteve
Daniel Luzzati
68
3
0
06 Oct 2023
Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech
  Model
Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model
Kai-Wei Chang
Ming-Hsin Chen
Yun-Ping Lin
Jing Neng Hsu
Paul Kuo-Ming Huang
Chien-yu Huang
Shang-Wen Li
Hung-yi Lee
100
6
0
04 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised
  Learning with Masked Unit Prediction
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi
Hirofumi Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
111
27
0
04 Oct 2023
Unsupervised Speech Recognition with N-Skipgram and Positional Unigram
  Matching
Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching
Liming Wang
M. Hasegawa-Johnson
Chang D. Yoo
SSL
86
2
0
03 Oct 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang
Jinchuan Tian
Xuejiao Tan
Rongjie Huang
Songxiang Liu
...
Jiang Bian
Xixin Wu
Zhou Zhao
Shinji Watanabe
Helen M. Meng
CVBMAuLLM
103
128
0
01 Oct 2023
Low-Resource Self-Supervised Learning with SSL-Enhanced TTS
Low-Resource Self-Supervised Learning with SSL-Enhanced TTS
Xin Wang
Taein Kwon
Wei-Ning Hsu
Yossi Adi
Tu Nguyen
D. Bohus
Emmanuel Dupoux
Neel Joshi
Abdelrahman Mohamed
42
4
0
29 Sep 2023
HyPoradise: An Open Baseline for Generative Speech Recognition with
  Large Language Models
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
Cheng Chen
Yuchen Hu
Chao-Han Huck Yang
Sabato Marco Siniscalchi
Pin-Yu Chen
Eng Siong Chng
96
48
0
27 Sep 2023
Generative Speech Recognition Error Correction with Large Language
  Models and Task-Activating Prompting
Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting
Chao-Han Huck Yang
Yile Gu
Yi-Chieh Liu
Shalini Ghosh
I. Bulyko
A. Stolcke
KELMLRM
106
52
0
27 Sep 2023
Human Transcription Quality Improvement
Human Transcription Quality Improvement
Jian Gao
Hanbo Sun
Cheng Cao
Zheng Du
137
3
0
24 Sep 2023
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and
  context
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
Wei Kang
Xiaoyu Yang
Zengwei Yao
Fangjun Kuang
Yifan Yang
Liyong Guo
Long Lin
Daniel Povey
83
56
0
15 Sep 2023
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of
  Speech in ASR Tasks
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
Sizhou Chen
Songyang Gao
Sen Fang
26
0
0
14 Sep 2023
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Yongqiang Wang
Jionghao Bai
Rongjie Huang
Ruiqi Li
Zhiqing Hong
Zhou Zhao
49
3
0
14 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech
  recognition/synthesis and speech/text continuation tasks
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLMAuLLM
123
69
0
14 Sep 2023
EnCodecMAE: Leveraging neural codecs for universal audio representation
  learning
EnCodecMAE: Leveraging neural codecs for universal audio representation learning
L. Pepino
Pablo Riera
Luciana Ferrer
76
5
0
14 Sep 2023
Simultaneous Measurement of Multiple Acoustic Attributes Using
  Structured Periodic Test Signals Including Music and Other Sound Materials
Simultaneous Measurement of Multiple Acoustic Attributes Using Structured Periodic Test Signals Including Music and Other Sound Materials
Hideki Kawahara
Kohei Yatabe
Ken-Ichi Sakakibara
M. Mizumachi
Tatsuya Kitamura
21
1
0
06 Sep 2023
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion
Wen-Chin Huang
Tomoki Toda
CVBM
89
5
0
05 Sep 2023
Knowledge Distillation from Non-streaming to Streaming ASR Encoder using
  Auxiliary Non-streaming Layer
Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer
Kyuhong Shim
Jinkyu Lee
Simyoung Chang
Kyuwoong Hwang
73
3
0
31 Aug 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger
  Probing Heads
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
43
10
0
28 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MAAuLLM
180
39
0
24 Aug 2023
Improving Continuous Sign Language Recognition with Cross-Lingual Signs
Improving Continuous Sign Language Recognition with Cross-Lingual Signs
Fangyun Wei
Yutong Chen
SLR
69
33
0
21 Aug 2023
TokenSplit: Using Discrete Speech Representations for Direct, Refined,
  and Transcript-Conditioned Speech Separation and Recognition
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
Hakan Erdogan
Scott Wisdom
Xuankai Chang
Zalan Borsos
Marco Tagliasacchi
Neil Zeghidour
J. Hershey
69
11
0
21 Aug 2023
Decoding Emotions: A comprehensive Multilingual Study of Speech Models
  for Speech Emotion Recognition
Decoding Emotions: A comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition
Anant Singh
Akshat Gupta
66
5
0
17 Aug 2023
Previous
12345...8910
Next