1912.07875
Cited By
Libri-Light: A Benchmark for ASR with Limited or No Supervision
17 December 2019
Jacob Kahn
M. Rivière
Weiyi Zheng
Evgeny Kharitonov
Qiantong Xu
Pierre-Emmanuel Mazaré
Julien Karadayi
Vitaliy Liptchinsky
R. Collobert
Christian Fuegen
Tatiana Likhomanenko
Gabriel Synnaeve
Armand Joulin
Abdel-rahman Mohamed
Emmanuel Dupoux
AuLLM
Papers citing
"Libri-Light: A Benchmark for ASR with Limited or No Supervision"
50 / 475 papers shown
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
88
4
0
09 Oct 2024
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Alan Baade
Puyuan Peng
David Harwath
123
8
0
05 Oct 2024
NTU-NPU System for Voice Privacy 2024 Challenge
Nikita Kuzmin
Hieu-Thi Luong
Jixun Yao
Lei Xie
Kong Aik Lee
Eng Siong Chng
108
1
0
03 Oct 2024
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
Marco Gaido
Sara Papi
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
72
6
0
01 Oct 2024
Zero-Shot Text-to-Speech from Continuous Text Streams
Trung D. Q. Dang
David Aponte
Dung Tran
Tianyi Chen
K. Koishida
AuLLM
VLM
70
5
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
189
25
0
01 Oct 2024
AfriHuBERT: A self-supervised speech representation model for African languages
Jesujoba Oluwadara Alabi
Xuechen Liu
Dietrich Klakow
Junichi Yamagishi
VLM
69
3
0
30 Sep 2024
Probing mental health information in speech foundation models
Marc de Gennes
Adrien Lesage
Martin Denais
Xuan-Nga Cao
Simon Chang
Pierre Van Remoortere
Cyrille Dakhlia
Rachid Riad
51
0
0
27 Sep 2024
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
Giuseppe Ruggiero
Matteo Testa
Jurgen Van de Walle
Luigi Di Caro
61
1
0
25 Sep 2024
Speech Recognition Rescoring with Large Speech-Text Foundation Models
Prashanth Gurunath Shivakumar
J. Kolehmainen
Aditya Gourav
Yi Gu
Ankur Gandhe
Ariya Rastrow
I. Bulyko
AuLLM
78
0
0
25 Sep 2024
Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
Pin-Jui Ku
Alexander H. Liu
Roman Korostik
Sung-Feng Huang
Szu-Wei Fu
Ante Jukić
77
4
0
24 Sep 2024
Semi-supervised Learning For Robust Speech Evaluation
Huayun Zhang
Jeremy H. M. Wong
Geyu Lin
Nancy F. Chen
61
0
0
23 Sep 2024
Training Large ASR Encoders with Differential Privacy
Geeticka Chauhan
Steve Chien
Om Thakkar
Abhradeep Thakurta
Arun Narayanan
95
1
0
21 Sep 2024
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
Sijing Chen
Yuan Feng
Laipeng He
Tianwei He
Wendi He
...
Huimin Zhang
Xiang Zhang
Guangcheng Zhao
Hongbin Zhou
Pengpeng Zou
82
8
0
18 Sep 2024
M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
Yufeng Yang
Desh Raj
Ju Lin
Niko Moritz
Junteng Jia
...
Egor Lakomkin
Yiteng Huang
Jacob Donley
Jay Mahadeokar
Ozlem Kalinli
82
2
0
17 Sep 2024
SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Peizhuo Liu
Li Wang
Renqiang He
Haorui He
Lei Wang
Huadi Zheng
Jie Shi
Tong Xiao
Zhizheng Wu
101
1
0
17 Sep 2024
Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach
Maxime Poli
Emmanuel Chemla
Emmanuel Dupoux
68
3
0
16 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
105
5
0
16 Sep 2024
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Jiaqi Li
Dongmei Wang
Xiaofei Wang
Yao Qian
Long Zhou
...
Junkun Chen
Sheng Zhao
Jinyu Li
Zhizheng Wu
Michael Zeng
AuLLM
81
3
0
06 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Xueyao Zhang
Shunsi Zhang
Zhizheng Wu
114
61
0
01 Sep 2024
Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation
Lun Wang
54
0
0
29 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
140
45
0
29 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL
AI4TS
60
1
0
23 Aug 2024
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
Kai-Wei Chang
Haibin Wu
Yu-Kai Wang
Yuan-Kuei Wu
Hua Shen
Wei-Cheng Tseng
Iu-thing Kang
Shang-Wen Li
Hung-yi Lee
88
3
0
23 Aug 2024
Parameter-Efficient Transfer Learning under Federated Learning for Automatic Speech Recognition
Xuan Kan
Yonghui Xiao
Tien-Ju Yang
Nanxin Chen
Rajiv Mathews
FedML
52
2
0
19 Aug 2024
CMU's IWSLT 2024 Simultaneous Speech Translation System
Xi Xu
Siqi Ouyang
Brian Yan
Patrick Fernandes
William Chen
Lei Li
Graham Neubig
Shinji Watanabe
56
1
0
14 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
91
0
0
08 Aug 2024
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks
Nakamasa Inoue
Shinta Otake
Takumi Hirose
Masanari Ohi
Rei Kawakami
69
2
0
28 Jul 2024
Improving noisy student training for low-resource languages in End-to-End ASR using CycleGAN and inter-domain losses
C. Li
Ngoc Thang Vu
67
4
0
26 Jul 2024
Quantifying the Role of Textual Predictability in Automatic Speech Recognition
Sean Robertson
Gerald Penn
Ewan Dunbar
45
1
0
23 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
110
6
0
21 Jul 2024
Linear-Complexity Self-Supervised Learning for Speech Processing
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
120
1
0
18 Jul 2024
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Haibin Wu
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Daniel Tompkins
...
Canrun Li
Zhen Xiao
Sheng Zhao
Jinyu Li
Naoyuki Kanda
111
9
0
17 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen M. Meng
Furu Wei
150
43
0
11 Jul 2024
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yuancheng Wang
Kai Chen
Pengyuan Zhang
Zhizheng Wu
91
54
0
07 Jul 2024
Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units
Bolaji Yusuf
Jan "Honza" Černocký
Murat Saraclar
62
1
0
05 Jul 2024
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
Bohan Li
Feiyu Shen
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
97
2
0
04 Jul 2024
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
Kunal Dhawan
Nithin Rao Koluguri
Ante Jukić
Ryan Langman
Jagadeesh Balam
Boris Ginsburg
93
3
0
03 Jul 2024
Towards the Next Frontier in Speech Representation Learning Using Disentanglement
Varun Krishna
Sriram Ganapathy
SSL
53
1
0
02 Jul 2024
Towards Robust Speech Representation Learning for Thousands of Languages
William Chen
Wangyou Zhang
Yifan Peng
Xinjian Li
Jinchuan Tian
Jiatong Shi
Xuankai Chang
Soumi Maiti
Karen Livescu
Shinji Watanabe
ELM
119
19
0
30 Jun 2024
NAIST Simultaneous Speech Translation System for IWSLT 2024
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Tomoya Yanagita
...
Haotian Tan
Makoto Sakai
S. Sakti
Katsuhito Sudoh
Satoshi Nakamura
122
1
0
30 Jun 2024
Less Forgetting for Better Generalization: Exploring Continual-learning Fine-tuning Methods for Speech Self-supervised Representations
Salah Zaiem
Titouan Parcollet
S. Essid
CLL
95
4
0
30 Jun 2024
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
Zhehuai Chen
He Huang
Oleksii Hrinchuk
Krishna Puvvada
Nithin Rao Koluguri
Piotr Żelasko
Jagadeesh Balam
Boris Ginsburg
AuLLM
RALM
92
11
0
28 Jun 2024
WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model
Yi Zhu
Tiago H. Falk
MedIm
75
1
0
26 Jun 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
75
70
0
26 Jun 2024
Speech Analysis of Language Varieties in Italy
Moreno La Quatra
Alkis Koudounas
Elena Baralis
Sabato Marco Siniscalchi
103
3
0
22 Jun 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers
Yakun Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Guanrou Yang
Xie Chen
AuLLM
52
4
0
22 Jun 2024
NAST: Noise Aware Speech Tokenization for Speech Language Models
Shoval Messica
Yossi Adi
75
7
0
16 Jun 2024
On the Evaluation of Speech Foundation Models for Spoken Language Understanding
Siddhant Arora
Ankita Pasad
Chung-Ming Chien
Jionghao Han
Roshan S. Sharma
...
William Chen
Suwon Shon
Hung-yi Lee
Karen Livescu
Shinji Watanabe
ELM
82
6
0
14 Jun 2024
Multi-Modal Retrieval For Large Language Model Based Speech Recognition
J. Kolehmainen
Aditya Gourav
Prashanth Gurunath Shivakumar
Yile Gu
Ankur Gandhe
Ariya Rastrow
Grant P. Strimel
I. Bulyko
85
5
0
13 Jun 2024