Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1912.07875
Cited By
Libri-Light: A Benchmark for ASR with Limited or No Supervision
17 December 2019
Jacob Kahn
M. Rivière
Weiyi Zheng
Evgeny Kharitonov
Qiantong Xu
Pierre-Emmanuel Mazaré
Julien Karadayi
Vitaliy Liptchinsky
R. Collobert
Christian Fuegen
Tatiana Likhomanenko
Gabriel Synnaeve
Armand Joulin
Abdel-rahman Mohamed
Emmanuel Dupoux
AuLLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Libri-Light: A Benchmark for ASR with Limited or No Supervision"
50 / 475 papers shown
Title
Sustainable self-supervised learning for speech representations
Luis Lugo
Valentin Vielzeuf
79
2
0
11 Jun 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
90
5
0
09 Jun 2024
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Hemin Yang
Zirun Zhu
...
Yufei Xia
Jinzhu Li
Sheng Zhao
Jinyu Li
Naoyuki Kanda
66
3
0
09 Jun 2024
MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
80
1
0
09 Jun 2024
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Sanyuan Chen
Shujie Liu
Long Zhou
Yanqing Liu
Xu Tan
Jinyu Li
Sheng Zhao
Yao Qian
Furu Wei
VLM
118
82
0
08 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
100
84
0
07 Jun 2024
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis
Théodor Lemerle
Nicolas Obin
Axel Roebel
66
6
0
06 Jun 2024
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
Sreyan Ghosh
Sonal Kumar
Ashish Seth
Purva Chiniya
Utkarsh Tyagi
R. Duraiswami
Dinesh Manocha
86
0
0
06 Jun 2024
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
Jinlong Xue
Yayue Deng
Yicheng Han
Yingming Gao
Ya Li
95
4
0
06 Jun 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Trung D. Q. Dang
David Aponte
Dung Tran
K. Koishida
86
6
0
05 Jun 2024
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
Yongxin Zhu
Dan Su
Liqiang He
Linli Xu
Dong Yu
77
7
0
03 Jun 2024
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
Shengpeng Ji
Jia-li Zuo
Wen Wang
Jialong Zuo
Minghui Fang
...
Ziyue Jiang
Hai Huang
Xize Cheng
Siqi Zheng
Zhou Zhao
109
0
0
03 Jun 2024
YODAS: Youtube-Oriented Dataset for Audio and Speech
Xinjian Li
Shinnosuke Takamichi
Takaaki Saeki
William Chen
Sayaka Shiota
Shinji Watanabe
125
27
0
02 Jun 2024
Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting
Ihab Asaad
Maxime Jacquelin
Olivier Perrotin
Laurent Girin
Thomas Hueber
77
0
0
30 May 2024
Deep Learning for Assessment of Oral Reading Fluency
Mithilesh Vaidya
Binaya Kumar Sahoo
Preeti Rao
41
0
0
29 May 2024
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
Vicky Zayats
Peter Chen
Melissa Ferrari
Dirk Padfield
AI4CE
77
1
0
29 May 2024
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
Chenyang Le
Yao Qian
Dongmei Wang
Long Zhou
Shujie Liu
...
Midia Yousefi
Yanmin Qian
Jinyu Li
Sheng Zhao
Michael Zeng
81
3
0
28 May 2024
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition
Zijin Gu
Tatiana Likhomanenko
Richard He Bai
Erik McDermott
R. Collobert
Navdeep Jaitly
AuLLM
79
5
0
24 May 2024
FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information
Dongseong Hwang
ODL
93
8
0
21 May 2024
Investigating the Áutoencoder Behavior' in Speech Self-Supervised Models: a focus on HuBERT's Pretraining
Valentin Vielzeuf
SSL
90
0
0
14 May 2024
A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech
Oli Danyi Liu
Hao Tang
Naomi H Feldman
Sharon Goldwater
58
1
0
13 May 2024
Look Once to Hear: Target Speech Hearing with Noisy Examples
Bandhav Veluri
Malek Itani
Tuochao Chen
Takuya Yoshioka
Shyamnath Gollakota
79
17
0
10 May 2024
SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan
You Zhang
Yongyi Zang
Jiatong Shi
Ryuichi Yamamoto
Jionghao Han
Yuxun Tang
Tomoki Toda
Zhiyao Duan
97
5
0
08 May 2024
RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
June-Woo Kim
Miika Toikkanen
Sangmin Bae
Minseok Kim
Ho-Young Jung
81
7
0
05 May 2024
Benchmarking Representations for Speech, Music, and Acoustic Events
Moreno La Quatra
Alkis Koudounas
Lorenzo Vaiani
Elena Baralis
Luca Cagliero
Paolo Garza
Sabato Marco Siniscalchi
74
13
0
02 May 2024
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Hankun Wang
Chenpeng Du
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
51
2
0
30 Apr 2024
Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair
Yusuke Sakai
Mana Makinae
Hidetaka Kamigaito
Taro Watanabe
98
5
0
18 Apr 2024
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
130
1
0
16 Apr 2024
Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping
Kevin Zhang
Luka Chkhetiani
Francis McCann Ramirez
Yash Khare
Andrea Vanzo
...
Ruben Bousbib
Taufiquzzaman Peyash
Michael Nguyen
Dillon Pulliam
Domenic Donato
59
4
0
10 Apr 2024
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge
Yiwei Guo
Chenrun Wang
Yifan Yang
Hankun Wang
Ziyang Ma
...
Hanzheng Li
Shuai Fan
Hui Zhang
Xie Chen
Kai Yu
88
1
0
09 Apr 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Detai Xin
Xu Tan
Kai Shen
Zeqian Ju
Dongchao Yang
...
Shinnosuke Takamichi
Hiroshi Saruwatari
Shujie Liu
Jinyu Li
Sheng Zhao
76
28
0
04 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
122
44
0
03 Apr 2024
The VoicePrivacy 2024 Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Pierre Champion
Sarina Meyer
Xin Wang
Emmanuel Vincent
Michele Panariello
Nicholas W. D. Evans
Junichi Yamagishi
Massimiliano Todisco
125
25
0
03 Apr 2024
Noise Masking Attacks and Defenses for Pretrained Speech Models
Matthew Jagielski
Om Thakkar
Lun Wang
AAML
95
5
0
02 Apr 2024
Scaling Properties of Speech Language Models
Santiago Cuervo
R. Marxer
97
11
0
31 Mar 2024
An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning
Heitor R. Guimarães
Arthur Pimentel
Anderson R. Avila
Mehdi Rezagholizadeh
Boxing Chen
Tiago H. Falk
111
1
0
13 Mar 2024
Beyond the Labels: Unveiling Text-Dependency in Paralinguistic Speech Recognition Datasets
Jan Pevsán
Santosh Kesiraju
Lukávs Burget
JanHonza'' vCernocký
56
0
0
12 Mar 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
154
180
0
05 Mar 2024
Compact Speech Translation Models via Discrete Speech Units Pretraining
Tsz Kin Lam
Alexandra Birch
Barry Haddow
124
3
0
29 Feb 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
102
7
0
25 Feb 2024
The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning
Nik Vaessen
David A. van Leeuwen
81
3
0
21 Feb 2024
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
Shengpeng Ji
Minghui Fang
Ziyue Jiang
Ziyue Jiang
Dingdong Wang
Hanting Wang
Jialung Zuo
Shulei Wang
AuLLM
94
0
0
19 Feb 2024
Probing Self-supervised Learning Models with Target Speech Extraction
Junyi Peng
Marc Delcroix
Tsubasa Ochiai
Oldrich Plchot
Takanori Ashihara
Shoko Araki
J. Černocký
105
4
0
17 Feb 2024
Pushing the Limits of Zero-shot End-to-End Speech Translation
Ioannis Tsiamas
Gerard I. Gállego
José A. R. Fonollosa
Marta R. Costa-jussá
86
10
0
16 Feb 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
Shengpeng Ji
Ziyue Jiang
Hanting Wang
Jia-li Zuo
Zhou Zhao
74
16
0
14 Feb 2024
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Ziyang Ma
Guanrou Yang
Yifan Yang
Zhifu Gao
Jiaming Wang
...
Fan Yu
Qian Chen
Siqi Zheng
Shiliang Zhang
Xie Chen
AuLLM
96
60
0
13 Feb 2024
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Naoyuki Kanda
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Hemin Yang
...
Yufei Xia
Jinzhu Li
Yanqing Liu
Sheng Zhao
Michael Zeng
96
8
0
12 Feb 2024
Large Language Models for Time Series: A Survey
Xiyuan Zhang
Ranak Roy Chowdhury
Rajesh K. Gupta
Jingbo Shang
AI4TS
139
67
0
02 Feb 2024
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
Yihan Wu
Soumi Maiti
Yifan Peng
Wangyou Zhang
Chenda Li
Yuyue Wang
Xihua Wang
Shinji Watanabe
Ruihua Song
75
4
0
31 Jan 2024
SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering
Chyi-Jiunn Lin
Guan-Ting Lin
Yung-Sung Chuang
Wei-Lun Wu
Shang-Wen Li
Abdelrahman Mohamed
Hung-yi Lee
Lin-shan Lee
RALM
69
5
0
24 Jan 2024
Previous
1
2
3
4
5
6
...
8
9
10
Next