2505.04457
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
7 May 2025
Shigeki Karita
Yuma Koizumi
Heiga Zen
Haruko Ishikawa
Robin Scheibler
M. Bacchiani
VLM
Papers citing
"Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration"
47 / 47 papers shown
Improving DF-Conformer Using Hydra For High-Fidelity Generative Speech Enhancement on Discrete Codec Token
Shogo Seki
Shaoxiang Dang
Li Li
60
0
0
04 Nov 2025
Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-scale Dataset Cleansing
Wataru Nakata
Yuki Saito
Yota Ueda
Hiroshi Saruwatari
106
0
0
21 Sep 2025
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Wataru Nakata
Yuma Koizumi
Shigeki Karita
Robin Scheibler
Haruko Ishikawa
Adriana Guevara-Rukoz
Heiga Zen
M. Bacchiani
309
2
0
08 May 2025
DiTSE: High-Fidelity Generative Speech Enhancement via Latent Diffusion Transformers
Heitor R. Guimarães
Jiaqi Su
Rithesh Kumar
Tiago H. Falk
Zeyu Jin
DiffM
294
8
0
13 Apr 2025
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Boyi Kang
Xinfa Zhu
Zihan Zhang
Zhen Ye
Mingshuai Liu
...
Jun Chen
Longshuai Xiao
Chao Weng
Wei Xue
Lei Xie
AuLLM
593
17
0
01 Mar 2025
Moshi: a speech-text foundation model for real-time dialogue
Alexandre Défossez
Laurent Mazaré
Manu Orsini
Amélie Royer
P. Pérez
Edouard Grave
Neil Zeghidour
AuLLM
387
327
0
17 Sep 2024
Joint Semantic Knowledge Distillation and Masked Acoustic Modeling for Full-band Speech Restoration with Improved Intelligibility
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xiaoyu Liu
Xu Li
Joan Serrà
Santiago Pascual
217
5
0
14 Sep 2024
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
Interspeech (Interspeech), 2024
Min Ma
Yuma Koizumi
Shigeki Karita
Heiga Zen
Jason Riesa
Haruko Ishikawa
M. Bacchiani
VLM
192
10
0
12 Aug 2024
Universal Score-based Speech Enhancement with High Content Preservation
Robin Scheibler
Yusuke Fujita
Yuma Shirahata
Tatsuya Komatsu
DiffM
257
33
0
18 Jun 2024
mHuBERT-147: A Compact Multilingual HuBERT Model
Marcely Zanon Boito
Vivek Iyer
Nikolaos Lagos
Laurent Besacier
Ioan Calapodescu
VLM
354
53
0
10 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Interspeech (Interspeech), 2024
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
180
189
0
07 Jun 2024
Diffusion Models for Audio Restoration
Jean-Marie Lemercier
Julius Richter
Simon Welker
Eloi Moliner
Vesa Välimäki
Timo Gerkmann
252
44
0
15 Feb 2024
Natural language guidance of high-fidelity text-to-speech with synthetic annotations
Daniel Lyth
Simon King
250
89
0
02 Feb 2024
Vec-Tok Speech: speech vectorization and tokenization for neural speech generation
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Xinfa Zhu
Yuanjun Lv
Yinjiao Lei
Tao Li
Wendi He
Hongbin Zhou
Heng Lu
Lei Xie
295
28
0
11 Oct 2023
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Reo Shimizu
Ryuichi Yamamoto
Masaya Kawamura
Yuma Shirahata
Hironori Doi
Tatsuya Komatsu
Kentaro Tachibana
DiffM
260
41
0
15 Sep 2023
Voice Conversion With Just Nearest Neighbors
Interspeech (Interspeech), 2023
Matthew Baas
Benjamin van Niekerk
Herman Kamper
SSL
199
93
0
30 May 2023
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Interspeech (Interspeech), 2023
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
M. Bacchiani
Yu Zhang
Wei Han
Ankur Bapna
197
131
0
30 May 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
3.3K
20,007
0
15 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
246
43
0
03 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
318
342
0
02 Mar 2023
SQuId: Measuring Speech Naturalness in Many Languages
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Thibault Sellam
Ankur Bapna
Joshua Camp
Diana Mackinnon
Ankur P. Parikh
Jason Riesa
176
24
0
12 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Spoken Language Technology Workshop (SLT), 2022
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
182
33
0
03 Oct 2022
TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zhong-Qiu Wang
Samuele Cornell
Shukjae Choi
Younglo Lee
Byeonghak Kim
Shinji Watanabe
253
150
0
08 Sep 2022
Speech Enhancement and Dereverberation with Diffusion-based Generative Models
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Julius Richter
Simon Welker
Jean-Marie Lemercier
Bunlong Lay
Timo Gerkmann
DiffM
317
306
0
11 Aug 2022
Universal Speech Enhancement with Score-based Diffusion
Joan Serrà
Santiago Pascual
Jordi Pons
R. O. Araz
D. Scaini
DiffM
243
130
0
07 Jun 2022
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Spoken Language Technology Workshop (SLT), 2022
Alexis Conneau
Min Ma
Simran Khanuja
Yu Zhang
Vera Axelrod
Siddharth Dalmia
Jason Riesa
Clara E. Rivera
Ankur Bapna
VLM
396
449
0
25 May 2022
VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration
Interspeech (Interspeech), 2022
Haohe Liu
Xubo Liu
Qiuqiang Kong
Qiao Tian
Yan Zhao
DeLiang Wang
Chuanzeng Huang
Yuxuan Wang
181
78
0
12 Apr 2022
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling
Interspeech (Interspeech), 2022
Takaaki Saeki
Shinnosuke Takamichi
Tomohiko Nakamura
Naoko Tanji
Hiroshi Saruwatari
149
8
0
24 Mar 2022
Self-supervised Learning with Random-projection Quantizer for Speech Recognition
International Conference on Machine Learning (ICML), 2022
Chung-Cheng Chiu
James Qin
Yu Zhang
Jiahui Yu
Yonghui Wu
SSL
229
216
0
03 Feb 2022
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
International Conference on Language Resources and Evaluation (LREC), 2022
Ye Jia
Michelle Tadmor Ramanovich
Quan Wang
Heiga Zen
SLR
179
87
0
11 Jan 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
735
2,548
0
26 Oct 2021
Towards a Unified View of Parameter-Efficient Transfer Learning
International Conference on Learning Representations (ICLR), 2021
Junxian He
Chunting Zhou
Xuezhe Ma
Taylor Berg-Kirkpatrick
Graham Neubig
AAML
443
1,078
0
08 Oct 2021
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
Automatic Speech Recognition & Understanding (ASRU), 2021
Yu-An Chung
Yu Zhang
Wei Han
Chung-Cheng Chiu
James Qin
Ruoming Pang
Yonghui Wu
SSL
VLM
199
486
0
07 Aug 2021
DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021
Yuma Koizumi
Shigeki Karita
Scott Wisdom
Hakan Erdogan
J. Hershey
Llion Jones
M. Bacchiani
219
48
0
30 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Wei-Ning Hsu
Benjamin Bolte
Yifan Hao
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
468
3,870
0
14 Jun 2021
MLS: A Large-Scale Multilingual Dataset for Speech Research
Interspeech (Interspeech), 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
AuLLM
526
653
0
07 Dec 2020
DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Chandan K. A. Reddy
Vishak Gopal
Ross Cutler
240
427
0
28 Oct 2020
WaveGrad: Estimating Gradients for Waveform Generation
International Conference on Learning Representations (ICLR), 2020
Nanxin Chen
Yu Zhang
Heiga Zen
Ron J. Weiss
Mohammad Norouzi
William Chan
DiffM
BDL
318
870
0
02 Sep 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
1.1K
7,195
0
20 Jun 2020
HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Interspeech (Interspeech), 2020
Jiaqi Su
Zeyu Jin
Adam Finkelstein
131
155
0
10 Jun 2020
CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
International Conference on Language Resources and Evaluation (LREC), 2020
Changhan Wang
J. Pino
Anne Wu
Jiatao Gu
SLR
197
102
0
04 Feb 2020
Common Voice: A Massively-Multilingual Speech Corpus
International Conference on Language Resources and Evaluation (LREC), 2019
Rosana Ardila
Megan Branson
Kelly Davis
Michael Henretty
M. Kohler
Josh Meyer
Reuben Morais
Lindsay Saunders
Francis M. Tyers
Gregor Weber
VLM
279
2,013
0
13 Dec 2019
Parametric Resynthesis with neural vocoders
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019
Soumi Maiti
Michael I. Mandel
166
20
0
16 Jun 2019
Sample Efficient Adaptive Text-to-Speech
International Conference on Learning Representations (ICLR), 2018
Yutian Chen
Yannis Assael
Brendan Shillingford
David Budden
Scott E. Reed
...
Ben Laurie
Çağlar Gülçehre
Aaron van den Oord
Oriol Vinyals
Nando de Freitas
193
156
0
27 Sep 2018
Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks
Sercan O. Arik
Heewoo Jun
G. Diamos
231
118
0
20 Aug 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhiwen Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
551
900
0
12 Jun 2018
FiLM: Visual Reasoning with a General Conditioning Layer
Ethan Perez
Florian Strub
H. D. Vries
Vincent Dumoulin
Aaron Courville
FAtt
AIMat
OffRL
AI4CE
748
2,769
0
22 Sep 2017