1807.07281
ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech
19 July 2018
Wei Ping
Kainan Peng
Jitong Chen
Papers citing
"ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech"
50 / 134 papers shown
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing
Masaya Kawamura
Takuya Hasumi
Yuma Shirahata
Ryuichi Yamamoto
MQ
44
0
0
04 Jun 2025
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
Zhihao Du
Changfeng Gao
Yuxuan Wang
Fan Yu
Tianyu Zhao
...
Mengzhe Chen
Yafeng Chen
Shiliang Zhang
Wen Wang
Jieping Ye
AuLLM
154
1
0
23 May 2025
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
56
0
0
14 May 2025
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
121
6
0
26 Dec 2024
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning
Raviraj Joshi
Nikesh Garera
73
2
0
02 Dec 2023
An overview of text-to-speech systems and media applications
Mohammad Reza Hasanabadi
28
3
0
22 Oct 2023
CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram
Zhifeng Kong
Wei Ping
Ambrish Dantrey
Bryan Catanzaro
51
7
0
12 Sep 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
Jing Liu
278
31
0
27 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
188
39
0
24 Aug 2023
DreamHuman: Animatable 3D Avatars from Text
Nikos Kolotouros
Thiemo Alldieck
Andrei Zanfir
Eduard Gabriel Bazavan
Mihai Fieraru
C. Sminchisescu
111
101
0
15 Jun 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Yochai Yemini
Aviv Shamsian
Lior Bracha
Sharon Gannot
Ethan Fetaya
DiffM
116
15
0
05 Jun 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
73
1
0
09 May 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
169
48
0
21 Mar 2023
Learning to Dub Movies via Hierarchical Prosody Models
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming-Hsuan Yang
Qin Huang
141
27
0
08 Dec 2022
DreamFusion: Text-to-3D using 2D Diffusion
Ben Poole
Ajay Jain
Jonathan T. Barron
B. Mildenhall
283
2,445
0
29 Sep 2022
Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0
M. S. Al-Radhi
Tamás Gábor Csapó
Csaba Zainkó
Géza Németh
50
1
0
15 Aug 2022
Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation
Giulia Comini
Goeric Huybrechts
M. Ribeiro
Adam Gabryś
Jaime Lorenzo-Trueba
67
5
0
29 Jul 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders
Yanqing Liu
Rui Xue
Lei He
Xu Tan
Sheng Zhao
87
25
0
11 Jul 2022
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History
Yuto Nishimura
Yuki Saito
Shinnosuke Takamichi
Kentaro Tachibana
Hiroshi Saruwatari
AI4TS
59
8
0
16 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Sang-gil Lee
Wei Ping
Boris Ginsburg
Bryan Catanzaro
Sungroh Yoon
159
255
0
09 Jun 2022
Parallel Synthesis for Autoregressive Speech Generation
Po-Chun Hsu
Da-Rong Liu
Andy T. Liu
Hung-yi Lee
80
5
0
25 Apr 2022
Streamable Neural Audio Synthesis With Non-Causal Convolutions
Antoine Caillon
P. Esling
85
12
0
14 Apr 2022
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation
Rendi Chevi
Radityo Eko Prasojo
Alham Fikri Aji
Andros Tjandra
S. Sakti
VLM
50
4
0
29 Mar 2022
Improve few-shot voice cloning using multi-modal learning
Haitong Zhang
Yue Lin
46
8
0
18 Mar 2022
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
Takuhiro Kaneko
Kou Tanaka
Hirokazu Kameoka
Shogo Seki
89
62
0
04 Mar 2022
It's Raw! Audio Generation with State-Space Models
Karan Goel
Albert Gu
Chris Donahue
Christopher Ré
95
195
0
20 Feb 2022
Speech Denoising in the Waveform Domain with Self-Attention
Zhifeng Kong
Wei Ping
Ambrish Dantrey
Bryan Catanzaro
89
63
0
15 Feb 2022
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
63
17
0
07 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection
Joel Frank
Lea Schönherr
DiffM
204
131
0
04 Nov 2021
CaloFlow II: Even Faster and Still Accurate Generation of Calorimeter Showers with Normalizing Flows
Claudius Krause
David Shih
91
64
0
21 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
99
43
0
15 Oct 2021
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data
Haitong Zhang
Yue Lin
56
0
0
14 Oct 2021
On-device neural speech synthesis
Sivanand Achanta
Albert Antony
L. Golipour
Jiangchuan Li
T. Raitio
...
Francesco Rossi
Jennifer Shi
Jaimin Upadhyay
David Winarsky
Hepeng Zhang
108
17
0
17 Sep 2021
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person
Xinsheng Wang
Qicong Xie
Jihua Zhu
Lei Xie
O. Scharenborg
120
19
0
09 Aug 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
133
359
0
29 Jun 2021
Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition
Zhengxi Liu
Y. Qian
DRL
49
10
0
25 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Nanxin Chen
Yu Zhang
Heiga Zen
Ron J. Weiss
Mohammad Norouzi
Najim Dehak
William Chan
DiffM
99
88
0
17 Jun 2021
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation
Won Jang
D. Lim
Jaesam Yoon
Bongwan Kim
Juntae Kim
116
132
0
15 Jun 2021
Catch-A-Waveform: Learning to Generate Audio from a Single Short Example
Gal Greshler
Tamar Rott Shaham
T. Michaeli
102
25
0
11 Jun 2021
Sprachsynthese -- State-of-the-Art in englischer und deutscher Sprache
René Peinl
48
0
0
11 Jun 2021
Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics
V. Jayaram
John Thickstun
DiffM
107
25
0
17 May 2021
ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation
Shoule Wu
Ziqiang Shi
DiffM
157
11
0
17 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
94
25
0
20 Apr 2021
Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN
Reo Yoneyama
Yi-Chiao Wu
Tomoki Toda
73
12
0
10 Apr 2021
Diff-TTS: A Denoising Diffusion Model for Text-to-Speech
Myeonghun Jeong
Hyeongju Kim
Sung Jun Cheon
Byoung Jin Choi
N. Kim
DiffM
70
197
0
03 Apr 2021
Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN
Cong Wang
Yu Chen
Bin Wang
Yi Shi
146
1
0
26 Mar 2021
PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components
Yukiya Hono
Shinji Takaki
Kei Hashimoto
Keiichiro Oura
Yoshihiko Nankaku
K. Tokuda
69
16
0
15 Feb 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Jane Polak Scowcroft
95
22
0
12 Feb 2021
Universal Neural Vocoding with Parallel WaveNet
Yunlong Jiao
Adam Gabryś
Georgi Tinchev
Bartosz Putrycz
Daniel Korzekwa
V. Klimkov
81
42
0
01 Feb 2021
Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss
Eunwoo Song
Ryuichi Yamamoto
Min-Jae Hwang
Jin-Seob Kim
Ohsung Kwon
Jae-Min Kim
65
14
0
19 Jan 2021