v1v2v3 (latest)

ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech

19 July 2018

Ming-Yu Liu

Kainan Peng

Jitong Chen

ArXiv (abs)PDF HTML

Papers citing "ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech"

50 / 134 papers shown

Title
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing Masaya Kawamura Takuya Hasumi Yuma Shirahata Ryuichi Yamamoto MQ 44 0 0 04 Jun 2025
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training Zhihao Du Changfeng Gao Yuxuan Wang Fan Yu Tianyu Zhao ... Mengzhe Chen Yafeng Chen Shiliang Zhang Wen Wang Jieping Ye AuLLM 159 1 0 23 May 2025
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis Zeeshan Ahmad Shudi Bao Meng Chen 56 0 0 14 May 2025
Memory-Centric Computing: Recent Advances in Processing-in-DRAM O. Mutlu Ataberk Olgun Geraldo F. Oliveira Ismail Emir Yüksel 121 6 0 26 Dec 2024
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning Raviraj Joshi Nikesh Garera 73 2 0 02 Dec 2023
An overview of text-to-speech systems and media applications Mohammad Reza Hasanabadi 28 3 0 22 Oct 2023
CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram Zhifeng Kong Ming-Yu Liu Ambrish Dantrey Bryan Catanzaro 51 7 0 12 Sep 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey Lin Geng Foo Hossein Rahmani Jing Liu 278 31 0 27 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook S. Latif Moazzam Shoukat Fahad Shamshad Muhammad Usama Yi Ren ... Wenwu Wang Xulong Zhang Roberto Togneri Min Zhang Björn W. Schuller LM&MA AuLLM 188 39 0 24 Aug 2023
DreamHuman: Animatable 3D Avatars from Text Nikos Kolotouros Thiemo Alldieck Andrei Zanfir Eduard Gabriel Bazavan Mihai Fieraru C. Sminchisescu 111 101 0 15 Jun 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading Yochai Yemini Aviv Shamsian Lior Bracha Sharon Gannot Ethan Fetaya DiffM 116 15 0 05 Jun 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings Wei Xue Yiwen Wang Qi-fei Liu Yi-Ting Guo 73 1 0 09 May 2023
Transformers in Speech Processing: A Survey S. Latif Aun Zaidi Heriberto Cuayáhuitl Fahad Shamshad Moazzam Shoukat Muhammad Usama Junaid Qadir 169 48 0 21 Mar 2023
Learning to Dub Movies via Hierarchical Prosody Models Gaoxiang Cong Liang Li Yuankai Qi Zhengjun Zha Qi Wu Wen-yu Wang Bin Jiang Ming-Hsuan Yang Qin Huang 141 27 0 08 Dec 2022
DreamFusion: Text-to-3D using 2D Diffusion Ben Poole Ajay Jain Jonathan T. Barron B. Mildenhall 286 2,445 0 29 Sep 2022
Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0 M. S. Al-Radhi Tamás Gábor Csapó Csaba Zainkó Géza Németh 50 1 0 15 Aug 2022
Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation Giulia Comini Goeric Huybrechts M. Ribeiro Adam Gabry's Jaime Lorenzo-Trueba 67 5 0 29 Jul 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders Yanqing Liu Rui Xue Lei He Xu Tan Sheng Zhao 87 25 0 11 Jul 2022
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History Yuto Nishimura Yuki Saito Shinnosuke Takamichi Kentaro Tachibana Hiroshi Saruwatari AI4TS 59 8 0 16 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training Sang-gil Lee Ming-Yu Liu Boris Ginsburg Bryan Catanzaro Sung-Hoon Yoon 159 255 0 09 Jun 2022
Parallel Synthesis for Autoregressive Speech Generation Po-Chun Hsu Da-Rong Liu Andy T. Liu Hung-yi Lee 80 5 0 25 Apr 2022
Streamable Neural Audio Synthesis With Non-Causal Convolutions Antoine Caillon P. Esling 85 12 0 14 Apr 2022
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation Rendi Chevi Radityo Eko Prasojo Alham Fikri Aji Andros Tjandra S. Sakti VLM 50 4 0 29 Mar 2022
Improve few-shot voice cloning using multi-modal learning Haitong Zhang Yue Lin 46 8 0 18 Mar 2022
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform Takuhiro Kaneko Kou Tanaka Hirokazu Kameoka Shogo Seki 89 62 0 04 Mar 2022
It's Raw! Audio Generation with State-Space Models Karan Goel Albert Gu Chris Donahue Christopher Ré 98 195 0 20 Feb 2022
Speech Denoising in the Waveform Domain with Self-Attention Zhifeng Kong Ming-Yu Liu Ambrish Dantrey Bryan Catanzaro 89 63 0 15 Feb 2022
Emotional Prosody Control for Speech Generation S. Sivaprasad Saiteja Kosgi Vineet Gandhi 63 17 0 07 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection Joel Frank Lea Schonherr DiffM 204 131 0 04 Nov 2021
CaloFlow II: Even Faster and Still Accurate Generation of Calorimeter Showers with Normalizing Flows Claudius Krause David Shih 91 64 0 21 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts Chenxu Hu Qiao Tian Tingle Li Yuping Wang Yuxuan Wang Hang Zhao DiffM VGen 99 43 0 15 Oct 2021
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data Haitong Zhang Yue Lin 56 0 0 14 Oct 2021
On-device neural speech synthesis Sivanand Achanta Albert Antony L. Golipour Jiangchuan Li T. Raitio ... Francesco Rossi Jennifer Shi Jaimin Upadhyay David Winarsky Hepeng Zhang 108 17 0 17 Sep 2021
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person Xinsheng Wang Qicong Xie Jihua Zhu Lei Xie O. Scharenborg 120 19 0 09 Aug 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 133 359 0 29 Jun 2021
Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition Zhengxi Liu Y. Qian DRL 49 10 0 25 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis Nanxin Chen Yu Zhang Heiga Zen Ron J. Weiss Mohammad Norouzi Najim Dehak William Chan DiffM 99 88 0 17 Jun 2021
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation Won Jang D. Lim Jaesam Yoon Bongwan Kim Juntae Kim 116 132 0 15 Jun 2021
Catch-A-Waveform: Learning to Generate Audio from a Single Short Example Gal Greshler Tamar Rott Shaham T. Michaeli 102 25 0 11 Jun 2021
Sprachsynthese -- State-of-the-Art in englischer und deutscher Sprache René Peinl 48 0 0 11 Jun 2021
Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics V. Jayaram John Thickstun DiffM 107 25 0 17 May 2021
ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation Shoule Wu Ziqiang Shi DiffM 157 11 0 17 May 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 94 25 0 20 Apr 2021
Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN Reo Yoneyama Yi-Chiao Wu Tomoki Toda 73 12 0 10 Apr 2021
Diff-TTS: A Denoising Diffusion Model for Text-to-Speech Myeonghun Jeong Hyeongju Kim Sung Jun Cheon Byoung Jin Choi N. Kim DiffM 70 197 0 03 Apr 2021
Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN Cong Wang Yu Chen Bin Wang Yi Shi 146 1 0 26 Mar 2021
PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components Yukiya Hono Shinji Takaki Kei Hashimoto Keiichiro Oura Yoshihiko Nankaku K. Tokuda 69 16 0 15 Feb 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention Peng Liu Yuewen Cao Songxiang Liu Na Hu Guangzhi Li Chao Weng Jane Polak Scowcroft 95 22 0 12 Feb 2021
Universal Neural Vocoding with Parallel WaveNet Yunlong Jiao Adam Gabry's Georgi Tinchev Bartosz Putrycz Daniel Korzekwa V. Klimkov 81 42 0 01 Feb 2021
Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss Eunwoo Song Ryuichi Yamamoto Min-Jae Hwang Jin-Seob Kim Ohsung Kwon Jae-Min Kim 65 14 0 19 Jan 2021