Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis

6 November 2020

Papers citing "Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis"

49 / 49 papers shown

Title
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications Biel Tura Vecino Adam Gabry's Daniel Mątwicki Andrzej Pomirski Tom Iddon Marius Cotescu Jaime Lorenzo-Trueba 34 0 0 12 May 2025
Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a $50K Budget$ Xin Li Kaikai Jia Hao Sun Jun Dai Z. L. Jiang 126 0 0 27 Apr 2025
Memory-Centric Computing: Recent Advances in Processing-in-DRAM O. Mutlu Ataberk Olgun Geraldo F. Oliveira Ismail Emir Yüksel 44 3 0 26 Dec 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis Zhijun Liu Shuai Wang Sho Inoue Qibing Bai Haizhou Li DiffM 47 15 0 08 Jun 2024
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning Raviraj Joshi Nikesh Garera 25 0 0 02 Dec 2023
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints Raviraj Joshi Nikesh Garera 25 0 0 02 Dec 2023
E3 TTS: Easy End-to-End Diffusion-based Text to Speech Yuan Gao Nobuyuki Morioka Yu Zhang Nanxin Chen DiffM 26 27 0 02 Nov 2023
Prosody Analysis of Audiobooks Charuta Pethe Yunting Yin Felix D Childress Yunting Yin Steven Skiena 6 0 0 10 Oct 2023
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation Roi Benita Michael Elad Joseph Keshet DiffM 25 7 0 02 Oct 2023
FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework Jianzong Wang Xulong Zhang Aolan Sun Ning Cheng Jing Xiao 29 1 0 16 Sep 2023
A Systematic Exploration of Joint-training for Singing Voice Synthesis Yuning Wu Yifeng Yu Jiatong Shi Tao Qian Qin Jin 38 5 0 05 Aug 2023
The Ethical Implications of Generative Audio Models: A Systematic Literature Review J. Barnett 16 25 0 07 Jul 2023
Improving Autoregressive NLP Tasks via Modular Linearized Attention Victor Agostinelli Lizhong Chen 22 1 0 17 Apr 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI Chenshuang Zhang Chaoning Zhang Sheng Zheng Mengchun Zhang Maryam Qamar Sung-Ho Bae In So Kweon DiffM MedIm 41 64 0 23 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model Rui Xue Yanqing Liu Lei He Xuejiao Tan Linquan Liu Ed Lin Sheng Zhao 26 7 0 06 Mar 2023
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus Ajinkya Kulkarni Atharva Kulkarni Sara Shatnawi Hanan Aldarmaki 17 8 0 28 Feb 2023
Learning the joint distribution of two sequences using little or no paired data Soroosh Mariooryad Matt Shannon Siyuan Ma Tom Bagby David Kao Daisy Stanton Eric Battenberg RJ Skerry-Ryan 10 2 0 06 Dec 2022
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis Yinjiao Lei Shan Yang Xinsheng Wang Qicong Xie Jixun Yao Linfu Xie Dan Su DiffM 13 8 0 03 Dec 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities Amin Azmoodeh Ali Dehghantanha 29 2 0 26 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis Hyeong-Seok Choi Jinhyeok Yang Juheon Lee Hyeongju Kim 16 46 0 17 Nov 2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection Kentaro Seki Shinnosuke Takamichi Takaaki Saeki Hiroshi Saruwatari 19 6 0 26 Oct 2022
Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion Yuta Matsunaga Takaaki Saeki Shinnosuke Takamichi Hiroshi Saruwatari 9 1 0 18 Oct 2022
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis Yuta Matsunaga Takaaki Saeki Shinnosuke Takamichi Hiroshi Saruwatari 12 2 0 14 Oct 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders Yanqing Liu Rui Xue Lei He Xu Tan Sheng Zhao 16 24 0 11 Jul 2022
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis Yongqiang Wang Zhou Zhao 19 10 0 08 Jul 2022
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History Yuto Nishimura Yuki Saito Shinnosuke Takamichi Kentaro Tachibana Hiroshi Saruwatari AI4TS 17 7 0 16 Jun 2022
Multi-instrument Music Synthesis with Spectrogram Diffusion Curtis Hawthorne Ian Simon Adam Roberts Neil Zeghidour Josh Gardner Ethan Manilow Jesse Engel DiffM 21 48 0 11 Jun 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality Xu Tan Jiawei Chen Haohe Liu Jian Cong Chen Zhang ... Lei He Frank Soong Tao Qin Sheng Zhao Tie-Yan Liu 35 211 0 09 May 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis Hubert Siuzdak Piotr Dura Pol van Rijn Nori Jacoby AI4TS 10 30 0 31 Mar 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech D. Lim Sunghee Jung Eesung Kim 14 51 0 31 Mar 2022
DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning Takaaki Saeki Kentaro Tachibana Ryuichi Yamamoto 11 10 0 29 Mar 2022
Variational Auto-Encoder based Mandarin Speech Cloning Qingyu Xing Xiaohan Ma 13 0 0 06 Mar 2022
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention Artem Gorodetskii Ivan Ozhiganov 15 2 0 25 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis Yu Wang Xinsheng Wang Pengcheng Zhu Jie Wu Hanzhao Li Heyang Xue Yongmao Zhang Lei Xie Mengxiao Bi 25 95 0 19 Jan 2022
Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance Heeseung Kim Sungwon Kim Sungroh Yoon DiffM BDL 19 107 0 23 Nov 2021
Assessing Evaluation Metrics for Speech-to-Speech Translation Elizabeth Salesky Julian Mäder Severin Klinger 22 14 0 26 Oct 2021
Chunked Autoregressive GAN for Conditional Waveform Synthesis Max Morrison Rithesh Kumar Kundan Kumar Prem Seetharaman Aaron Courville Yoshua Bengio GAN 36 68 0 19 Oct 2021
From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation Danni Liu Changhan Wang Hongyu Gong Xutai Ma Yun Tang J. Pino 17 4 0 15 Oct 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network Takaaki Saeki Shinnosuke Takamichi Hiroshi Saruwatari 26 3 0 22 Sep 2021
On-device neural speech synthesis Sivanand Achanta Albert Antony L. Golipour Jiangchuan Li T. Raitio ... Francesco Rossi Jennifer Shi Jaimin Upadhyay David Winarsky Hepeng Zhang 27 17 0 17 Sep 2021
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit Changhan Wang Wei-Ning Hsu Yossi Adi Adam Polyak Ann Lee Peng-Jen Chen Jiatao Gu J. Pino VLM 67 32 0 14 Sep 2021
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person Xinsheng Wang Qicong Xie Jihua Zhu Lei Xie O. Scharenborg 25 16 0 09 Aug 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 352 0 29 Jun 2021
Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis Jian Cong Shan Yang Lei Xie Dan Su DRL 16 29 0 21 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis Nanxin Chen Yu Zhang Heiga Zen Ron J. Weiss Mohammad Norouzi Najim Dehak William Chan DiffM 16 88 0 17 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Jaehyeon Kim Jungil Kong Juhee Son DRL 27 839 0 11 Jun 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS Ye Jia Heiga Zen Jonathan Shen Yu Zhang Yonghui Wu SSL 19 81 0 28 Mar 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention Peng Liu Yuewen Cao Songxiang Liu Na Hu Guangzhi Li Chao Weng Dan Su 28 22 0 12 Feb 2021
High Fidelity Speech Synthesis with Adversarial Networks Mikolaj Binkowski Jeff Donahue Sander Dieleman Aidan Clark Erich Elsen Norman Casagrande Luis C. Cobo Karen Simonyan 223 239 0 25 Sep 2019