Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

22 May 2020

Papers citing "Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search"

50 / 286 papers shown

Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus Detai Xin Shinnosuke Takamichi Ai Morimatsu Hiroshi Saruwatari 8 10 0 21 May 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training Zhe Ye Rongjie Huang Yi Ren Ziyue Jiang Jinglin Liu Jinzheng He Xiang Yin Zhou Zhao CLIP 26 20 0 18 May 2023
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts Chengzhe Sun Shan Jia Shuwei Hou Siwei Lyu 30 38 0 25 Apr 2023
An End-to-End Neural Network for Image-to-Audio Transformation Liu Chen Michael Deisher Munir Georges 16 3 0 10 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model Rui Xue Yanqing Liu Lei He Xuejiao Tan Linquan Liu Ed Lin Sheng Zhao 26 7 0 06 Mar 2023
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS Siyang Wang G. Henter Joakim Gustafson Éva Székely 30 4 0 05 Mar 2023
An investigation into the adaptability of a diffusion-based TTS model Haolin Chen Philip N. Garner DiffM 31 1 0 03 Mar 2023
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus Ajinkya Kulkarni Atharva Kulkarni Sara Shatnawi Hanan Aldarmaki 17 8 0 28 Feb 2023
UniFLG: Unified Facial Landmark Generator from Text or Speech Kentaro Mitsui Yukiya Hono Kei Sawada CVBM 11 6 0 28 Feb 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech Jiyoung Lee Joon Son Chung Soo-Whan Chung DiffM 26 27 0 27 Feb 2023
Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech D. Yang Tomoki Koriyama Yuki Saito Takaaki Saeki Detai Xin Hiroshi Saruwatari 10 7 0 27 Feb 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow Yoonhyung Lee Jinhyeok Yang Kyomin Jung 12 6 0 27 Feb 2023
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts Chengzhe Sun Shan Jia Shuwei Hou Ehab AlBadawy Siwei Lyu 120 3 0 18 Feb 2023
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech Li-Wei Chen Shinji Watanabe Alexander I. Rudnicky 6 35 0 08 Feb 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt Dongchao Yang Songxiang Liu Rongjie Huang Chao Weng H. Meng DiffM VLM 31 84 0 31 Jan 2023
DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising Diffusion Models Haomin Wen Youfang Lin Yutong Xia Huaiyu Wan Qingsong Wen Roger Zimmermann Yuxuan Liang DiffM 23 81 0 31 Jan 2023
UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion Hao Liu Tao Wang Ruibo Fu Jiangyan Yi Zhengqi Wen J. Tao 18 3 0 10 Jan 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech Ze Chen Yihan Wu Yichong Leng Jiawei Chen Haohe Liu ... Ke Wang Lei He Sheng Zhao Jiang Bian Danilo P. Mandic DiffM 22 22 0 30 Dec 2022
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models Yinghao Aaron Li Cong Han N. Mesgarani 17 18 0 29 Dec 2022
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder Yusuke Yasuda T. Toda DiffM 15 7 0 16 Dec 2022
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis Chunyu Qiang Peng Yang Hao Che Xiaorui Wang Zhongyuan Wang BDL 21 6 0 13 Dec 2022
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech Byoung Jin Choi Myeonghun Jeong Joun Yeop Lee N. Kim 12 12 0 30 Nov 2022
Prosody-controllable spontaneous TTS with neural HMMs Harm Lameris Shivam Mehta G. Henter Joakim Gustafson Éva Székely 27 15 0 24 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users Gokul Karthik Kumar V. Praveen S. Pratyush Kumar Mitesh M. Khapra Karthik Nandakumar 34 18 0 17 Nov 2022
Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation Chunyu Qiang Peng Yang Hao Che Jinba Xiao Xiaorui Wang Zhongyuan Wang 13 3 0 17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis Hyeong-Seok Choi Jinhyeok Yang Juheon Lee Hyeongju Kim 16 46 0 17 Nov 2022
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models Minki Kang Dong Min Sung Ju Hwang DiffM 20 48 0 17 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS Shivam Mehta Ambika Kirkland Harm Lameris Jonas Beskow Éva Székely G. Henter AI4TS 26 12 0 13 Nov 2022
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space Jihwan Lee Jaesung Bae Seongkyu Mun Heejin Choi Joun Yeop Lee Hoon-Young Cho Chanwoo Kim 24 2 0 06 Nov 2022
Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis Konstantinos Klapsas Karolos Nikitaras Nikolaos Ellinas June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 11 0 0 02 Nov 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Georgia Maniati Panos Kakoulidis June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 16 2 0 31 Oct 2022
The Importance of Accurate Alignments in End-to-End Speech Synthesis Anusha Prakash H. Murthy 13 0 0 31 Oct 2022
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform Masaya Kawamura Yuma Shirahata Ryuichi Yamamoto Kentaro Tachibana 24 15 0 28 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS Florian Lux Julia Koch Ngoc Thang Vu 30 22 0 21 Oct 2022
Invertible Monotone Operators for Normalizing Flows Byeongkeun Ahn Chiyoon Kim Youngjoon Hong Hyunwoo J. Kim TPM 30 8 0 15 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario Emily R. Bartusiak Edward J. Delp 19 12 0 14 Oct 2022
Can we use Common Voice to train a Multi-Speaker TTS system? Sewade Ogun Vincent Colotte Emmanuel Vincent 19 10 0 12 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech Byoung Jin Choi Myeonghun Jeong Minchan Kim Sung Hwan Mun N. Kim DiffM 17 5 0 12 Oct 2022
A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition Kyuhong Shim Wonyong Sung 25 2 0 01 Oct 2022
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS Haohan Guo Fenglong Xie Frank Soong Xixin Wu Helen M. Meng 37 11 0 22 Sep 2022
Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale Aditya Agarwal Bipasha Sen Rudrabha Mukhopadhyay Vinay P. Namboodiri C. V. Jawahar 22 0 0 21 Aug 2022
G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model Pan Xie Qipeng Zhang Zexian Li Hao Tang Yao Du Xiaohui Hu DiffM 36 12 0 19 Aug 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech Rongjie Huang Zhou Zhao Huadai Liu Jinglin Liu Chenye Cui Yi Ren DiffM 44 193 0 13 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech Zhengxi Liu Qiao Tian Chenxu Hu Xudong Liu Meng-Che Wu Yuping Wang Hang Zhao Yuxuan Wang 28 10 0 13 Jul 2022
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate Nabarun Goswami Tatsuya Harada 10 5 0 13 Jul 2022
Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS Yookyung Shin Younggun Lee Suhee Jo Yeongtae Hwang Taesu Kim 9 14 0 13 Jul 2022
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus Josh Meyer David Ifeoluwa Adelani Edresson Casanova A. Oktem Daniel Whitenack Julian Weber ... Victor Akinode Bernard Opoku S. Olanrewaju Jesujoba Oluwadara Alabi Shamsuddeen Hassan Muhammad 11 21 0 07 Jul 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion Yinjiao Lei Shan Yang Jian Cong Linfu Xie Dan Su DiffM 42 12 0 05 Jul 2022
GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion Magdalena Proszewska Grzegorz Beringer Daniel Sáez-Trigueros Thomas Merritt Abdelhamid Ezzerg Roberto Barra-Chicote 22 6 0 04 Jul 2022
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers Liumeng Xue Shan Yang Na Hu Dan Su Linfu Xie 16 2 0 02 Jul 2022