v1v2v3 (latest)

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

12 May 2020

ArXiv (abs)PDF HTML Github (898★)

Papers citing "Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis"

50 / 84 papers shown

Title
Adaptive Duration Model for Text Speech Alignment Junjie Cao 36 0 0 30 Jul 2025
When Humans Growl and Birds Speak: High-Fidelity Voice Conversion from Human to Animal and Designed Sounds Minsu Kang Seolhee Lee Choonghyeon Lee Namhyun Cho VLM 64 0 0 30 May 2025
Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a $50K Budget$ Xin Li Kaikai Jia Hao Sun Jun Dai Z. L. Jiang 509 0 0 27 Apr 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation Alexander H. Liu Sang-gil Lee Chao-Han Huck Yang Yuan Gong Yu-Chun Wang James Glass Rafael Valle Bryan Catanzaro SSL 144 3 0 02 Mar 2025
Disentangling segmental and prosodic factors to non-native speech comprehensibility Waris Quamer Ricardo Gutierrez-Osuna 110 1 0 20 Aug 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning Shuai Wang Zheng-Shou Chen Kong Aik Lee Yan-min Qian Haizhou Li 145 15 0 21 Jul 2024
An Attribute Interpolation Method in Speech Synthesis by Model Merging Masato Murata Koichi Miyazaki Tomoki Koriyama MoMe 136 8 0 30 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment Paarth Neekhara Shehzeen Samarah Hussain Subhankar Ghosh Jason Chun Lok Li Rafael Valle Rohan Badlani Boris Ginsburg 108 17 0 25 Jun 2024
Cognitively Inspired Energy-Based World Models Alexi Gladstone Ganesh Nanduru Md. Mofijul Islam Aman Chadha Jundong Li Tariq Iqbal 100 0 0 13 Jun 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis Zhijun Liu Shuai Wang Sho Inoue Qibing Bai Haizhou Li DiffM 111 24 0 08 Jun 2024
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks Yingting Li Rishabh Bhardwaj Ambuj Mehrish Bo Cheng Soujanya Poria 91 2 0 06 Apr 2024
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models Xiang Li Fan Bu Ambuj Mehrish Yingting Li Jiale Han Bo Cheng Soujanya Poria DiffM 93 7 0 31 Mar 2024
Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model Xiangyu Zhang Daijiao Liu Hexin Liu Qiquan Zhang Hanyu Meng Leibny Paola García Chng Eng Siong Lina Yao DiffM 90 7 0 16 Feb 2024
Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech Abhinav Garg Jiyeon Kim Sushil Khyalia Chanwoo Kim Dhananjaya N. Gowda 93 3 0 19 Jan 2024
AE-Flow: AutoEncoder Normalizing Flow Jakub Mosiński Piotr Bilinski Thomas Merritt Abdelhamid Ezzerg Daniel Korzekwa 89 4 0 27 Dec 2023
Creating New Voices using Normalizing Flows Piotr Bilinski Thomas Merritt Abdelhamid Ezzerg Kamil Pokora Sebastian Cygert K. Yanagisawa Roberto Barra-Chicote Daniel Korzekwa 82 16 0 22 Dec 2023
Generative Pre-training for Speech with Flow Matching Alexander H. Liu Matt Le Apoorv Vyas Bowen Shi Andros Tjandra Wei-Ning Hsu 165 47 0 25 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes Seongho Joo Hyukhun Koh Kyomin Jung DiffM 123 0 0 23 Oct 2023
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching Yiwei Guo Chenpeng Du Ziyang Ma Xie Chen K. Yu DiffM 160 51 0 10 Sep 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation Minsu Kim J. Choi Dahun Kim Y. Ro 110 15 0 03 Aug 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design Jungil Kong Jihoon Park Beomjeong Kim Jeongmin Kim Dohee Kong Sangjin Kim 89 48 0 31 Jul 2023
HIDFlowNet: A Flow-Based Deep Network for Hyperspectral Image Denoising Li Pang Weizhen Gu Xiangyong Cao Xiangyu Rui Jiangjun Peng Shuang Xu Gang Yang Deyu Meng 151 0 0 20 Jun 2023
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding Chenpeng Du Yiwei Guo Feiyu Shen Zhijun Liu Zheng Liang Xie Chen Shuai Wang Hui Zhang K. Yu DiffM 203 49 0 13 Jun 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading Yochai Yemini Aviv Shamsian Lior Bracha Sharon Gannot Ethan Fetaya DiffM 158 19 0 05 Jun 2023
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus Yuma Koizumi Heiga Zen Shigeki Karita Yifan Ding Kohei Yatabe Nobuyuki Morioka M. Bacchiani Yu Zhang Wei Han Ankur Bapna 137 99 0 30 May 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS Sewade Ogun Vincent Colotte Emmanuel Vincent DiffM 66 5 0 28 May 2023
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech Xin Jing Yi Chang Zijiang Yang Jiang-jian Xie Andreas Triantafyllopoulos Bjoern W. Schuller 130 11 0 22 May 2023
IMAD: IMage-Augmented multi-modal Dialogue Viktor Moskvoretskii Anton Frolov Denis Kuznetsov 101 5 0 17 May 2023
IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers Rong Wu Wanchao Su Kede Ma Jing Liao 190 46 0 27 Apr 2023
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts Chengzhe Sun Shan Jia Shuwei Hou Siwei Lyu 84 57 0 25 Apr 2023
Affective social anthropomorphic intelligent system Md. Adyelullahil Mamun Hasnat Md. Abdullah Md. Golam Rabiul Alam Muhammad Mehedi Hassan Md. Zia Uddin 57 1 0 19 Apr 2023
VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation Rohan Badlani Akshit Arora Subhankar Ghosh Rafael Valle Kevin J. Shih J. F. Santos Boris Ginsburg Bryan Catanzaro 46 4 0 14 Mar 2023
Do Prosody Transfer Models Transfer Prosody? A. Sigurgeirsson Simon King DiffM 85 10 0 07 Mar 2023
An investigation into the adaptability of a diffusion-based TTS model Haolin Chen Philip N. Garner DiffM 75 1 0 03 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations N. Shah Saiteja Kosgi Vishal Tambrahalli Neha Sahipjohn Anil Nelakanti Vineet Gandhi 118 8 0 01 Mar 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow Yoonhyung Lee Jinhyeok Yang Kyomin Jung 79 7 0 27 Feb 2023
Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition Leyuan Qu C. Weber S. Wermter 85 10 0 20 Feb 2023
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts Chengzhe Sun Shan Jia Shuwei Hou Ehab AlBadawy Siwei Lyu 181 3 0 18 Feb 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS Rohan Badlani Rafael Valle Kevin J. Shih J. F. Santos Siddharth Gururani Bryan Catanzaro 99 7 0 24 Jan 2023
OverFlow: Putting flows on top of neural transducers for better TTS Shivam Mehta Ambika Kirkland Harm Lameris Jonas Beskow Éva Székely G. Henter AI4TS 156 15 0 13 Nov 2022
Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation Runbang Zhang Yixiao Zhang Kai Shao Ying Shan Gus Xia 79 5 0 10 Nov 2022
Cover Reproducible Steganography via Deep Generative Models Kejiang Chen Hang Zhou Yaofei Wang Meng Li Weiming Zhang Neng H. Yu DiffM 93 15 0 26 Oct 2022
Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models Aya Watanabe Shinnosuke Takamichi Yuki Saito Detai Xin Hiroshi Saruwatari 85 4 0 18 Oct 2022
Detecting Synthetic Speech Manipulation in Real Audio Recordings M. Rahman M. Graciarena Diego Castán Chris Cobo-Kroenke Mitchell McLaren A. Lawson AAML 96 10 0 15 Sep 2022
The Role of Vocal Persona in Natural and Synthesized Speech Camille Noufi Lloyd May J. Berger 91 2 0 06 Sep 2022
StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation Dong Min Min-Hwan Song Eunji Ko Sung Ju Hwang VGen 129 14 0 23 Aug 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech Zhengxi Liu Qiao Tian Chenxu Hu Xudong Liu Meng-Che Wu Yuping Wang Hang Zhao Yuxuan Wang 102 12 0 13 Jul 2022
Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS Yookyung Shin Younggun Lee Suhee Jo Yeongtae Hwang Taesu Kim 117 15 0 13 Jul 2022
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus Josh Meyer David Ifeoluwa Adelani Edresson Casanova A. Oktem Daniel Whitenack Julian Weber ... Victor Akinode Bernard Opoku S. Olanrewaju Jesujoba Oluwadara Alabi Shamsuddeen Hassan Muhammad 59 24 0 07 Jul 2022
Expressive, Variable, and Controllable Duration Modelling in TTS Ammar Abbas Thomas Merritt Alexis Moinet S. Karlapati Ewa Muszyñska Simon Slangen Elia Gatti Thomas Drugman 73 12 0 28 Jun 2022