
FastPitch: Parallel Text-to-speech with Pitch Prediction

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

11 June 2020

Papers citing "FastPitch: Parallel Text-to-speech with Pitch Prediction"

50 / 183 papers shown

Title
HuLA: Prosody-Aware Anti-Spoofing with Multi-Task Learning for Expressive and Emotional Synthetic Speech Aurosweta Mahapatra Ismail Rasim Ulgen Berrak Sisman 0 0 0 25 Sep 2025
SEA-Spoof: Bridging The Gap in Multilingual Audio Deepfake Detection for South-East Asian Jinyang Wu Nana Hou Zihan Pan Qiquan Zhang Sailor Hardik Bhupendra Soumik Mondal 8 0 0 24 Sep 2025
Accelerating Diffusion Transformer-Based Text-to-Speech with Transformer Layer Caching Siratish Sakpiboonchit 12 0 0 10 Sep 2025
Facilitating Personalized TTS for Dysarthric Speakers Using Knowledge Anchoring and Curriculum Learning Yejin Jeon Solee Im Youngjae Kim G. G. Lee 16 1 0 14 Aug 2025
MultiGen: Child-Friendly Multilingual Speech Generator with LLMs Xiaoxue Gao Huayun Zhang Nancy F. Chen 32 0 0 12 Aug 2025
Adaptive Duration Model for Text Speech Alignment Junjie Cao 40 0 0 30 Jul 2025
Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion Yu Zhang Baotong Tian Z. Duan 189 0 0 19 Jul 2025
Enkidu: Universal Frequential Perturbation for Real-Time Audio Privacy Protection against Voice Deepfakes Zhou Feng Jiahao Chen Chunyi Zhou Yuwen Pu Qingming Li Tianyu Du S. Ji AAML 44 1 0 17 Jul 2025
EmojiVoice: Towards long-term controllable expressivity in robot speech Paige Tuttosi Shivam Mehta Zachary Syvenky Bermet Burkanova G. Henter Angelica Lim 98 1 0 18 Jun 2025
GSA-TTS : Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor Seokgi Lee Jungjun Kim TTA 147 0 0 26 May 2025
STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution Anton Firc Manasi Chibber Jagabandhu Mishra Vishwanath Pratap Singh Tomi Kinnunen K. Malinka 245 2 0 26 May 2025
Voice Cloning: Comprehensive Survey Hussam Azzuni Abdulmotaleb El Saddik VLM 184 2 0 01 May 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR Sewade Ogun Vincent Colotte Emmanuel Vincent 142 1 0 11 Mar 2025
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Siyang Song Mohammed Irfan Kurpath Sahal Shaji Mullappilly Jean Lahoud Fahad A Khan Rao Muhammad Anwer Salman Khan Hisham Cholakkal AuLLM 490 3 0 06 Mar 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation Ji-Hoon Kim Hong-Sun Yang Yoon-Cheol Ju Il-Hwan Kim Byeong-Yeol Kim Joon Son Chung BDL 172 0 0 31 Dec 2024
Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR Christoph Minixhofer Ondřej Klejch Peter Bell 97 0 0 16 Oct 2024
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset Anton Firc K. Malinka P. Hanáček DiffM 111 5 0 09 Oct 2024
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS Onkar Kishor Susladkar Vishesh Tripathi Biddwan Ahmed 71 0 0 09 Oct 2024
Exploring synthetic data for cross-speaker style transfer in style representation based TTS Lucas Ueda Leonardo B. de M. M. Marques Flávio O. Simões Mário Uliani Neto Fernando Runstein Bianca Dal Bó Paula D. P. Costa 116 0 0 25 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection Lam Pham Phat Lam Dat Tran Hieu Tang Tin Nguyen Alexander Schindler Canh Vu Alexander Polonsky Canh Vu 223 7 0 23 Sep 2024
A quest through interconnected datasets: lessons from highly-cited ICASSP papers Cynthia C. S. Liem Doğa Taşcılar Andrew M. Demetriou 96 0 0 19 Sep 2024
E1 TTS: Simple and Fast Non-Autoregressive TTS Zhijun Liu Shuai Wang Pengcheng Zhu Mengxiao Bi Haizhou Li VLM DiffM 117 4 0 14 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation C. Han Seokgi Lee Gyuhyeon Nam Gyeongsu Chae DiffM 684 0 0 14 Sep 2024
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset Jiawei Du I-Ming Lin I-Hsiang Chiu Xuanjun Chen Haibin Wu Wenze Ren Yu Tsao Hung-yi Lee Jyh-Shing Roger Jang DiffM 107 12 0 13 Sep 2024
Easy, Interpretable, Effective: openSMILE for voice deepfake detection Octavian Pascu Dan Oneaţă H. Cucu Nicolas M. Muller 132 3 0 28 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge Johan Rohdin Lin Zhang Oldřich Plchot Vojtěch Staněk David Mihola ... Themos Stafylakis Dmitriy Beveraki Anna Silnova Jan Brukner Lukáš Burget 111 7 0 20 Aug 2024
EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech Xin Qi Ruibo Fu Zhengqi Wen Jianhua Tao Shuchen Shi ... Yuankun Xie Yukun Liu Guanjun Li Zhengqi Wen Yongwei Li 85 2 0 20 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale Xin Wang Héctor Delgado Hemlata Tak Jee-weon Jung Hye-jin Shim ... Md. Sahidullah Tomi Kinnunen Nicholas W. D. Evans K. Lee Junichi Yamagishi AAML 135 93 0 16 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation Sang-Hoon Lee Ha-Yeong Choi Seong-Whan Lee OOD DiffM AI4TS 161 7 0 14 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion Elyas Rashno Amir Eskandari Aman Anand F. Zulkernine MedIm 127 4 0 08 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training Hawraz A. Ahmad Tarik A. Rashid 153 0 0 06 Aug 2024
Automatic Voice Identification after Speech Resynthesis using PPG Thibault Gaudier Marie Tahon Anthony Larcher Yannick Esteve 102 0 0 05 Aug 2024
Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies Srija Anand Praveena Varadhan Ashwin Sankar Giri Raju Mitesh M. Khapra 80 2 0 18 Jul 2024
TTSDS -- Text-to-Speech Distribution Score Christoph Minixhofer Ondřej Klejch Peter Bell 126 4 0 17 Jul 2024
Fine-Grained and Interpretable Neural Speech Editing Max Morrison Cameron Churchwell Nathan Pruyne Bryan Pardo 125 7 0 07 Jul 2024
NAIST Simultaneous Speech Translation System for IWSLT 2024 Yuka Ko Ryo Fukuda Yuta Nishikawa Yasumasa Kano Tomoya Yanagita ... Haotian Tan Makoto Sakai S. Sakti Katsuhito Sudoh Satoshi Nakamura 154 1 0 30 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment Paarth Neekhara Shehzeen Samarah Hussain Subhankar Ghosh Jason Chun Lok Li Rafael Valle Rohan Badlani Boris Ginsburg 108 20 0 25 Jun 2024
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models Vahid Noroozi Zhehuai Chen Somshubra Majumdar Steve Huang Jagadeesh Balam Boris Ginsburg SyDa 173 5 0 18 Jun 2024
Articulatory Phonetics Informed Controllable Expressive Speech Synthesis Zehua Kcriss Li Meiying Melissa Chen Yi Zhong Pinxin Liu Zhiyao Duan 62 2 0 15 Jun 2024
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech Ashishkumar Gudmalwar Nirmesh Shah Sai Akarsh Pankaj Wasnik R. Shah 87 4 0 12 Jun 2024
Controlling Emotion in Text-to-Speech with Natural Language Prompts Thomas Bott Florian Lux Ngoc Thang Vu 136 10 0 10 Jun 2024
Meta Learning Text-to-Speech Synthesis in over 7000 Languages Florian Lux Sarina Meyer Lyonel Behringer Frank Zalkow P. Do Matt Coler Emanuel Habets Ngoc Thang Vu CLIP 138 10 0 10 Jun 2024
Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech Shivam Mehta Harm Lameris Rajiv Punmiya Jonas Beskow Éva Székely G. Henter 93 5 0 08 Jun 2024
Style Mixture of Experts for Expressive Text-To-Speech Synthesis Ahad Jawaid Shreeram Suresh Chandra Junchen Lu Berrak Sisman MoE 134 4 0 05 Jun 2024
A Survey of Deep Learning Audio Generation Methods Matej Bozic Marko Horvat VLM MedIm 145 5 0 31 May 2024
Non-autoregressive real-time Accent Conversion model with voice cloning Vladimir Nechaev Sergey Kosyakov 104 2 0 21 May 2024
Exploring speech style spaces with language models: Emotional TTS without emotion labels Shreeram Suresh Chandra Zongyang Du Berrak Sisman 110 3 0 18 May 2024
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis Shivam Mehta Anna Deichler Jim O'Regan Birger Moëll Jonas Beskow G. Henter Simon Alexanderson 132 6 0 30 Apr 2024
An RFP dataset for Real, Fake, and Partially fake audio detection Abdulazeez Alali George Theodorakopoulos 103 4 0 26 Apr 2024
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks Yingting Li Rishabh Bhardwaj Ambuj Mehrish Bo Cheng Soujanya Poria 99 2 0 06 Apr 2024