v1v2 (latest)

FastPitch: Parallel Text-to-speech with Pitch Prediction

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

11 June 2020

Papers citing "FastPitch: Parallel Text-to-speech with Pitch Prediction"

50 / 183 papers shown

Title
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS Junhyeok Lee Wonbin Jung Hyunjae Cho Jaeyeon Kim Jaehwan Kim 159 3 0 24 Feb 2023
Modular Deep Learning Jonas Pfeiffer Sebastian Ruder Ivan Vulić Edoardo Ponti MoMe OOD 257 92 0 22 Feb 2023
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations Shehzeen Samarah Hussain Paarth Neekhara Jocelyn Huang Jason Chun Lok Li Boris Ginsburg 100 30 0 16 Feb 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS Rohan Badlani Rafael Valle Kevin J. Shih J. F. Santos Siddharth Gururani Bryan Catanzaro 103 7 0 24 Jan 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study Massa Baali Tomoki Hayashi Hamdy Mubarak Soumi Maiti Shinji Watanabe W. El-Hajj Ahmed M. Ali 94 11 0 22 Jan 2023
Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation Abdullah Shahid S. Latif Junaid Qadir 101 26 0 10 Jan 2023
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis Shinhyeok Oh HyeongRae Noh Yoonseok Hong Insoo Oh 99 0 0 15 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level prosodic representations Nikolaos Ellinas Myrsini Christidou Alexandra Vioni June Sig Sung Aimilios Chalamandaris Pirros Tsiakoulis P. Mastorocostas 106 7 0 29 Nov 2022
Evaluating and reducing the distance between synthetic and real speech distributions Christoph Minixhofer Ondˇrej Klejch P. Bell 173 9 0 29 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users Gokul Karthik Kumar V. PraveenS. Pratyush Kumar Mitesh M. Khapra Karthik Nandakumar 127 23 0 17 Nov 2022
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing J. Webber Cassia Valentini-Botinhao Evelyn Williams G. Henter Simon King 148 10 0 13 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS Shivam Mehta Ambika Kirkland Harm Lameris Jonas Beskow Éva Székely G. Henter AI4TS 168 15 0 13 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers Cheng-Ping Hsieh Subhankar Ghosh Boris Ginsburg 122 20 0 01 Nov 2022
Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation Nobuyuki Morioka Heiga Zen Nanxin Chen Yu Zhang Yifan Ding 130 18 0 28 Oct 2022
Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data Xulong Zhang Jianzong Wang Ning Cheng Jing Xiao 75 1 0 25 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS Florian Lux Julia Koch Ngoc Thang Vu 146 25 0 21 Oct 2022
Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion Yuta Matsunaga Takaaki Saeki Shinnosuke Takamichi Hiroshi Saruwatari 124 1 0 18 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario Emily R. Bartusiak Edward J. Delp 101 14 0 14 Oct 2022
Controllable Accented Text-to-Speech Synthesis Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 117 6 0 22 Sep 2022
Detecting Synthetic Speech Manipulation in Real Audio Recordings M. Rahman M. Graciarena Diego Castán Chris Cobo-Kroenke Mitchell McLaren A. Lawson AAML 104 10 0 15 Sep 2022
Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features Jun Xue Cunhang Fan Zhao Lv J. Tao Jiangyan Yi C. Zheng Zhengqi Wen Minmin Yuan S. Shao 109 36 0 02 Aug 2022
Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation Giulia Comini Goeric Huybrechts M. Ribeiro Adam Gabry's Jaime Lorenzo-Trueba 86 6 0 29 Jul 2022
PoeticTTS -- Controllable Poetry Reading for Literary Studies Julia Koch Florian Lux Nadja Schauffler T. Bernhart Felix Dieterle Jonas Kuhn Sandra Richter Gabriel Viehhauser Ngoc Thang Vu 100 6 0 11 Jul 2022
Speaker Anonymization with Phonetic Intermediate Representations Sarina Meyer Florian Lux Pavel Denisov Julia Koch Pascal Tilli Ngoc Thang Vu 98 29 0 11 Jul 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre Guangyan Zhang Ying Qin Weinan Zhang Jialun Wu Mei Li Yu Gai Feijun Jiang Tan Lee 187 31 0 29 Jun 2022
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion Dacheng Yin Chuanxin Tang Yanqing Liu Xiaoqiang Wang Zhiyuan Zhao Yucheng Zhao Zhiwei Xiong Sheng Zhao Chong Luo 117 14 0 28 Jun 2022
Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech Florian Lux Julia Koch Ngoc Thang Vu 98 21 0 24 Jun 2022
Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis Tae-Woo Kim Minguk Kang Gyeong-Hoon Lee AAML 207 8 0 23 Jun 2022
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History Yuto Nishimura Yuki Saito Shinnosuke Takamichi Kentaro Tachibana Hiroshi Saruwatari AI4TS 93 8 0 16 Jun 2022
FlexLip: A Controllable Text-to-Lip System Dan Oneaţă Beáta Lőrincz Adriana Stan H. Cucu 87 5 0 07 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis Yinghao Aaron Li Cong Han N. Mesgarani 158 51 0 30 May 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit Hui Zhang Tian Yuan Junkun Chen Xintong Li Renjie Zheng ... Zeyu Chen Xiaoguang Hu Dianhai Yu Yanjun Ma Liang Huang AuLLM 103 30 0 20 May 2022
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech Yongqian Li Cheng Yu Guangzhi Sun Hua Jiang Fanglei Sun Weiqin Zu Ying Wen Yang Yang Jun Wang 79 7 0 09 May 2022
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond Yisheng Xiao Lijun Wu Junliang Guo Juntao Li Hao Fei Tao Qin Tie-Yan Liu 3DV MedIm AI4CE 146 101 0 20 Apr 2022
Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch Hanbin Bae Young-Sun Joo 77 2 0 12 Apr 2022
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech Jaesung Bae Jinhyeok Yang Taejun Bak Young-Sun Joo DiffM 177 6 0 08 Apr 2022
The Sillwood Technologies System for the VoiceMOS Challenge 2022 Jiameng Gao 87 0 0 08 Apr 2022
Heterogeneous Target Speech Separation Hyunjae Cho Wonbin Jung Junhyeok Lee Paris Smaragdis Sanghyun Woo 120 29 0 07 Apr 2022
Text-To-Speech Data Augmentation for Low Resource Speech Recognition Rodolfo Zevallos 86 6 0 01 Apr 2022
Data-augmented cross-lingual synthesis in a teacher-student framework M. D. Korte Jaebok Kim A. Kunikoshi Adaeze Adigwe E. Klabbers 107 0 0 31 Mar 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis Hubert Siuzdak Piotr Dura Pol van Rijn Nori Jacoby AI4TS 217 34 0 31 Mar 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech D. Lim Sunghee Jung Eesung Kim 175 59 0 31 Mar 2022
Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus Minchan Kim Myeonghun Jeong Byoung Jin Choi Sunghwan Ahn Joun Yeop Lee N. Kim 134 28 0 29 Mar 2022
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent Yuki Saito Yuto Nishimura Shinnosuke Takamichi Kentaro Tachibana Hiroshi Saruwatari 167 13 0 28 Mar 2022
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis Shunwei Lei Yixuan Zhou Liyang Chen Zhiyong Wu Shiyin Kang Helen Meng 71 12 0 23 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features Florian Lux Ngoc Thang Vu 122 31 0 07 Mar 2022
Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows Kevin J. Shih Rafael Valle Rohan Badlani J. F. Santos Bryan Catanzaro 119 4 0 03 Mar 2022
Revisiting Over-Smoothness in Text to Speech Yi Ren Xu Tan Tao Qin Zhou Zhao Tie-Yan Liu 168 66 0 26 Feb 2022
Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question Wailing Ng Raymond Chi-Wing Wong Xuefang Zhao Chen Zhang 92 14 0 04 Jan 2022
VRAIN-UPV MLLP's system for the Blizzard Challenge 2021 A. P. D. Martos Albert Sanchis Alfons Juan-Císcar 122 6 0 29 Oct 2021