ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2006.06873
  4. Cited By
FastPitch: Parallel Text-to-speech with Pitch Prediction

FastPitch: Parallel Text-to-speech with Pitch Prediction

11 June 2020
Adrian Lañcucki
ArXivPDFHTML

Papers citing "FastPitch: Parallel Text-to-speech with Pitch Prediction"

50 / 173 papers shown
Title
Voice Cloning: Comprehensive Survey
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
39
0
0
01 May 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
59
0
0
11 Mar 2025
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
S.
Mohammed Irfan Kurpath
Sahal Shaji Mullappilly
Jean Lahoud
Fahad A Khan
Rao Muhammad Anwer
Salman Khan
Hisham Cholakkal
AuLLM
135
0
0
06 Mar 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
47
0
0
31 Dec 2024
Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech
  Synthesis in ASR
Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR
Christoph Minixhofer
Ondˇrej Klejch
Peter Bell
24
0
0
16 Oct 2024
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset
Anton Firc
K. Malinka
P. Hanáček
DiffM
31
0
0
09 Oct 2024
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech
  Synthesis with Discrete Codec Modeling of EnGen-TTS
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS
Onkar Kishor Susladkar
Vishesh Tripathi
Biddwan Ahmed
21
0
0
09 Oct 2024
Exploring synthetic data for cross-speaker style transfer in style
  representation based TTS
Exploring synthetic data for cross-speaker style transfer in style representation based TTS
Lucas Ueda
Leonardo B. de M. M. Marques
Flávio O. Simões
Mário Uliani Neto
Fernando Runstein
Bianca Dal Bó
Paula D. P. Costa
21
0
0
25 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
Canh Vu
46
3
0
23 Sep 2024
A quest through interconnected datasets: lessons from highly-cited
  ICASSP papers
A quest through interconnected datasets: lessons from highly-cited ICASSP papers
Cynthia C. S. Liem
Doğa Taşcılar
Andrew M. Demetriou
17
0
0
19 Sep 2024
E1 TTS: Simple and Fast Non-Autoregressive TTS
E1 TTS: Simple and Fast Non-Autoregressive TTS
Zhijun Liu
Shuai Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
VLM
DiffM
38
3
0
14 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
109
0
0
14 Sep 2024
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Jiawei Du
I-Ming Lin
I-Hsiang Chiu
Xuanjun Chen
Haibin Wu
Wenze Ren
Yu Tsao
Hung-yi Lee
Jyh-Shing Roger Jang
DiffM
35
2
0
13 Sep 2024
Easy, Interpretable, Effective: openSMILE for voice deepfake detection
Easy, Interpretable, Effective: openSMILE for voice deepfake detection
Octavian Pascu
Dan Oneaţă
H. Cucu
Nicolas M. Muller
40
1
0
28 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge
BUT Systems and Analyses for the ASVspoof 5 Challenge
Johan Rohdin
Lin Zhang
Oldřich Plchot
Vojtěch Staněk
David Mihola
...
Themos Stafylakis
Dmitriy Beveraki
Anna Silnova
Jan Brukner
Lukáš Burget
36
1
0
20 Aug 2024
EELE: Exploring Efficient and Extensible LoRA Integration in Emotional
  Text-to-Speech
EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
Xin Qi
Ruibo Fu
Zhengqi Wen
Jianhua Tao
Shuchen Shi
...
Yuankun Xie
Yukun Liu
Guanjun Li
Xuefei Liu
Yongwei Li
27
1
0
20 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks
  at Scale
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
45
38
0
16 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform
  Generation
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OOD
DiffM
AI4TS
43
5
0
14 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
33
0
0
08 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End
  Transformer Training
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
33
0
0
06 Aug 2024
Automatic Voice Identification after Speech Resynthesis using PPG
Automatic Voice Identification after Speech Resynthesis using PPG
Thibault Gaudier
Marie Tahon
Anthony Larcher
Yannick Esteve
40
0
0
05 Aug 2024
Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for
  Practical Applications through Low-Effort Data Strategies
Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies
Srija Anand
Praveena Varadhan
Ashwin Sankar
Giri Raju
Mitesh M. Khapra
40
1
0
18 Jul 2024
TTSDS -- Text-to-Speech Distribution Score
TTSDS -- Text-to-Speech Distribution Score
Christoph Minixhofer
Ondˇrej Klejch
Peter Bell
26
0
0
17 Jul 2024
Fine-Grained and Interpretable Neural Speech Editing
Fine-Grained and Interpretable Neural Speech Editing
Max Morrison
Cameron Churchwell
Nathan Pruyne
Bryan Pardo
47
3
0
07 Jul 2024
NAIST Simultaneous Speech Translation System for IWSLT 2024
NAIST Simultaneous Speech Translation System for IWSLT 2024
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Tomoya Yanagita
...
Haotian Tan
Makoto Sakai
S. Sakti
Katsuhito Sudoh
Satoshi Nakamura
28
1
0
30 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic
  Alignment
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
50
11
0
25 Jun 2024
Instruction Data Generation and Unsupervised Adaptation for Speech
  Language Models
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models
Vahid Noroozi
Zhehuai Chen
Somshubra Majumdar
Steve Huang
Jagadeesh Balam
Boris Ginsburg
SyDa
34
3
0
18 Jun 2024
Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
Zehua Kcriss Li
Meiying Melissa Chen
Yi Zhong
Pinxin Liu
Zhiyao Duan
34
0
0
15 Jun 2024
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual
  Text-to-Speech
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
Ashishkumar Gudmalwar
Nirmesh Shah
Sai Akarsh
Pankaj Wasnik
R. Shah
32
1
0
12 Jun 2024
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Thomas Bott
Florian Lux
Ngoc Thang Vu
31
6
0
10 Jun 2024
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Florian Lux
Sarina Meyer
Lyonel Behringer
Frank Zalkow
P. Do
Matt Coler
Emanuel Habets
Ngoc Thang Vu
CLIP
43
3
0
10 Jun 2024
Should you use a probabilistic duration model in TTS? Probably!
  Especially for spontaneous speech
Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
Shivam Mehta
Harm Lameris
Rajiv Punmiya
Jonas Beskow
Éva Székely
G. Henter
23
1
0
08 Jun 2024
Style Mixture of Experts for Expressive Text-To-Speech Synthesis
Style Mixture of Experts for Expressive Text-To-Speech Synthesis
Ahad Jawaid
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
MoE
32
0
0
05 Jun 2024
A Survey of Deep Learning Audio Generation Methods
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM
MedIm
52
0
0
31 May 2024
Non-autoregressive real-time Accent Conversion model with voice cloning
Non-autoregressive real-time Accent Conversion model with voice cloning
Vladimir Nechaev
Sergey Kosyakov
32
1
0
21 May 2024
Exploring speech style spaces with language models: Emotional TTS
  without emotion labels
Exploring speech style spaces with language models: Emotional TTS without emotion labels
Shreeram Suresh Chandra
Zongyang Du
Berrak Sisman
38
2
0
18 May 2024
Fake it to make it: Using synthetic data to remedy the data shortage in
  joint multimodal speech-and-gesture synthesis
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
36
4
0
30 Apr 2024
An RFP dataset for Real, Fake, and Partially fake audio detection
An RFP dataset for Real, Fake, and Partially fake audio detection
Abdulazeez Alali
George Theodorakopoulos
22
2
0
26 Apr 2024
HyperTTS: Parameter Efficient Adaptation in Text to Speech using
  Hypernetworks
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
Yingting Li
Rishabh Bhardwaj
Ambuj Mehrish
Bo Cheng
Soujanya Poria
38
2
0
06 Apr 2024
Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation
Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation
Sai Akarsh
Vamshi Raghusimha
Anindita Mondal
Anil Vuppala
39
1
0
07 Mar 2024
Benchmarking Large Multimodal Models against Common Corruptions
Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang
Tianyu Pang
Chao Du
Yi Ren
Bo-wen Li
Min-Bin Lin
MLLM
22
14
0
22 Jan 2024
MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
Nicolas M. Muller
Piotr Kawa
Wei Herng Choong
Edresson Casanova
Eren Golge
Thorsten Muller
P. Syga
Philip Sperl
Konstantin Böttinger
31
35
0
17 Jan 2024
Incremental FastPitch: Chunk-based High Quality Text to Speech
Incremental FastPitch: Chunk-based High Quality Text to Speech
Muyang Du
Chuan Liu
Junjie Lai
15
0
0
03 Jan 2024
Audiobox: Unified Audio Generation with Natural Language Prompts
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
42
75
0
25 Dec 2023
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis
  Conditioned on Self-supervised Discrete Speech Representations
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
24
21
0
22 Dec 2023
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using
  Synthetic Data and Transfer learning
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning
Raviraj Joshi
Nikesh Garera
25
0
0
02 Dec 2023
Custom Data Augmentation for low resource ASR using Bark and
  Retrieval-Based Voice Conversion
Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
Anand Kamble
Aniket Tathe
Suyash Kumbharkar
Atharva Bhandare
Anirban C. Mitra
14
1
0
24 Nov 2023
Synthetic Speaking Children -- Why We Need Them and How to Make Them
Synthetic Speaking Children -- Why We Need Them and How to Make Them
Muhammad Ali Farooq
Dan Bigioi
Rishabh Jain
Wang Yao
Mariam Yiwere
Peter Corcoran
19
0
0
08 Nov 2023
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust
  Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
21
24
0
08 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer
  Learning
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
Rishabh Jain
Peter Corcoran
20
0
0
07 Nov 2023
1234
Next