ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2006.04558
  4. Cited By
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

8 June 2020
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
ArXivPDFHTML

Papers citing "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

50 / 754 papers shown
Title
SSDM: Scalable Speech Dysfluency Modeling
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Mille
M. G. Tempini
Gopala Anumanchipalli
AuLLM
32
1
0
29 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
Zhou Zhao
VLM
57
33
0
29 Aug 2024
StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained
  Controllable Text-to-Speech
StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech
Haowei Lou
Helen Paik
Wen Hu
Lina Yao
VLM
31
0
0
27 Aug 2024
SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural
  Language Description
SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description
Zeyu Jin
Jia Jia
Qixin Wang
Kehan Li
Shuoyi Zhou
Songtao Zhou
Xiaoyu Qin
Zhiyong Wu
27
10
0
24 Aug 2024
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion
  of Whispered and Regular Speech
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
Anastasia Avdeeva
Aleksei Gusev
22
0
0
21 Aug 2024
Disentangling segmental and prosodic factors to non-native speech
  comprehensibility
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
32
1
0
20 Aug 2024
Content and Style Aware Audio-Driven Facial Animation
Content and Style Aware Audio-Driven Facial Animation
Qingju Liu
Hyeongwoo Kim
Gaurav Bharaj
DiffM
32
1
0
13 Aug 2024
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis
  Vocoders
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders
Yubing Cao
Yongming Li
Liejun Wang
Yinfeng Yu
23
0
0
13 Aug 2024
PRESENT: Zero-Shot Text-to-Prosody Control
PRESENT: Zero-Shot Text-to-Prosody Control
Perry Lam
Huayun Zhang
Nancy F. Chen
Berrak Sisman
Dorien Herremans
43
0
0
13 Aug 2024
An Investigation Into Explainable Audio Hate Speech Detection
An Investigation Into Explainable Audio Hate Speech Detection
Jinmyeong An
Wonjun Lee
Yejin Jeon
Jungseul Ok
Yunsu Kim
Gary Geunbae Lee
26
2
0
12 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
33
0
0
08 Aug 2024
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Jiawei Huang
Chen Zhang
Yi Ren
Ziyue Jiang
Zhenhui Ye
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
35
2
0
08 Aug 2024
Synchronous Multi-modal Semantic Communication System with Packet-level
  Coding
Synchronous Multi-modal Semantic Communication System with Packet-level Coding
Yun Tian
Jingkai Ying
Zhijin Qin
Ye Jin
Xiaoming Tao
34
3
0
08 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End
  Transformer Training
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
33
0
0
06 Aug 2024
Generative Expressive Conversational Speech Synthesis
Generative Expressive Conversational Speech Synthesis
Rui Liu
Yifan Hu
Yi Ren
Xiang Yin
Haizhou Li
56
5
0
31 Jul 2024
On the Problem of Text-To-Speech Model Selection for Synthetic Data
  Generation in Automatic Speech Recognition
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
Nick Rossenbach
Ralf Schluter
S. Sakti
27
2
0
31 Jul 2024
Towards Improving NAM-to-Speech Synthesis Intelligibility using
  Self-Supervised Speech Models
Towards Improving NAM-to-Speech Synthesis Intelligibility using Self-Supervised Speech Models
N. Shah
Shirish S. Karande
Vineet Gandhi
25
1
0
26 Jul 2024
On the Effect of Purely Synthetic Training Data for Different Automatic
  Speech Recognition Architectures
On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures
Nick Rossenbach
Benedikt Hilmes
Ralf Schluter
25
1
0
25 Jul 2024
Speech Editing -- a Summary
Speech Editing -- a Summary
Tobias Kässmann
Yining Liu
Danni Liu
29
0
0
24 Jul 2024
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery
  Detection and Localization
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization
Junyan Wu
Wei Lu
Xiangyang Luo
Rui Yang
Qian Wang
Xiaochun Cao
34
3
0
23 Jul 2024
dMel: Speech Tokenization made Simple
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
35
4
0
22 Jul 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning
  of CLIP and Fastspeech2
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Chun Xu
En-Wei Sun
28
0
0
19 Jul 2024
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous
  Behaviors Based on Language Models
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models
Weiqin Li
Pei-Yin Yang
Yicheng Zhong
Yixuan Zhou
Zhisheng Wang
Zhiyong Wu
Xixin Wu
Helen M. Meng
30
3
0
18 Jul 2024
SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural
  Network
SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network
Kexin Wang
Jiahong Zhang
Yong Ren
Man Yao
Richard D. Shang
Boxing Xu
Guoqi Li
DiffM
21
2
0
17 Jul 2024
Read, Watch and Scream! Sound Generation from Text and Video
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
29
11
0
08 Jul 2024
Fine-Grained and Interpretable Neural Speech Editing
Fine-Grained and Interpretable Neural Speech Editing
Max Morrison
Cameron Churchwell
Nathan Pruyne
Bryan Pardo
47
3
0
07 Jul 2024
Probing the Feasibility of Multilingual Speaker Anonymization
Probing the Feasibility of Multilingual Speaker Anonymization
Sarina Meyer
Florian Lux
Ngoc Thang Vu
39
3
0
03 Jul 2024
Lightweight Zero-shot Text-to-Speech with Mixture of Adapters
Lightweight Zero-shot Text-to-Speech with Mixture of Adapters
Kenichi Fujita
Takanori Ashihara
Marc Delcroix
Yusuke Ijima
42
2
0
01 Jul 2024
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech
  Synthesis
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
Yinlin Guo
Yening Lv
Jinqiao Dou
Yan Zhang
Yuehai Wang
18
0
0
30 Jun 2024
Open-Source Conversational AI with SpeechBrain 1.0
Open-Source Conversational AI with SpeechBrain 1.0
Mirco Ravanelli
Titouan Parcollet
Adel Moumen
Sylvain de Langen
Cem Subakan
...
Salima Mdhaffar
G. Laperriere
Mickael Rouvier
Renato De Mori
Yannick Esteve
VLM
34
10
0
29 Jun 2024
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling
  on Time Variability
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
Hyun Joon Park
Jin Sob Kim
Wooseok Shin
Sung Won Han
DiffM
33
2
0
27 Jun 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
35
47
0
26 Jun 2024
Exploring the Capability of Mamba in Speech Applications
Exploring the Capability of Mamba in Speech Applications
Koichi Miyazaki
Yoshiki Masuyama
Masato Murata
Mamba
35
12
0
24 Jun 2024
One-Class Learning with Adaptive Centroid Shift for Audio Deepfake
  Detection
One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection
Hyun Myung Kim
Kangwook Jang
Hoirin Kim
29
5
0
24 Jun 2024
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot
  Speaker Adaptive Text-to-Speech
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
Wenbin Wang
Yang Song
Sanjay Jha
36
5
0
21 Jun 2024
SingMOS: An extensive Open-Source Singing Voice Dataset for MOS
  Prediction
SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction
Yuxun Tang
Jiatong Shi
Yuning Wu
Qin Jin
29
9
0
16 Jun 2024
Phoneme Discretized Saliency Maps for Explainable Detection of
  AI-Generated Voice
Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice
Shubham Gupta
Mirco Ravanelli
Pascal Germain
Cem Subakan
FAtt
37
3
0
14 Jun 2024
End-to-end Streaming model for Low-Latency Speech Anonymization
End-to-end Streaming model for Low-Latency Speech Anonymization
Waris Quamer
Ricardo Gutierrez-Osuna
20
0
0
13 Jun 2024
ToneUnit: A Speech Discretization Approach for Tonal Language Speech
  Synthesis
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
Dehua Tao
Daxin Tan
Y. Yeung
Xiao Chen
Tan Lee
35
3
0
13 Jun 2024
An Initial Investigation of Language Adaptation for TTS Systems under
  Low-resource Scenarios
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Cheng Gong
Erica Cooper
Xin Wang
Chunyu Qiang
Mengzhe Geng
...
Jianwu Dang
Marc Tessier
Aidan Pine
Korin Richmond
Junichi Yamagishi
35
2
0
13 Jun 2024
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with
  Paralanguage
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
Kyra Wang
Dorien Herremans
19
0
0
13 Jun 2024
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based
  Text-to-Speech for Dubbing
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing
Neha Sahipjohn
Ashishkumar Gudmalwar
Nirmesh Shah
Pankaj Wasnik
R. Shah
43
5
0
13 Jun 2024
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical
  Emotion Vector for Controllable Emotional Text-to-Speech
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Sang-Hoon Lee
Seong-Whan Lee
35
7
0
12 Jun 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang
Jiatong Shi
Jinchuan Tian
Yuning Wu
Yuxun Tang
Yihan Wu
Shinji Watanabe
Yossi Adi
Xie Chen
Qin Jin
43
15
0
11 Jun 2024
Can We Achieve High-quality Direct Speech-to-Speech Translation without
  Parallel Speech Data?
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
Qingkai Fang
Shaolei Zhang
Zhengrui Ma
Min Zhang
Yang Feng
VLM
37
1
0
11 Jun 2024
A Non-autoregressive Generation Framework for End-to-End Simultaneous
  Speech-to-Any Translation
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation
Zhengrui Ma
Qingkai Fang
Shaolei Zhang
Shoutao Guo
Yang Feng
Min Zhang
53
9
0
11 Jun 2024
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Thomas Bott
Florian Lux
Ngoc Thang Vu
33
6
0
10 Jun 2024
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Florian Lux
Sarina Meyer
Lyonel Behringer
Frank Zalkow
P. Do
Matt Coler
Emanuel Habets
Ngoc Thang Vu
CLIP
43
3
0
10 Jun 2024
SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion
SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion
Bingsong Bai
Fengping Wang
Yingming Gao
Ya Li
46
0
0
09 Jun 2024
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding
  of Sound and Language
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
Mark Hamilton
Andrew Zisserman
John R. Hershey
William T. Freeman
VLM
39
5
0
09 Jun 2024
Previous
123456...141516
Next