FastSpeech 2: Fast and High-Quality End-to-End Text to Speech


8 June 2020
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
arXiv: 2006.04558

Papers citing "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

50 / 754 papers shown
Title
Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators
Wiebke Hutiri
Orestis Papakyriakopoulos
Alice Xiang
28
16
0
25 Jan 2024
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Chenpeng Du
Yiwei Guo
Hankun Wang
Yifan Yang
Zhikang Niu
Shuai Wang
Hui Zhang
Xie Chen
Kai Yu
VLM
22
25
0
25 Jan 2024
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization
Wei-Ping Huang
Sung-Feng Huang
Hung-yi Lee
29
0
0
23 Jan 2024
Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang
Tianyu Pang
Chao Du
Yi Ren
Bo-wen Li
Min-Bin Lin
MLLM
27
14
0
22 Jan 2024
Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis
R. Vinotha
D. Hepsiba
L. D. V. Anand
Deepak John Reji
13
1
0
22 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
39
1
0
16 Jan 2024
End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2
Aniket Tathe
Anand Kamble
Suyash Kumbharkar
Atharva Bhandare
Anirban C. Mitra
30
1
0
11 Jan 2024
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
Kenichi Fujita
Hiroshi Sato
Takanori Ashihara
Hiroki Kanagawa
Marc Delcroix
Takafumi Moriya
Yusuke Ijima
31
8
0
10 Jan 2024
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie
Shengye Yu
Qile He
Mengtian Li
VLM
VGen
28
2
0
09 Jan 2024
Transfer the linguistic representations from TTS to accent conversion with non-parallel data
Xi Chen
Jiakun Pei
Liumeng Xue
Mingyang Zhang
36
4
0
07 Jan 2024
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
Yejin Jeon
Yunsu Kim
Gary Geunbae Lee
32
2
0
04 Jan 2024
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Semin Kim
Joun Yeop Lee
Nam Soo Kim
AI4TS
23
4
0
03 Jan 2024
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
44
75
0
25 Dec 2023
Creating New Voices using Normalizing Flows
Piotr Bilinski
Thomas Merritt
Abdelhamid Ezzerg
Kamil Pokora
Sebastian Cygert
K. Yanagisawa
Roberto Barra-Chicote
Daniel Korzekwa
18
17
0
22 Dec 2023
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
24
21
0
22 Dec 2023
Style Modeling for Multi-Speaker Articulation-to-Speech
Miseul Kim
Zhenyu Piao
Jihyun Lee
Hong-Goo Kang
26
8
0
21 Dec 2023
StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis
Xueyuan Chen
Xi Wang
Shaofei Zhang
Lei He
Zhiyong Wu
Xixin Wu
Helen M. Meng
41
7
0
19 Dec 2023
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Rui Liu
Yifan Hu
Yi Ren
Xiang Yin
Haizhou Li
37
16
0
19 Dec 2023
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
Wenhao Guan
Yishuang Li
Tao Li
Hukai Huang
Feng Wang
Jiayan Lin
Lingyan Huang
Lin Li
Q. Hong
23
8
0
17 Dec 2023
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
Yu Zhang
Rongjie Huang
Ruiqi Li
Jinzheng He
Yan Xia
Feiyang Chen
Xinyu Duan
Baoxing Huai
Zhou Zhao
VLM
16
17
0
17 Dec 2023
CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis
Yayue Deng
Jinlong Xue
Yukang Jia
Qifei Li
Yichen Han
Fengping Wang
Yingming Gao
Dengfeng Ke
Ya Li
30
7
0
16 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
27
26
0
15 Dec 2023
Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism
Georgios Milis
P. Filntisis
A. Roussos
Petros Maragos
CVBM
34
2
0
11 Dec 2023
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Zehua Chen
Guande He
Kaiwen Zheng
Xu Tan
Jun Zhu
DiffM
53
21
0
06 Dec 2023
Detecting Voice Cloning Attacks via Timbre Watermarking
Chang-rui Liu
Jie Zhang
Tianwei Zhang
Xi Yang
Weiming Zhang
Neng H. Yu
25
28
0
06 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
25
12
0
05 Dec 2023
Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking
Jihyun Lee
Yejin Jeon
Wonjun Lee
Yunsu Kim
Gary Geunbae Lee
15
1
0
04 Dec 2023
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning
Raviraj Joshi
Nikesh Garera
25
0
0
02 Dec 2023
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints
Raviraj Joshi
Nikesh Garera
25
0
0
02 Dec 2023
Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices
Gokul Srinivasagan
Michael Deisher
Munir Georges
VLM
19
0
0
30 Nov 2023
Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes
Pavel Korshunov
Haolin Chen
Philip N. Garner
Sébastien Marcel
CVBM
43
4
0
29 Nov 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
27
31
0
21 Nov 2023
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation
Jiangzong Wang
Pengcheng Li
Xulong Zhang
Ning Cheng
Jing Xiao
24
0
0
14 Nov 2023
SponTTS: modeling and transferring spontaneous style for TTS
Hanzhao Li
Xinfa Zhu
Liumeng Xue
Yang Song
Yunlin Chen
Lei Xie
19
7
0
13 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
Rishabh Jain
Peter Corcoran
20
0
0
07 Nov 2023
Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Dongjune Lee
N. Kim
AI4TS
25
10
0
06 Nov 2023
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations
Hanglei Zhang
Yiwei Guo
Sen Liu
Xie Chen
Kai Yu
17
0
0
02 Nov 2023
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Yuan Gao
Nobuyuki Morioka
Yu Zhang
Nanxin Chen
DiffM
26
27
0
02 Nov 2023
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN
Neeraj Kumar
Ankur Narang
Brejesh Lall
DiffM
21
0
0
27 Oct 2023
Enabling Acoustic Audience Feedback in Large Virtual Events
Tamay Aykut
M. Hofbauer
Christopher B. Kuhn
Eckehard Steinbach
Bernd Girod
38
0
0
27 Oct 2023
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Florian Lux
Pascal Tilli
Sarina Meyer
Ngoc Thang Vu
15
2
0
26 Oct 2023
The IMS Toucan System for the Blizzard Challenge 2023
Florian Lux
Julia Koch
Sarina Meyer
Thomas Bott
Nadja Schauffler
Pavel Denisov
Antje Schweitzer
Ngoc Thang Vu
19
6
0
26 Oct 2023
Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning
Xinfa Zhu
Yuke Li
Yinjiao Lei
Ning Jiang
Guoqing Zhao
Lei Xie
23
0
0
26 Oct 2023
Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control
Elif Bozkurt
34
0
0
25 Oct 2023
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors
Marek Kubis
Paweł Skórzewski
Marcin Sowański
Tomasz Ziętkiewicz
11
6
0
25 Oct 2023
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
19
31
0
25 Oct 2023
Non-autoregressive Streaming Transformer for Simultaneous Translation
Zhengrui Ma
Shaolei Zhang
Shoutao Guo
Chenze Shao
Min Zhang
Yang Feng
24
12
0
23 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
39
0
0
23 Oct 2023
An overview of text-to-speech systems and media applications
Mohammad Reza Hasanabadi
11
3
0
22 Oct 2023
Energy-Based Models For Speech Synthesis
Wanli Sun
Zehai Tu
Anton Ragni
DiffM
24
0
0
19 Oct 2023