2006.04558
Cited By
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
8 June 2020
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
Papers citing
"FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
50 / 754 papers shown
Title
Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators
Wiebke Hutiri
Orestis Papakyriakopoulos
Alice Xiang
28
16
0
25 Jan 2024
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Chenpeng Du
Yiwei Guo
Hankun Wang
Yifan Yang
Zhikang Niu
Shuai Wang
Hui Zhang
Xie Chen
Kai Yu
VLM
22
25
0
25 Jan 2024
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization
Wei-Ping Huang
Sung-Feng Huang
Hung-yi Lee
29
0
0
23 Jan 2024
Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang
Tianyu Pang
Chao Du
Yi Ren
Bo-wen Li
Min-Bin Lin
MLLM
27
14
0
22 Jan 2024
Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis
R. Vinotha
D. Hepsiba
L. D. V. Anand
Deepak John Reji
13
1
0
22 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
39
1
0
16 Jan 2024
End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2
Aniket Tathe
Anand Kamble
Suyash Kumbharkar
Atharva Bhandare
Anirban C. Mitra
30
1
0
11 Jan 2024
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
Kenichi Fujita
Hiroshi Sato
Takanori Ashihara
Hiroki Kanagawa
Marc Delcroix
Takafumi Moriya
Yusuke Ijima
31
8
0
10 Jan 2024
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie
Shengye Yu
Qile He
Mengtian Li
VLM
VGen
28
2
0
09 Jan 2024
Transfer the linguistic representations from TTS to accent conversion with non-parallel data
Xi Chen
Jiakun Pei
Liumeng Xue
Mingyang Zhang
36
4
0
07 Jan 2024
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
Yejin Jeon
Yunsu Kim
Gary Geunbae Lee
32
2
0
04 Jan 2024
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Semin Kim
Joun Yeop Lee
Nam Soo Kim
AI4TS
23
4
0
03 Jan 2024
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
44
75
0
25 Dec 2023
Creating New Voices using Normalizing Flows
Piotr Bilinski
Thomas Merritt
Abdelhamid Ezzerg
Kamil Pokora
Sebastian Cygert
K. Yanagisawa
Roberto Barra-Chicote
Daniel Korzekwa
18
17
0
22 Dec 2023
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
24
21
0
22 Dec 2023
Style Modeling for Multi-Speaker Articulation-to-Speech
Miseul Kim
Zhenyu Piao
Jihyun Lee
Hong-Goo Kang
26
8
0
21 Dec 2023
StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis
Xueyuan Chen
Xi Wang
Shaofei Zhang
Lei He
Zhiyong Wu
Xixin Wu
Helen M. Meng
41
7
0
19 Dec 2023
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Rui Liu
Yifan Hu
Yi Ren
Xiang Yin
Haizhou Li
37
16
0
19 Dec 2023
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
Wenhao Guan
Yishuang Li
Tao Li
Hukai Huang
Feng Wang
Jiayan Lin
Lingyan Huang
Lin Li
Q. Hong
23
8
0
17 Dec 2023
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
Yu Zhang
Rongjie Huang
Ruiqi Li
Jinzheng He
Yan Xia
Feiyang Chen
Xinyu Duan
Baoxing Huai
Zhou Zhao
VLM
16
17
0
17 Dec 2023
CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis
Yayue Deng
Jinlong Xue
Yukang Jia
Qifei Li
Yichen Han
Fengping Wang
Yingming Gao
Dengfeng Ke
Ya Li
30
7
0
16 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
27
26
0
15 Dec 2023
Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism
Georgios Milis
P. Filntisis
A. Roussos
Petros Maragos
CVBM
34
2
0
11 Dec 2023
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Zehua Chen
Guande He
Kaiwen Zheng
Xu Tan
Jun Zhu
DiffM
53
21
0
06 Dec 2023
Detecting Voice Cloning Attacks via Timbre Watermarking
Chang-rui Liu
Jie Zhang
Tianwei Zhang
Xi Yang
Weiming Zhang
Neng H. Yu
25
28
0
06 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
25
12
0
05 Dec 2023
Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking
Jihyun Lee
Yejin Jeon
Wonjun Lee
Yunsu Kim
Gary Geunbae Lee
15
1
0
04 Dec 2023
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning
Raviraj Joshi
Nikesh Garera
25
0
0
02 Dec 2023
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints
Raviraj Joshi
Nikesh Garera
25
0
0
02 Dec 2023
Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices
Gokul Srinivasagan
Michael Deisher
Munir Georges
VLM
19
0
0
30 Nov 2023
Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes
Pavel Korshunov
Haolin Chen
Philip N. Garner
Sébastien Marcel
CVBM
43
4
0
29 Nov 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
27
31
0
21 Nov 2023
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation
Jiangzong Wang
Pengcheng Li
Xulong Zhang
Ning Cheng
Jing Xiao
24
0
0
14 Nov 2023
SponTTS: modeling and transferring spontaneous style for TTS
Hanzhao Li
Xinfa Zhu
Liumeng Xue
Yang Song
Yunlin Chen
Lei Xie
19
7
0
13 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
Rishabh Jain
Peter Corcoran
20
0
0
07 Nov 2023
Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Dongjune Lee
N. Kim
AI4TS
25
10
0
06 Nov 2023
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations
Hanglei Zhang
Yiwei Guo
Sen Liu
Xie Chen
Kai Yu
17
0
0
02 Nov 2023
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Yuan Gao
Nobuyuki Morioka
Yu Zhang
Nanxin Chen
DiffM
26
27
0
02 Nov 2023
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN
Neeraj Kumar
Ankur Narang
Brejesh Lall
DiffM
21
0
0
27 Oct 2023
Enabling Acoustic Audience Feedback in Large Virtual Events
Tamay Aykut
M. Hofbauer
Christopher B. Kuhn
Eckehard Steinbach
Bernd Girod
38
0
0
27 Oct 2023
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Florian Lux
Pascal Tilli
Sarina Meyer
Ngoc Thang Vu
15
2
0
26 Oct 2023
The IMS Toucan System for the Blizzard Challenge 2023
Florian Lux
Julia Koch
Sarina Meyer
Thomas Bott
Nadja Schauffler
Pavel Denisov
Antje Schweitzer
Ngoc Thang Vu
19
6
0
26 Oct 2023
Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning
Xinfa Zhu
Yuke Li
Yinjiao Lei
Ning Jiang
Guoqing Zhao
Lei Xie
23
0
0
26 Oct 2023
Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control
Elif Bozkurt
34
0
0
25 Oct 2023
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors
Marek Kubis
Pawel Skórzewski
Marcin Sowański
Tomasz Ziętkiewicz
11
6
0
25 Oct 2023
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
19
31
0
25 Oct 2023
Non-autoregressive Streaming Transformer for Simultaneous Translation
Zhengrui Ma
Shaolei Zhang
Shoutao Guo
Chenze Shao
Min Zhang
Yang Feng
24
12
0
23 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
39
0
0
23 Oct 2023
An overview of text-to-speech systems and media applications
Mohammad Reza Hasanabadi
11
3
0
22 Oct 2023
Energy-Based Models For Speech Synthesis
Wanli Sun
Zehai Tu
Anton Ragni
DiffM
24
0
0
19 Oct 2023