2005.11129
Cited By
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
22 May 2020
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
Papers citing
"Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search"
50 / 286 papers shown
Title
Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus
Detai Xin
Shinnosuke Takamichi
Ai Morimatsu
Hiroshi Saruwatari
8
10
0
21 May 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training
Zhe Ye
Rongjie Huang
Yi Ren
Ziyue Jiang
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
CLIP
26
20
0
18 May 2023
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Siwei Lyu
30
38
0
25 Apr 2023
An End-to-End Neural Network for Image-to-Audio Transformation
Liu Chen
Michael Deisher
Munir Georges
16
3
0
10 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
26
7
0
06 Mar 2023
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
30
4
0
05 Mar 2023
An investigation into the adaptability of a diffusion-based TTS model
Haolin Chen
Philip N. Garner
DiffM
31
1
0
03 Mar 2023
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus
Ajinkya Kulkarni
Atharva Kulkarni
Sara Shatnawi
Hanan Aldarmaki
17
8
0
28 Feb 2023
UniFLG: Unified Facial Landmark Generator from Text or Speech
Kentaro Mitsui
Yukiya Hono
Kei Sawada
CVBM
11
6
0
28 Feb 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
Jiyoung Lee
Joon Son Chung
Soo-Whan Chung
DiffM
26
27
0
27 Feb 2023
Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech
D. Yang
Tomoki Koriyama
Yuki Saito
Takaaki Saeki
Detai Xin
Hiroshi Saruwatari
10
7
0
27 Feb 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow
Yoonhyung Lee
Jinhyeok Yang
Kyomin Jung
12
6
0
27 Feb 2023
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Ehab AlBadawy
Siwei Lyu
120
3
0
18 Feb 2023
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
6
35
0
08 Feb 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
H. Meng
DiffM
VLM
31
84
0
31 Jan 2023
DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising Diffusion Models
Haomin Wen
Youfang Lin
Yutong Xia
Huaiyu Wan
Qingsong Wen
Roger Zimmermann
Yuxuan Liang
DiffM
23
81
0
31 Jan 2023
UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion
Hao Liu
Tao Wang
Ruibo Fu
Jiangyan Yi
Zhengqi Wen
J. Tao
18
3
0
10 Jan 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Ze Chen
Yihan Wu
Yichong Leng
Jiawei Chen
Haohe Liu
...
Ke Wang
Lei He
Sheng Zhao
Jiang Bian
Danilo P. Mandic
DiffM
22
22
0
30 Dec 2022
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li
Cong Han
N. Mesgarani
17
18
0
29 Dec 2022
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder
Yusuke Yasuda
T. Toda
DiffM
15
7
0
16 Dec 2022
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis
Chunyu Qiang
Peng Yang
Hao Che
Xiaorui Wang
Zhongyuan Wang
BDL
21
6
0
13 Dec 2022
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech
Byoung Jin Choi
Myeonghun Jeong
Joun Yeop Lee
N. Kim
12
12
0
30 Nov 2022
Prosody-controllable spontaneous TTS with neural HMMs
Harm Lameris
Shivam Mehta
G. Henter
Joakim Gustafson
Éva Székely
27
15
0
24 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
Praveen S. V.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
34
18
0
17 Nov 2022
Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation
Chunyu Qiang
Peng Yang
Hao Che
Jinba Xiao
Xiaorui Wang
Zhongyuan Wang
13
3
0
17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
16
46
0
17 Nov 2022
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models
Minki Kang
Dong Min
Sung Ju Hwang
DiffM
20
48
0
17 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS
Shivam Mehta
Ambika Kirkland
Harm Lameris
Jonas Beskow
Éva Székely
G. Henter
AI4TS
26
12
0
13 Nov 2022
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space
Jihwan Lee
Jaesung Bae
Seongkyu Mun
Heejin Choi
Joun Yeop Lee
Hoon-Young Cho
Chanwoo Kim
24
2
0
06 Nov 2022
Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis
Konstantinos Klapsas
Karolos Nikitaras
Nikolaos Ellinas
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
11
0
0
02 Nov 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Georgia Maniati
Panos Kakoulidis
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
16
2
0
31 Oct 2022
The Importance of Accurate Alignments in End-to-End Speech Synthesis
Anusha Prakash
H. Murthy
13
0
0
31 Oct 2022
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Masaya Kawamura
Yuma Shirahata
Ryuichi Yamamoto
Kentaro Tachibana
24
15
0
28 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Florian Lux
Julia Koch
Ngoc Thang Vu
30
22
0
21 Oct 2022
Invertible Monotone Operators for Normalizing Flows
Byeongkeun Ahn
Chiyoon Kim
Youngjoon Hong
Hyunwoo J. Kim
TPM
30
8
0
15 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario
Emily R. Bartusiak
Edward J. Delp
19
12
0
14 Oct 2022
Can we use Common Voice to train a Multi-Speaker TTS system?
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
19
10
0
12 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech
Byoung Jin Choi
Myeonghun Jeong
Minchan Kim
Sung Hwan Mun
N. Kim
DiffM
17
5
0
12 Oct 2022
A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition
Kyuhong Shim
Wonyong Sung
25
2
0
01 Oct 2022
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS
Haohan Guo
Fenglong Xie
Frank Soong
Xixin Wu
Helen M. Meng
37
11
0
22 Sep 2022
Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale
Aditya Agarwal
Bipasha Sen
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
22
0
0
21 Aug 2022
G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model
Pan Xie
Qipeng Zhang
Zexian Li
Hao Tang
Yao Du
Xiaohui Hu
DiffM
36
12
0
19 Aug 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
Rongjie Huang
Zhou Zhao
Huadai Liu
Jinglin Liu
Chenye Cui
Yi Ren
DiffM
44
193
0
13 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Zhengxi Liu
Qiao Tian
Chenxu Hu
Xudong Liu
Meng-Che Wu
Yuping Wang
Hang Zhao
Yuxuan Wang
28
10
0
13 Jul 2022
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate
Nabarun Goswami
Tatsuya Harada
10
5
0
13 Jul 2022
Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS
Yookyung Shin
Younggun Lee
Suhee Jo
Yeongtae Hwang
Taesu Kim
9
14
0
13 Jul 2022
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus
Josh Meyer
David Ifeoluwa Adelani
Edresson Casanova
A. Oktem
Daniel Whitenack
Julian Weber
...
Victor Akinode
Bernard Opoku
S. Olanrewaju
Jesujoba Oluwadara Alabi
Shamsuddeen Hassan Muhammad
11
21
0
07 Jul 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion
Yinjiao Lei
Shan Yang
Jian Cong
Linfu Xie
Dan Su
DiffM
42
12
0
05 Jul 2022
GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion
Magdalena Proszewska
Grzegorz Beringer
Daniel Sáez-Trigueros
Thomas Merritt
Abdelhamid Ezzerg
Roberto Barra-Chicote
22
6
0
04 Jul 2022
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers
Liumeng Xue
Shan Yang
Na Hu
Dan Su
Linfu Xie
16
2
0
02 Jul 2022