Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

22 May 2020
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
arXiv: 2005.11129

Papers citing "Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search"

50 / 286 papers shown
Title
Expressive, Variable, and Controllable Duration Modelling in TTS
Ammar Abbas
Thomas Merritt
Alexis Moinet
S. Karlapati
Ewa Muszyńska
Simon Slangen
Elia Gatti
Thomas Drugman
22
10
0
28 Jun 2022
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer
S. Karlapati
Penny Karanasou
Mateusz Lajszczak
Ammar Abbas
Alexis Moinet
Peter Makarov
Raymond Li
Arent van Korlaar
Simon Slangen
Thomas Drugman
14
15
0
27 Jun 2022
Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech
Florian Lux
Julia Koch
Ngoc Thang Vu
32
19
0
24 Jun 2022
End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue
Kentaro Mitsui
Tianyu Zhao
Kei Sawada
Yukiya Hono
Yoshihiko Nankaku
K. Tokuda
20
14
0
24 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
50
525
0
13 Jun 2022
Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models
Alon Levkovitch
Eliya Nachmani
Lior Wolf
DiffM
19
29
0
05 Jun 2022
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech
Ziyue Jiang
Zhe Su
Zhou Zhao
Qian Yang
Yi Ren
Jinglin Liu
Zhe Ye
24
4
0
05 Jun 2022
Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish
A. Oktem
Rodolfo Zevallos
Yasmin Moslem
Günes Öztürk
Karen Sarhon
18
0
0
31 May 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
33
38
0
30 May 2022
Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data
Sungwon Kim
Heeseung Kim
Sung-Hoon Yoon
DiffM
196
52
0
30 May 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
Rongjie Huang
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
OODD
VLM
115
34
0
15 May 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
38
211
0
09 May 2022
Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss
Efthymios Georgiou
Kosmas Kritsis
Georgios Paraskevopoulos
Athanasios Katsamanis
V. Katsouros
Alexandros Potamianos
16
3
0
28 Apr 2022
SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech
Zhenhui Ye
Zhou Zhao
Yi Ren
Fei Wu
21
27
0
25 Apr 2022
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
Rongjie Huang
Max W. Y. Lam
J. Wang
Dan Su
Dong Yu
Yi Ren
Zhou Zhao
DiffM
28
165
0
21 Apr 2022
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond
Yisheng Xiao
Lijun Wu
Junliang Guo
Juntao Li
M. Zhang
Tao Qin
Tie-Yan Liu
3DV
MedIm
AI4CE
30
82
0
20 Apr 2022
Music Source Separation with Generative Flow
Ge Zhu
Jordan Darefsky
Fei Jiang
A. Selitskiy
Z. Duan
15
6
0
19 Apr 2022
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Jaesung Bae
Jinhyeok Yang
Taejun Bak
Young-Sun Joo
DiffM
16
6
0
08 Apr 2022
Heterogeneous Target Speech Separation
Hyunjae Cho
Wonbin Jung
Junhyeok Lee
Paris Smaragdis
Sanghyun Woo
46
26
0
07 Apr 2022
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech
Hyungchan Yoon
Seyun Um
Changwhan Kim
Hong-Goo Kang
14
0
0
05 Apr 2022
Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis
Fan Wang
Po-Chun Hsu
Da-Rong Liu
Hung-yi Lee
13
0
0
01 Apr 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis
Hubert Siuzdak
Piotr Dura
Pol van Rijn
Nori Jacoby
AI4TS
10
30
0
31 Mar 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
D. Lim
Sunghee Jung
Eesung Kim
14
51
0
31 Mar 2022
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher
Heyang Xue
Xinsheng Wang
Yongmao Zhang
Lei Xie
Pengcheng Zhu
Mengxiao Bi
DiffM
22
11
0
30 Mar 2022
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation
Rendi Chevi
Radityo Eko Prasojo
Alham Fikri Aji
Andros Tjandra
S. Sakti
VLM
6
3
0
29 Mar 2022
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
Edresson Casanova
C. Shulby
Alexander Korolev
Arnaldo Cândido Júnior
A. S. Soares
S. Aluísio
M. Ponti
21
11
0
29 Mar 2022
Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Sunghwan Ahn
Joun Yeop Lee
N. Kim
36
25
0
29 Mar 2022
A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis
Rishabh Jain
Mariam Yiwere
Dan Bigioi
Peter Corcoran
H. Cucu
17
14
0
22 Mar 2022
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling
Bac Nguyen
Fabien Cardinaux
Stefan Uhlich
14
2
0
21 Mar 2022
Text-free non-parallel many-to-many voice conversion using normalising flows
Thomas Merritt
Abdelhamid Ezzerg
Piotr Bilinski
Magdalena Proszewska
Kamil Pokora
Roberto Barra-Chicote
Daniel Korzekwa
28
14
0
15 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features
Florian Lux
Ngoc Thang Vu
20
29
0
07 Mar 2022
Variational Auto-Encoder based Mandarin Speech Cloning
Qingyu Xing
Xiaohan Ma
13
0
0
06 Mar 2022
Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows
Kevin J. Shih
Rafael Valle
Rohan Badlani
J. F. Santos
Bryan Catanzaro
25
4
0
03 Mar 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS
Haohan Guo
Hui Lu
Xixin Wu
H. Meng
94
7
0
02 Mar 2022
Revisiting Over-Smoothness in Text to Speech
Yi Ren
Xu Tan
Tao Qin
Zhou Zhao
Tie-Yan Liu
65
61
0
26 Feb 2022
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech
Yi Ren
Ming Lei
Zhiying Huang
Shi-Rui Zhang
Qian Chen
Zhijie Yan
Zhou Zhao
32
41
0
16 Feb 2022
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
Songxiang Liu
Dan Su
Dong Yu
DiffM
68
65
0
28 Jan 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
179
378
0
04 Dec 2021
Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance
Heeseung Kim
Sungwon Kim
Sungroh Yoon
DiffM
BDL
19
107
0
23 Nov 2021
Personalized One-Shot Lipreading for an ALS Patient
Bipasha Sen
Aditya Agarwal
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
LM&MA
6
3
0
02 Nov 2021
VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis
Yongmao Zhang
Jian Cong
Heyang Xue
Lei Xie
Pengcheng Zhu
Mengxiao Bi
19
73
0
17 Oct 2021
Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor
Anchit Gupta
Faizan Farooq Khan
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
CVBM
22
6
0
16 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
39
0
15 Oct 2021
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation
Rongjie Huang
Chenye Cui
Feiyang Chen
Yi Ren
Jinglin Liu
Zhou Zhao
Baoxing Huai
N. Yuan
GAN
99
62
0
14 Oct 2021
FedSpeech: Federated Text-to-Speech with Continual Learning
Ziyue Jiang
Yi Ren
Ming Lei
Zhou Zhao
FedML
93
24
0
14 Oct 2021
Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech
Haoyue Zhan
Xinyuan Yu
Haitong Zhang
Yang Zhang
Yue Lin
16
5
0
14 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Jen-Hao Rick Chang
A. Shrivastava
H. Koppula
Xiaoshuai Zhang
Oncel Tuzel
DiffM
51
16
0
06 Oct 2021
EdiTTS: Score-based Editing for Controllable Text-to-Speech
Jaesung Tae
Hyeongju Kim
Taesu Kim
DiffM
173
39
0
06 Oct 2021
PortaSpeech: Portable and High-Quality Generative Text-to-Speech
Yi Ren
Jinglin Liu
Zhou Zhao
42
78
0
30 Sep 2021
AligNART: Non-autoregressive Neural Machine Translation by Jointly Learning to Estimate Alignment and Translate
Jongyoon Song
Sungwon Kim
Sungroh Yoon
66
37
0
14 Sep 2021