Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

22 May 2020
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon

Papers citing "Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search"

50 / 286 papers shown
Towards Zero-Shot Text-To-Speech for Arabic Dialects
Khai Duy Doan
Abdul Waheed
Muhammad Abdul-Mageed
38
0
0
24 Jun 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers
Yakun Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Guanrou Yang
Xie Chen
AuLLM
38
3
0
22 Jun 2024
Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis
Taewoo Kim
Choongsang Cho
Young Han Lee
AI4TS
33
0
0
14 Jun 2024
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
Kyra Wang
Dorien Herremans
24
0
0
13 Jun 2024
Toward Fully-End-to-End Listened Speech Decoding from EEG Signals
Jihwan Lee
Aditya Kommineni
Tiantian Feng
Kleanthis Avramidis
Xuan Shi
Sudarsana Reddy Kadiri
Shrikanth Narayanan
31
0
0
12 Jun 2024
Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis
Wing-Zin Leung
Mattias Cross
Anton Ragni
Stefan Goetze
22
4
0
12 Jun 2024
Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding
Rui Wang
Liping Chen
Kong AiK Lee
Zhen-Hua Ling
23
2
0
12 Jun 2024
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
Ashishkumar Gudmalwar
Nirmesh Shah
Sai Akarsh
Pankaj Wasnik
R. Shah
32
1
0
12 Jun 2024
Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
Shivam Mehta
Harm Lameris
Rajiv Punmiya
Jonas Beskow
Éva Székely
G. Henter
23
1
0
08 Jun 2024
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
Kun Zhou
Shengkui Zhao
Yukun Ma
Chong Zhang
Hao Wang
Dianwen Ng
Chongjia Ni
Nguyen Trung Hieu
J. Yip
Bin Ma
33
5
0
04 Jun 2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
Shengpeng Ji
Jia-li Zuo
Minghui Fang
Siqi Zheng
Qian Chen
...
Ziyue Jiang
Hai Huang
Xize Cheng
Rongjie Huang
Zhou Zhao
52
8
0
03 Jun 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM
MedIm
52
0
0
31 May 2024
Exploring speech style spaces with language models: Emotional TTS without emotion labels
Shreeram Suresh Chandra
Zongyang Du
Berrak Sisman
38
2
0
18 May 2024
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang
Ji-Hoon Kim
Junseok Ahn
Doyeop Kwak
Hong-Sun Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
CVBM
29
9
0
16 May 2024
Building a Luganda Text-to-Speech Model From Crowdsourced Data
Sulaiman Kagumire
Andrew Katumba
J. Nakatumba‐Nabende
John Quinn
16
1
0
16 May 2024
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
47
15
0
08 May 2024
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Hankun Wang
Chenpeng Du
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
32
1
0
30 Apr 2024
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
46
4
0
30 Apr 2024
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
Wenbin Wang
Yang Song
Sanjay Jha
36
10
0
28 Apr 2024
An RFP dataset for Real, Fake, and Partially fake audio detection
Abdulazeez Alali
George Theodorakopoulos
30
2
0
26 Apr 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang
Yao Qian
Long Zhou
Shujie Liu
Dongmei Wang
...
Yanmin Qian
Jinyu Li
Lei He
Sheng Zhao
Michael Zeng
34
1
0
10 Apr 2024
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
Yingting Li
Rishabh Bhardwaj
Ambuj Mehrish
Bo Cheng
Soujanya Poria
38
2
0
06 Apr 2024
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
Xiang Li
Fan Bu
Ambuj Mehrish
Yingting Li
Jiale Han
Bo Cheng
Soujanya Poria
DiffM
32
6
0
31 Mar 2024
HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling
Chunhui Wang
Chang Zeng
Bowen Zhang
Ziyang Ma
Yefan Zhu
Zifeng Cai
Jian Zhao
Zhonglin Jiang
Yong Chen
SyDa
44
5
0
09 Mar 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
43
143
0
05 Mar 2024
Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition
Rendi Chevi
Alham Fikri Aji
25
2
0
22 Feb 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
Shengpeng Ji
Ziyue Jiang
Hanting Wang
Jia-li Zuo
Zhou Zhao
32
9
0
14 Feb 2024
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Naoyuki Kanda
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Hemin Yang
...
Yufei Xia
Jinzhu Li
Yanqing Liu
Sheng Zhao
Michael Zeng
27
8
0
12 Feb 2024
Diff-RNTraj: A Structure-aware Diffusion Model for Road Network-constrained Trajectory Generation
Tonglong Wei
Youfang Lin
S. Guo
Yan Lin
Yiheng Huang
Chenyang Xiang
Yuqing Bai
Menglu Ya
Huaiyu Wan
31
11
0
12 Feb 2024
Giving Robots a Voice: Human-in-the-Loop Voice Creation and open-ended Labeling
Pol van Rijn
Silvan Mertes
Kathrin Janowski
Katharina Weitz
Nori Jacoby
Elisabeth André
13
5
0
07 Feb 2024
FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder
Tan Dat Nguyen
Ji-Hoon Kim
Youngjoon Jang
Jaehun Kim
Joon Son Chung
DiffM
39
5
0
18 Jan 2024
MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
Nicolas M. Muller
Piotr Kawa
Wei Herng Choong
Edresson Casanova
Eren Golge
Thorsten Muller
P. Syga
Philip Sperl
Konstantin Böttinger
37
35
0
17 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
39
1
0
16 Jan 2024
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
Ya-Zhen Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Xie Chen
AuLLM
19
35
0
14 Jan 2024
Incremental FastPitch: Chunk-based High Quality Text to Speech
Muyang Du
Chuan Liu
Junjie Lai
18
0
0
03 Jan 2024
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Semin Kim
Joun Yeop Lee
Nam Soo Kim
AI4TS
23
4
0
03 Jan 2024
AE-Flow: AutoEncoder Normalizing Flow
Jakub Mosiński
Piotr Bilinski
Thomas Merritt
Abdelhamid Ezzerg
Daniel Korzekwa
26
4
0
27 Dec 2023
Creating New Voices using Normalizing Flows
Piotr Bilinski
Thomas Merritt
Abdelhamid Ezzerg
Kamil Pokora
Sebastian Cygert
K. Yanagisawa
Roberto Barra-Chicote
Daniel Korzekwa
18
17
0
22 Dec 2023
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
Wenhao Guan
Yishuang Li
Tao Li
Hukai Huang
Feng Wang
Jiayan Lin
Lingyan Huang
Lin Li
Q. Hong
23
8
0
17 Dec 2023
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Zehua Chen
Guande He
Kaiwen Zheng
Xu Tan
Jun Zhu
DiffM
53
21
0
06 Dec 2023
OpenVoice: Versatile Instant Voice Cloning
Zengyi Qin
Wenliang Zhao
Xumin Yu
Xin Sun
VLM
27
19
0
03 Dec 2023
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning
Raviraj Joshi
Nikesh Garera
25
0
0
02 Dec 2023
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints
Raviraj Joshi
Nikesh Garera
25
0
0
02 Dec 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
27
31
0
21 Nov 2023
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
Jungil Kong
Junmo Lee
Jeongmin Kim
Beomjeong Kim
Jihoon Park
Dohee Kong
Changheon Lee
Sangjin Kim
23
1
0
20 Nov 2023
Utilizing Speech Emotion Recognition and Recommender Systems for Negative Emotion Handling in Therapy Chatbots
Farideh Majidi
Marzieh Bahrami
19
2
0
18 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
Rishabh Jain
Peter Corcoran
20
0
0
07 Nov 2023
Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Dongjune Lee
N. Kim
AI4TS
25
10
0
06 Nov 2023
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Yuan Gao
Nobuyuki Morioka
Yu Zhang
Nanxin Chen
DiffM
26
27
0
02 Nov 2023
The IMS Toucan System for the Blizzard Challenge 2023
Florian Lux
Julia Koch
Sarina Meyer
Thomas Bott
Nadja Schauffler
Pavel Denisov
Antje Schweitzer
Ngoc Thang Vu
19
6
0
26 Oct 2023