ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2005.11129
  4. Cited By
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment
  Search

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

22 May 2020
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
ArXivPDFHTML

Papers citing "Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search"

50 / 286 papers shown
Title
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Biel Tura Vecino
Adam Gabry's
Daniel Mątwicki
Andrzej Pomirski
Tom Iddon
Marius Cotescu
Jaime Lorenzo-Trueba
34
0
0
12 May 2025
Language translation, and change of accent for speech-to-speech task using diffusion model
Language translation, and change of accent for speech-to-speech task using diffusion model
Abhishek Mishra
Ritesh Sur Chowdhury
Vartul Bahuguna
Isha Pandey
Ganesh Ramakrishnan
DiffM
44
0
0
04 May 2025
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
Gaoxiang Cong
Liang-Sheng Li
Jiadong Pan
Zhedong Zhang
Amin Beheshti
A. Hengel
Yuankai Qi
Qingming Huang
121
0
0
02 May 2025
Voice Cloning: Comprehensive Survey
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
39
0
0
01 May 2025
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
Haowei Lou
Hye-Young Paik
Sheng Li
Wen Hu
Lina Yao
32
0
0
11 Apr 2025
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
Yong Ren
Jiangyan Yi
Tao Wang
J. Tao
Zhengqi Wen
Chenxing Li
Z. Lian
Ruibo Fu
Ye Bai
Xiaohui Zhang
51
0
0
07 Apr 2025
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System
H. Kim
Jinhyeok Yang
Yechan Yu
Seunghun Ji
Jacob Morton
Frederik Bous
Joon Byun
Juheon Lee
49
0
0
29 Mar 2025
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Zhedong Zhang
Liang-Sheng Li
C. Yan
Chunshan Liu
A. Hengel
Yuankai Qi
83
2
0
15 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
59
0
0
11 Mar 2025
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition
Juncheng Wang
Chao Xu
Cheng Yu
Lei Shang
Zhe Hu
Shujun Wang
Liefeng Bo
DiffM
VGen
43
0
0
10 Mar 2025
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
Ziyue Jiang
Yi Ren
Ruiqi Li
Shengpeng Ji
Zhenhui Ye
...
Y. Zhang
Rui Liu
Xiang Yin
Zhou Zhao
Zhou Zhao
64
0
0
26 Feb 2025
Everyday Speech in the Indian Subcontinent
Everyday Speech in the Indian Subcontinent
Utkarsh Pathak
54
1
0
24 Feb 2025
Less is More for Synthetic Speech Detection in the Wild
Less is More for Synthetic Speech Detection in the Wild
Ashi Garg
Zexin Cai
Henry Li Xinyuan
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Matthew Wiesner
Nicholas Andrews
74
0
0
17 Feb 2025
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong
Jiadong Pan
Liang-Sheng Li
Yuankai Qi
Yuxin Peng
A. Hengel
Jian Yang
Qingming Huang
90
6
0
12 Dec 2024
SKQVC: One-Shot Voice Conversion by K-Means Quantization with
  Self-Supervised Speech Representations
SKQVC: One-Shot Voice Conversion by K-Means Quantization with Self-Supervised Speech Representations
Youngjun Sim
Jinsung Yoon
Young-Joo Suh
79
0
0
25 Nov 2024
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
39
3
0
04 Nov 2024
Mitigating Unauthorized Speech Synthesis for Voice Protection
Mitigating Unauthorized Speech Synthesis for Voice Protection
Zhisheng Zhang
Qianyi Yang
Derui Wang
Pengyang Huang
Yuxin Cao
Kai Ye
Jie Hao
AAML
11
3
0
28 Oct 2024
Making Social Platforms Accessible: Emotion-Aware Speech Generation with
  Integrated Text Analysis
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis
Suparna De
Ionut Bostan
Nishanth Sastry
32
0
0
24 Oct 2024
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
Guanrou Yang
Fan Yu
Z. Ma
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
27
1
0
22 Oct 2024
DurIAN-E 2: Duration Informed Attention Network with Adaptive
  Variational Autoencoder and Adversarial Learning for Expressive
  Text-to-Speech Synthesis
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis
Yu Gu
Qiushi Zhu
Guangzhi Lei
Chao Weng
Dan Su
DiffM
37
0
0
17 Oct 2024
Sound Check: Auditing Audio Datasets
Sound Check: Auditing Audio Datasets
William Agnew
Julia Barnett
Annie Chu
Rachel Hong
Michael Feffer
Robin Netzorg
Harry H. Jiang
Ezra Awumey
Sauvik Das
41
1
0
17 Oct 2024
Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via
  Lightweight Value Optimization
Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization
Xingqi Wang
Xiaoyuan Yi
Xing Xie
Jia Jia
19
1
0
16 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow
  Matching
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
25
52
0
09 Oct 2024
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset
Anton Firc
K. Malinka
P. Hanáček
DiffM
31
0
0
09 Oct 2024
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech
  Synthesis with Discrete Codec Modeling of EnGen-TTS
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS
Onkar Kishor Susladkar
Vishesh Tripathi
Biddwan Ahmed
21
0
0
09 Oct 2024
SegINR: Segment-wise Implicit Neural Representation for Sequence
  Alignment in Neural Text-to-Speech
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
Minchan Kim
Myeonghun Jeong
Joun Yeop Lee
Nam Soo Kim
18
0
0
07 Oct 2024
SSR: Alignment-Aware Modality Connector for Speech Language Models
SSR: Alignment-Aware Modality Connector for Speech Language Models
Weiting Tan
Hirofumi Inaguma
Ning Dong
Paden Tomasello
Xutai Ma
32
3
0
30 Sep 2024
Facial Expression-Enhanced TTS: Combining Face Representation and
  Emotion Intensity for Adaptive Speech
Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech
Yunji Chu
Yunseob Shim
Unsang Park
26
0
0
24 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
Canh Vu
46
3
0
23 Sep 2024
ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning
ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning
Daewoong Kim
Hao-Wen Dong
Dasaem Jeong
18
0
0
19 Sep 2024
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition
  on Low-Resource Accented Speech Corpora
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora
F. Nespoli
Daniel Barreda
Patrick A. Naylor
28
1
0
17 Sep 2024
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Jiawei Du
I-Ming Lin
I-Hsiang Chiu
Xuanjun Chen
Haibin Wu
Wenze Ren
Yu Tsao
Hung-yi Lee
Jyh-Shing Roger Jang
DiffM
35
2
0
13 Sep 2024
Super Monotonic Alignment Search
Super Monotonic Alignment Search
Junhyeok Lee
Hyeongju Kim
29
0
0
12 Sep 2024
ManaTTS Persian: a recipe for creating TTS datasets for lower resource
  languages
ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages
Mahta Fetrat Qharabagh
Zahra Dehghanian
Hamid R. Rabiee
16
1
0
11 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec
  Transformer
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Shunsi Zhang
Shunsi Zhang
Zhizheng Wu
34
38
0
01 Sep 2024
SSDM: Scalable Speech Dysfluency Modeling
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Mille
M. G. Tempini
Gopala Anumanchipalli
AuLLM
32
1
0
29 Aug 2024
Easy, Interpretable, Effective: openSMILE for voice deepfake detection
Easy, Interpretable, Effective: openSMILE for voice deepfake detection
Octavian Pascu
Dan Oneaţă
H. Cucu
Nicolas M. Muller
40
1
0
28 Aug 2024
Disentangling segmental and prosodic factors to non-native speech
  comprehensibility
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
29
1
0
20 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks
  at Scale
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
45
38
0
16 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform
  Generation
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OOD
DiffM
AI4TS
43
5
0
14 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for
  Speech Processing
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
Longbiao Wang
Jianwu Dang
Jianhua Tao
AI4TS
36
0
0
11 Aug 2024
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Jiawei Huang
Chen Zhang
Yi Ren
Ziyue Jiang
Zhenhui Ye
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
35
2
0
08 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End
  Transformer Training
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
33
0
0
06 Aug 2024
On the Problem of Text-To-Speech Model Selection for Synthetic Data
  Generation in Automatic Speech Recognition
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
Nick Rossenbach
Ralf Schluter
S. Sakti
22
2
0
31 Jul 2024
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery
  Detection and Localization
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization
Junyan Wu
Wei Lu
Xiangyang Luo
Rui Yang
Qian Wang
Xiaochun Cao
34
3
0
23 Jul 2024
Read, Watch and Scream! Sound Generation from Text and Video
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
29
11
0
08 Jul 2024
VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic
  Features
VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features
Tomoki Koriyama
29
0
0
03 Jul 2024
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech
  Synthesis
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
Yinlin Guo
Yening Lv
Jinqiao Dou
Yan Zhang
Yuehai Wang
18
0
0
30 Jun 2024
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling
  on Time Variability
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
Hyun Joon Park
Jin Sob Kim
Wooseok Shin
Sung Won Han
DiffM
33
2
0
27 Jun 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
35
46
0
26 Jun 2024
123456
Next