2005.11129
Cited By
v1
v2 (latest)
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
22 May 2020
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
Papers citing
"Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search"
50 / 316 papers shown
FabasedVC: Enhancing Voice Conversion with Text Modality Fusion and Phoneme-Level SSL Features
Wenyu Wang
Zhetao Hu
Yiquan Zhou
Jiacheng Xu
Z. F. Wu
Chen Li
Shihao Li
97
0
0
13 Nov 2025
ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive Text-to-Speech Generation
Haowei Lou
Hye-Young Paik
Wen Hu
Lina Yao
158
2
0
21 Oct 2025
Randomness from causally independent processes
Martin Sandfuchs
Carla Ferradini
R. Renner
CML
219
0
0
06 Oct 2025
HuLA: Prosody-Aware Anti-Spoofing with Multi-Task Learning for Expressive and Emotional Synthetic Speech
Aurosweta Mahapatra
Ismail Rasim Ulgen
Berrak Sisman
315
0
0
25 Sep 2025
Eliminating stability hallucinations in llm-based tts models via attention guidance
ShiMing Wang
Zhihao Du
Yang Xiang
Tianyu Zhao
Han Zhao
Xinyuan Wei
Xiangang Li
HanJie Guo
Zhenhua Ling
234
0
0
24 Sep 2025
SEA-Spoof: Bridging The Gap in Multilingual Audio Deepfake Detection for South-East Asian
Jinyang Wu
Nana Hou
Zihan Pan
Qiquan Zhang
Sailor Hardik Bhupendra
Soumik Mondal
202
1
0
24 Sep 2025
Discrete-Time Diffusion-Like Models for Speech Synthesis
Xiaozhou Tan
Minghui Zhao
Mattias Cross
DiffM
267
0
0
22 Sep 2025
Real-Time Streaming Mel Vocoding with Generative Flow Matching
Simon Welker
Tal Peer
Timo Gerkmann
135
1
0
18 Sep 2025
Length-Aware Rotary Position Embedding for Text-Speech Alignment
Hyeongju Kim
Juheon Lee
Jinhyeok Yang
Jacob Morton
AuLLM
132
1
0
14 Sep 2025
Whisper Has an Internal Word Aligner
Sung-Lin Yeh
Yen Meng
Hao Tang
182
1
0
12 Sep 2025
MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection
Zihan Pan
Sailor Hardik Bhupendra
Jinyang Wu
MoE
252
4
0
11 Sep 2025
Spectral Masking and Interpolation Attack (SMIA): A Black-box Adversarial Attack against Voice Authentication and Anti-Spoofing Systems
Kamel Kamel
Hridoy Sankar Dutta
Keshav Sood
Sunil Aryal
AAML
275
1
0
09 Sep 2025
AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds
Qizhou Wang
Hanxun Huang
Guansong Pang
S. Erfani
Christopher Leckie
213
0
0
04 Sep 2025
FreeTalk: A plug-and-play and black-box defense against speech synthesis attacks
Yuwen Pu
Zhou Feng
Chunyi Zhou
Jiahao Chen
Chunqiang Hu
Haibo Hu
S. Ji
AAML
143
0
0
30 Aug 2025
Analysis of Domain Shift across ASR Architectures via TTS-Enabled Separation of Target Domain and Acoustic Conditions
Tina Raissi
Nick Rossenbach
Ralf Schlüter
158
1
0
13 Aug 2025
SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wen Huang
Yanmei Gu
Zhiming Wang
Huijia Zhu
Yanmin Qian
250
10
0
29 Jul 2025
Unifying Listener Scoring Scales: Comparison Learning Framework for Speech Quality Assessment and Continuous Speech Emotion Recognition
Cheng-Hung Hu
Yusuke Yasuda
Akifumi Yoshimoto
Tomoki Toda
316
0
0
18 Jul 2025
Enkidu: Universal Frequential Perturbation for Real-Time Audio Privacy Protection against Voice Deepfakes
Zhou Feng
Jiahao Chen
Chunyi Zhou
Yuwen Pu
Qingming Li
Xuhong Zhang
S. Ji
AAML
324
6
0
17 Jul 2025
You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties
Paige Tuttosi
H. H. Yeung
Yue Wang
J. Aucouturier
Angelica Lim
VLM
189
0
0
29 Jun 2025
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
Siyi Zhou
Yiquan Zhou
Yi He
Xun Zhou
Jinchao Wang
Wei Deng
Jingchen Shu
DiffM
283
56
0
23 Jun 2025
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching
Hyun Joon Park
Jeongmin Liu
Jin Sob Kim
Jeong Yeol Yang
Sung Won Han
Eunwoo Song
246
1
0
20 Jun 2025
TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
456
3
0
18 Jun 2025
A Variational Framework for Improving Naturalness in Generative Spoken Language Models
Li-Wei Chen
Takuya Higuchi
Zakaria Aldeneh
Ahmed Hussen Abdelaziz
Alexander I. Rudnicky
263
2
0
17 Jun 2025
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Han Zhu
Wei Kang
Zengwei Yao
Liyong Guo
Fangjun Kuang
Zhaoqing Li
Weiji Zhuang
Long Lin
Daniel Povey
481
21
0
16 Jun 2025
Audio Generation Through Score-Based Generative Modeling: Design Principles and Implementation
Ge Zhu
Yutong Wen
Zhiyao Duan
DiffM
MedIm
328
3
0
10 Jun 2025
Kinship in Speech: Leveraging Linguistic Relatedness for Zero-Shot TTS in Indian Languages
Utkarsh Pathak
Chandra Sai Krishna Gunda
Anusha Prakash
Keshav Agarwal
Hema A. Murthy
268
0
0
04 Jun 2025
Synthetic Speech Source Tracing using Metric Learning
Dimitrios Koutsianos
Stavros Zacharopoulos
Yannis Panagakis
Themos Stafylakis
183
5
0
03 Jun 2025
XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark
Ioan-Paul Ciobanu
Andrei Iulian Hiji
Nicolae-Cătălin Ristea
Paul Irofti
Cristian Rusu
Radu Tudor Ionescu
238
2
0
31 May 2025
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs
Pooneh Mousavi
Yingzhi Wang
Mirco Ravanelli
Cem Subakan
AuLLM
455
3
0
26 May 2025
STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution
Anton Firc
Manasi Chibber
Jagabandhu Mishra
Vishwanath Pratap Singh
Tomi Kinnunen
K. Malinka
587
0
0
26 May 2025
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hieu-Nghia Huynh-Nguyen
Ngoc Son Nguyen
Huynh Nguyen Dang
Thieu Vo
Truong-Son Hy
Van Nguyen
409
7
0
19 May 2025
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Speech Synthesis Workshop (SSW), 2023
Biel Tura Vecino
Adam Gabryś
Daniel Mątwicki
Andrzej Pomirski
Tom Iddon
Marius Cotescu
Jaime Lorenzo-Trueba
433
7
0
12 May 2025
Language translation, and change of accent for speech-to-speech task using diffusion model
Abhishek Mishra
Ritesh Sur Chowdhury
Vartul Bahuguna
Isha Pandey
Ganesh Ramakrishnan
DiffM
250
0
0
04 May 2025
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
Gaoxiang Cong
Liang-Sheng Li
Jiadong Pan
Zhedong Zhang
Amin Beheshti
Anton Van Den Hengel
Yuankai Qi
Qingming Huang
1.0K
7
0
02 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
450
6
0
01 May 2025
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
Haowei Lou
Hye-Young Paik
Sheng Li
Wen Hu
Lina Yao
279
3
0
11 Apr 2025
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
Yong Ren
Jiangyan Yi
Tao Wang
Jianhua Tao
Zhengqi Wen
Chenxing Li
Zheng Lian
Ruibo Fu
Ye Bai
Xiaohui Zhang
439
0
0
07 Apr 2025
SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System
Hyeongju Kim
Jinhyeok Yang
Yechan Yu
Seunghun Ji
Jacob Morton
Frederik Bous
Joon Byun
Juheon Lee
545
1
0
29 Mar 2025
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Computer Vision and Pattern Recognition (CVPR), 2025
Zhedong Zhang
Liang-Sheng Li
C. Yan
Chunshan Liu
Anton Van Den Hengel
Yuankai Qi
392
8
0
15 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
384
2
0
11 Mar 2025
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition
Computer Vision and Pattern Recognition (CVPR), 2025
Juncheng Wang
Chao Xu
Cheng Yu
Lei Shang
Zhe Hu
Shujun Wang
Liefeng Bo
DiffM
VGen
305
6
0
10 Mar 2025
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
Ziyue Jiang
Yi Ren
Ruiqi Li
Shengpeng Ji
Zhenhui Ye
...
Yanzhe Zhang
Rui Liu
Xiang Yin
Zhou Zhao
669
0
0
26 Feb 2025
Everyday Speech in the Indian Subcontinent
Utkarsh Pathak
302
1
0
24 Feb 2025
VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis
Jaemin Jung
Junseok Ahn
Chaeyoung Jung
Tan Dat Nguyen
Youngjoon Jang
Joon Son Chung
DiffM
183
10
0
26 Dec 2024
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Computer Vision and Pattern Recognition (CVPR), 2024
Gaoxiang Cong
Jiadong Pan
Liang-Sheng Li
Yuankai Qi
Yuxin Peng
Anton Van Den Hengel
Jian Yang
Qingming Huang
688
23
0
12 Dec 2024
QR-VC: Leveraging Quantization Residuals for Linear Disentanglement in Zero-Shot Voice Conversion
Youngjun Sim
Jinsung Yoon
Young-Joo Suh
373
3
0
25 Nov 2024
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
520
28
0
04 Nov 2024
Mitigating Unauthorized Speech Synthesis for Voice Protection
Zhisheng Zhang
Qianyi Yang
Derui Wang
Pengyang Huang
Yuxin Cao
Kai Ye
Jie Hao
AAML
204
14
0
28 Oct 2024
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis
International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2024
Suparna De
Ionut Bostan
Nishanth Sastry
301
1
0
24 Oct 2024
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Guanrou Yang
Fan Yu
Tianhao Shen
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
299
14
0
22 Oct 2024