Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

22 May 2020

Papers citing "Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search"

50 / 316 papers shown

FabasedVC: Enhancing Voice Conversion with Text Modality Fusion and Phoneme-Level SSL Features

13 Nov 2025

ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive Text-to-Speech Generation

158

21 Oct 2025

Randomness from causally independent processes

219

06 Oct 2025

HuLA: Prosody-Aware Anti-Spoofing with Multi-Task Learning for Expressive and Emotional Synthetic Speech

Aurosweta Mahapatra

Ismail Rasim Ulgen

Berrak Sisman

315

25 Sep 2025

Eliminating stability hallucinations in LLM-based TTS models via attention guidance

234

24 Sep 2025

SEA-Spoof: Bridging The Gap in Multilingual Audio Deepfake Detection for South-East Asian

Sailor Hardik Bhupendra

Soumik Mondal

202

24 Sep 2025

Discrete-Time Diffusion-Like Models for Speech Synthesis

267

22 Sep 2025

Real-Time Streaming Mel Vocoding with Generative Flow Matching

Simon Welker

Tal Peer

Timo Gerkmann

135

18 Sep 2025

Length-Aware Rotary Position Embedding for Text-Speech Alignment

132

14 Sep 2025

Whisper Has an Internal Word Aligner

Sung-Lin Yeh

Yen Meng

Hao Tang

182

12 Sep 2025

MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection

Zihan Pan

Sailor Hardik Bhupendra

Jinyang Wu

MoE

252

11 Sep 2025

Spectral Masking and Interpolation Attack (SMIA): A Black-box Adversarial Attack against Voice Authentication and Anti-Spoofing Systems

275

09 Sep 2025

AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds

213

04 Sep 2025

FreeTalk: A plug-and-play and black-box defense against speech synthesis attacks

143

30 Aug 2025

Analysis of Domain Shift across ASR Architectures via TTS-Enabled Separation of Target Domain and Acoustic Conditions

Tina Raissi

Nick Rossenbach

Ralf Schluter

158

13 Aug 2025

SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

250

29 Jul 2025

Unifying Listener Scoring Scales: Comparison Learning Framework for Speech Quality Assessment and Continuous Speech Emotion Recognition

316

18 Jul 2025

Enkidu: Universal Frequential Perturbation for Real-Time Audio Privacy Protection against Voice Deepfakes

324

17 Jul 2025

You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties

189

29 Jun 2025

IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech

283

23 Jun 2025

RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching

246

20 Jun 2025

TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025

456

18 Jun 2025

A Variational Framework for Improving Naturalness in Generative Spoken Language Models

Li-Wei Chen

Takuya Higuchi

Zakaria Aldeneh

Ahmed Hussen Abdelaziz

Alexander I. Rudnicky

263

17 Jun 2025

ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

481

16 Jun 2025

Audio Generation Through Score-Based Generative Modeling: Design Principles and Implementation

328

10 Jun 2025

Kinship in Speech: Leveraging Linguistic Relatedness for Zero-Shot TTS in Indian Languages

Utkarsh Pathak

Chandra Sai Krishna Gunda

Anusha Prakash

Keshav Agarwal

Hema A. Murthy

268

04 Jun 2025

Synthetic Speech Source Tracing using Metric Learning

Dimitrios Koutsianos

Stavros Zacharopoulos

Yannis Panagakis

Themos Stafylakis

183

03 Jun 2025

XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark

Ioan-Paul Ciobanu

Andrei Iulian Hiji

Nicolae-Cătălin Ristea

Paul Irofti

Cristian Rusu

Radu Tudor Ionescu

238

31 May 2025

ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs

455

26 May 2025

STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution

Anton Firc

Manasi Chibber

Jagabandhu Mishra

Vishwanath Pratap Singh

Tomi Kinnunen

K. Malinka

587

26 May 2025

OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Hieu-Nghia Huynh-Nguyen

409

19 May 2025

Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications

Speech Synthesis Workshop (SSW), 2023

433

12 May 2025

Language translation, and change of accent for speech-to-speech task using diffusion model

250

04 May 2025

FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing

1.0K

02 May 2025

Voice Cloning: Comprehensive Survey

Hussam Azzuni

Abdulmotaleb El Saddik

VLM

450

01 May 2025

Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation

279

11 Apr 2025

P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation

439

07 Apr 2025

SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System

545

29 Mar 2025

Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing

Computer Vision and Pattern Recognition (CVPR), 2025

392

15 Mar 2025

An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR

Sewade Ogun

Vincent Colotte

Emmanuel Vincent

384

11 Mar 2025

Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition

Computer Vision and Pattern Recognition (CVPR), 2025

305

10 Mar 2025

MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis

...

669

26 Feb 2025

Everyday Speech in the Indian Subcontinent

Utkarsh Pathak

302

24 Feb 2025

VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis

183

26 Dec 2024

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing

Computer Vision and Pattern Recognition (CVPR), 2024

688

12 Dec 2024

QR-VC: Leveraging Quantization Residuals for Linear Disentanglement in Zero-Shot Voice Conversion

Youngjun Sim

Jinsung Yoon

Young-Joo Suh

373

25 Nov 2024

EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector

IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024

520

04 Nov 2024

Mitigating Unauthorized Speech Synthesis for Voice Protection

204

28 Oct 2024

Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis

International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2024

Suparna De

Ionut Bostan

Nishanth Sastry

301

24 Oct 2024

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

299

22 Oct 2024