Matcha-TTS: A fast TTS architecture with conditional flow matching

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

6 September 2023

Papers citing "Matcha-TTS: A fast TTS architecture with conditional flow matching"

50 / 97 papers shown

M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis

...

150

04 Dec 2025

Multi-Reward GRPO for Stable and Prosodic Single-Codebook TTS LLMs at Scale

Yicheng Zhong

Peiji Yang

Zhisheng Wang

124

26 Nov 2025

FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation

189

26 Nov 2025

oboro: Text-to-Image Synthesis on Limited Data using Flow-based Diffusion Transformer with MMH Attention

165

11 Nov 2025

SyMuPe: Affective and Controllable Symbolic Music Performance

Ilya Borovik

Dmitrii Gavrilev

Vladimir Viro

104

05 Nov 2025

Step-Audio-EditX Technical Report

...

140

05 Nov 2025

Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs

Venkatesh Ravichandran

160

14 Oct 2025

DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching

...

415

09 Oct 2025

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

321

06 Oct 2025

Beyond Static Knowledge Messengers: Towards Adaptive, Fair, and Scalable Federated Learning for Medical AI

218

05 Oct 2025

Flamed-TTS: Flow Matching Attention-Free Models for Efficient Generating and Dynamic Pacing Zero-shot Text-to-Speech

Hieu-Nghia Huynh-Nguyen

Huynh Nguyen Dang

Ngoc Son Nguyen

Van Nguyen

109

03 Oct 2025

High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling

147

26 Sep 2025

DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation

114

25 Sep 2025

From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training

140

24 Sep 2025

TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation

158

22 Sep 2025

Discrete-Time Diffusion-Like Models for Speech Synthesis

162

22 Sep 2025

VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency

140

19 Sep 2025

The Singing Voice Conversion Challenge 2025: From Singer Identity Conversion To Singing Style Conversion

Lester Phillip Violeta

131

19 Sep 2025

Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation

18 Sep 2025

DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Factorized Discrete Flow Matching

Ngoc Son Nguyen

Hieu-Nghia Huynh-Nguyen

Thanh V. T. Tran

Truong-Son Hy

Van Nguyen

160

11 Sep 2025

DiTReducio: A Training-Free Acceleration for DiT-Based TTS via Progressive Calibration

128

11 Sep 2025

MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection

Zihan Pan

Sailor Hardik Bhupendra

Jinyang Wu

168

11 Sep 2025

Accelerating Diffusion Transformer-Based Text-to-Speech with Transformer Layer Caching

Siratish Sakpiboonchit

116

10 Sep 2025

Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System

151

28 Aug 2025

Preference Trajectory Modeling via Flow Matching for Sequential Recommendation

143

25 Aug 2025

MGSC: A Multi-granularity Consistency Framework for Robust End-to-end ASR

Xuwen Yang

112

20 Aug 2025

Flow-SLM: Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling

Ju-Chieh Chou

Jiawei Zhou

Karen Livescu

231

12 Aug 2025

MahaTTS: A Unified Framework for Multilingual Text-to-Speech Synthesis

Jaskaran Singh

Amartya Roy Chowdhury

Raghav Prabhakar

Varshul C. W

05 Aug 2025

C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations

229

30 Jul 2025

Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data

138

23 Jul 2025

Audio-3DVG: Unified Audio -- Point Cloud Fusion for 3D Visual Grounding

225

01 Jul 2025

You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties

131

29 Jun 2025

RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching

185

20 Jun 2025

EmojiVoice: Towards long-term controllable expressivity in robot speech

235

18 Jun 2025

TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025

308

18 Jun 2025

ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

353

16 Jun 2025

UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching

267

11 Jun 2025

A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data

394

10 Jun 2025

Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments

261

04 Jun 2025

Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

221

01 Jun 2025

CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching

...

255

01 Jun 2025

AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion

235

28 May 2025

BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models

226

28 May 2025

Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing

Jeongsoo Choi

Jaehun Kim

Joon Son Chung

219

27 May 2025

CloneShield: A Framework for Universal Perturbation Against Zero-Shot Voice Cloning

316

25 May 2025

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

...

335

23 May 2025

Naturalness-Aware Curriculum Learning with Dynamic Temperature for Speech Deepfake Detection

208

20 May 2025

FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation

280

20 May 2025

OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Hieu-Nghia Huynh-Nguyen

315

19 May 2025

MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

...

281

12 May 2025