Matcha-TTS: A fast TTS architecture with conditional flow matching

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
6 September 2023
Shivam Mehta
Ruibo Tu
Jonas Beskow
Éva Székely
G. Henter

Papers citing "Matcha-TTS: A fast TTS architecture with conditional flow matching"

50 / 97 papers shown
M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis
Xiaopeng Wang
Chunyu Qiang
Ruibo Fu
Zhengqi Wen
Xuefei Liu
...
Yuankun Xie
Heng Xie
Chenxing Li
Chen Zhang
Changsheng Li
DiffM
150
2
0
04 Dec 2025
Multi-Reward GRPO for Stable and Prosodic Single-Codebook TTS LLMs at Scale
Yicheng Zhong
Peiji Yang
Zhisheng Wang
124
0
0
26 Nov 2025
FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation
Kaixing Yang
Xulong Tang
Ziqiao Peng
X. Zhang
Puwei Wang
Jun He
Hongyan Liu
189
1
0
26 Nov 2025
oboro: Text-to-Image Synthesis on Limited Data using Flow-based Diffusion Transformer with MMH Attention
Ryusuke Mizutani
Kazuaki Matano
Tsugumi Kadowaki
Haruki Tenya
Layris
nuigurumi
Koki Hashimoto
Yu Tanaka
165
0
0
11 Nov 2025
SyMuPe: Affective and Controllable Symbolic Music Performance
Ilya Borovik
Dmitrii Gavrilev
Vladimir Viro
104
0
0
05 Nov 2025
Step-Audio-EditX Technical Report
Chao Yan
Boyong Wu
Peng Yang
Pengfei Tan
Guoqiang Hu
...
Xiangyu Zhang
Daxin Jiang
Shuchang Zhou
Gang Yu
140
2
0
05 Nov 2025
Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs
Xinlu He
Swayambhu Nath Ray
Harish Mallidi
Jia-Hong Huang
Ashwin Bellur
Chander Chandak
M. Maruf
Venkatesh Ravichandran
160
0
0
14 Oct 2025
DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching
Hanke Xie
Dake Guo
C. Wang
Yue Li
WenJie Tian
...
Xinsheng Wang
Xiulin Li
Guanqiong Miao
B. Liu
Lei Xie
AuLLM
415
0
0
09 Oct 2025
UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
Wenhao Guan
Zhikang Niu
Ziyue Jiang
Kaidi Wang
Peijie Chen
Q. Hong
Lin Li
Xie Chen
AuLLM
321
0
0
06 Oct 2025
Beyond Static Knowledge Messengers: Towards Adaptive, Fair, and Scalable Federated Learning for Medical AI
Jahidul Arafat
Fariha Tasmin
Sanjaya Poudel
Ahsan Habib Tareq
FedML
218
0
0
05 Oct 2025
Flamed-TTS: Flow Matching Attention-Free Models for Efficient Generating and Dynamic Pacing Zero-shot Text-to-Speech
Hieu-Nghia Huynh-Nguyen
Huynh Nguyen Dang
Ngoc Son Nguyen
Van Nguyen
109
0
0
03 Oct 2025
High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling
Chao Huang
Susan Liang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
147
0
0
26 Sep 2025
DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
Ziqi Chen
Gongyu Chen
Yihua Wang
Chaofan Ding
Zihao Chen
Wei-Qiang Zhang
114
0
0
25 Sep 2025
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
Tianqiao Liu
Xueyi Li
Hao Wang
Haoxuan Li
Zhichao Chen
Weiqi Luo
Zitao Liu
AuLLM
140
0
0
24 Sep 2025
TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
Yutong Liu
Ziyue Zhang
Ban Ma-bao
Renzeng Duojie
Yuqing Cai
Yongbin Yu
Xiangxiang Wang
Fan Gao
Cheng Huang
Nyima Tashi
158
1
0
22 Sep 2025
Discrete-Time Diffusion-Like Models for Speech Synthesis
Xiaozhou Tan
Minghui Zhao
Mattias Cross
DiffM
162
0
0
22 Sep 2025
VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency
Nikita Torgashov
Gustav Eje Henter
Gabriel Skantze
VLM
140
0
0
19 Sep 2025
The Singing Voice Conversion Challenge 2025: From Singer Identity Conversion To Singing Style Conversion
Lester Phillip Violeta
Xueyao Zhang
Jiatong Shi
Yusuke Yasuda
Wen-Chin Huang
Zhizheng Wu
Tomoki Toda
131
2
0
19 Sep 2025
Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation
Miseul Kim
Soo Jin Park
Kyungguen Byun
Hyeon-Kyeong Shin
Sunkuk Moon
Shuhua Zhang
Erik Visser
64
0
0
18 Sep 2025
DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Factorized Discrete Flow Matching
Ngoc Son Nguyen
Hieu-Nghia Huynh-Nguyen
Thanh V. T. Tran
Truong-Son Hy
Van Nguyen
160
0
0
11 Sep 2025
DiTReducio: A Training-Free Acceleration for DiT-Based TTS via Progressive Calibration
Yanru Huo
Ziyue Jiang
Zuoli Tang
Q. Hong
Zhou Zhao
128
1
0
11 Sep 2025
MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection
Zihan Pan
Sailor Hardik Bhupendra
Jinyang Wu
MoE
168
2
0
11 Sep 2025
Accelerating Diffusion Transformer-Based Text-to-Speech with Transformer Layer Caching
Siratish Sakpiboonchit
116
0
0
10 Sep 2025
Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System
Hashim Ali
Surya Subramani
Lekha Bollinani
Nithin Sai Adupa
Sali El-Loh
Hafiz Malik
151
0
0
28 Aug 2025
Preference Trajectory Modeling via Flow Matching for Sequential Recommendation
Li Li
Mingyue Cheng
Yuyang Ye
Zhiding Liu
Tong Xu
DiffM
AI4TS
143
1
0
25 Aug 2025
MGSC: A Multi-granularity Consistency Framework for Robust End-to-end Asr
Xuwen Yang
112
0
0
20 Aug 2025
Flow-SLM: Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling
Ju-Chieh Chou
Jiawei Zhou
Karen Livescu
231
4
0
12 Aug 2025
MahaTTS: A Unified Framework for Multilingual Text-to-Speech Synthesis
Jaskaran Singh
Amartya Roy Chowdhury
Raghav Prabhakar
Varshul C. W
94
0
0
05 Aug 2025
C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations
Chengqian Ma
Wei Tao
Yiwen Guo
AuLLM
229
4
0
30 Jul 2025
Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data
Qibing Bai
Sho Inoue
Shuai Wang
Zhongjie Jiang
Yannan Wang
Haizhou Li
138
1
0
23 Jul 2025
Audio-3DVG: Unified Audio -- Point Cloud Fusion for 3D Visual Grounding
Duc Cao-Dinh
Khai Le-Duc
Anh Dao
Bach Phan Tat
Chris Ngo
Duy M. H. Nguyen
Nguyen X. Khanh
Thanh Nguyen-Tang
225
0
0
01 Jul 2025
You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties
Paige Tuttosi
H. H. Yeung
Yue Wang
J. Aucouturier
Angelica Lim
VLM
131
0
0
29 Jun 2025
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching
Hyun Joon Park
Jeongmin Liu
Jin Sob Kim
Jeong Yeol Yang
Sung Won Han
Eunwoo Song
185
1
0
20 Jun 2025
EmojiVoice: Towards long-term controllable expressivity in robot speech
Paige Tuttosi
Shivam Mehta
Zachary Syvenky
Bermet Burkanova
G. Henter
Angelica Lim
235
1
0
18 Jun 2025
TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
308
3
0
18 Jun 2025
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Han Zhu
Wei Kang
Zengwei Yao
Liyong Guo
Fangjun Kuang
Zhaoqing Li
Weiji Zhuang
Long Lin
Daniel Povey
353
13
0
16 Jun 2025
UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching
Neta Glazer
Aviv Navon
Yael Segal
Aviv Shamsian
Hilit Segev
Asaf Buchnick
Menachem Pirchi
Gil Hetz
Joseph Keshet
267
2
0
11 Jun 2025
A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data
Cheng-Kang Chou
Chan-Jan Hsu
Ho-Lam Chung
Liang-Hsuan Tseng
H. Cheng
Yu-Kuan Fu
Kuan Po Huang
Hung-yi Lee
394
1
0
10 Jun 2025
Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
Reo Yoneyama
Masaya Kawamura
Ryo Terashima
Ryuichi Yamamoto
Tomoki Toda
261
0
0
04 Jun 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jialong Zuo
Shengpeng Ji
Minghui Fang
Mingze Li
Ziyue Jiang
Xize Cheng
Xiaoda Yang
Chen Feiyang
Xinyu Duan
Zhou Zhao
221
0
0
01 Jun 2025
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
Leying Zhang
Y. Qian
Xiaofei Wang
Manthan Thakker
Dongmei Wang
...
Haibin Wu
Yuxuan Hu
Jinyu Li
Yanmin Qian
Sheng Zhao
255
5
0
01 Jun 2025
AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion
Junqi Zhao
Jinzheng Zhao
Haohe Liu
Yun Chen
Lu Han
Xubo Liu
Mark D. Plumbley
Wenwu Wang
DiffM
235
2
0
28 May 2025
BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models
Susan Liang
Dejan Marković
I. D. Gebru
Steven Krenn
Todd Keebler
Jacob Sandakly
Frank Yu
Samuel Hassel
Chenliang Xu
Alexander Richard
226
5
0
28 May 2025
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
Jeongsoo Choi
Jaehun Kim
Joon Son Chung
219
0
0
27 May 2025
CloneShield: A Framework for Universal Perturbation Against Zero-Shot Voice Cloning
Renyuan Li
Zhibo Liang
Haichuan Zhang
Tianyu Shi
Zhiyuan Cheng
Jia Shi
Carl Yang
Mingjie Tang
AAML
316
2
0
25 May 2025
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
Zhihao Du
Changfeng Gao
Yuxuan Wang
Fan Yu
Tianyu Zhao
...
Mengzhe Chen
Yafeng Chen
Shiliang Zhang
Wen Wang
Jieping Ye
AuLLM
335
59
0
23 May 2025
Naturalness-Aware Curriculum Learning with Dynamic Temperature for Speech Deepfake Detection
Taewoo Kim
Guisik Kim
Choongsang Cho
Young Han Lee
208
1
0
20 May 2025
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
Yutong Liu
Ziyue Zhang
Ban Ma-bao
Yuqing Cai
Yongbin Yu
Renzeng Duojie
Xiangxiang Wang
Fan Gao
Cheng Huang
Nyima Tashi
280
3
0
20 May 2025
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hieu-Nghia Huynh-Nguyen
Ngoc Son Nguyen
Huynh Nguyen Dang
Thieu Vo
Truong-Son Hy
Van Nguyen
315
4
0
19 May 2025
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
Bowen Zhang
Congchao Guo
Geng Yang
Hang Yu
Haozhe Zhang
...
Yichen Xiao
Yiying Zhou
Yujiao Shi
Yuan Lu
Yucen He
281
24
0
12 May 2025