Matcha-TTS: A fast TTS architecture with conditional flow matching

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
6 September 2023
Shivam Mehta
Ruibo Tu
Jonas Beskow
Éva Székely
G. Henter

Papers citing "Matcha-TTS: A fast TTS architecture with conditional flow matching"

50 / 67 papers shown
FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation
Kaixing Yang
Xulong Tang
Ziqiao Peng
X. Zhang
Puwei Wang
Jun He
Hongyan Liu
88
0
0
26 Nov 2025
Multi-Reward GRPO for Stable and Prosodic Single-Codebook TTS LLMs at Scale
Yicheng Zhong
Peiji Yang
Zhisheng Wang
97
0
0
26 Nov 2025
oboro: Text-to-Image Synthesis on Limited Data using Flow-based Diffusion Transformer with MMH Attention
Ryusuke Mizutani
Kazuaki Matano
Tsugumi Kadowaki
Haruki Tenya
Layris
nuigurumi
Koki Hashimoto
Yu Tanaka
120
0
0
11 Nov 2025
SyMuPe: Affective and Controllable Symbolic Music Performance
Ilya Borovik
Dmitrii Gavrilev
Vladimir Viro
64
0
0
05 Nov 2025
Step-Audio-EditX Technical Report
Chao Yan
Boyong Wu
Peng Yang
Pengfei Tan
Guoqiang Hu
...
Xiangyu Zhang
Daxin Jiang
Shuchang Zhou
Gang Yu
84
1
0
05 Nov 2025
DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching
Hanke Xie
Dake Guo
C. Wang
Yue Li
WenJie Tian
...
Xinsheng Wang
Xiulin Li
Guanqiong Miao
B. Liu
Lei Xie
AuLLM
297
0
0
09 Oct 2025
UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
Wenhao Guan
Zhikang Niu
Ziyue Jiang
Kaidi Wang
Peijie Chen
Q. Hong
Lin Li
Xie Chen
AuLLM
193
0
0
06 Oct 2025
Beyond Static Knowledge Messengers: Towards Adaptive, Fair, and Scalable Federated Learning for Medical AI
Jahidul Arafat
Fariha Tasmin
Sanjaya Poudel
Ahsan Habib Tareq
FedML
157
0
0
05 Oct 2025
Flamed-TTS: Flow Matching Attention-Free Models for Efficient Generating and Dynamic Pacing Zero-shot Text-to-Speech
Hieu-Nghia Huynh-Nguyen
Huynh Nguyen Dang
Ngoc Son Nguyen
Van Nguyen
68
0
0
03 Oct 2025
High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling
Chao Huang
Susan Liang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
103
0
0
26 Sep 2025
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
Tianqiao Liu
Xueyi Li
Hao Wang
Haoxuan Li
Zhichao Chen
Weiqi Luo
Zitao Liu
AuLLM
118
0
0
24 Sep 2025
TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
Yutong Liu
Ziyue Zhang
Ban Ma-bao
Renzeng Duojie
Yuqing Cai
Yongbin Yu
Xiangxiang Wang
Fan Gao
Cheng Huang
Nyima Tashi
88
1
0
22 Sep 2025
Discrete-Time Diffusion-Like Models for Speech Synthesis
Xiaozhou Tan
Minghui Zhao
Mattias Cross
DiffM
88
0
0
22 Sep 2025
VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency
Nikita Torgashov
Gustav Eje Henter
Gabriel Skantze
VLM
104
0
0
19 Sep 2025
The Singing Voice Conversion Challenge 2025: From Singer Identity Conversion To Singing Style Conversion
Lester Phillip Violeta
Xueyao Zhang
Jiatong Shi
Yusuke Yasuda
Wen-Chin Huang
Zhizheng Wu
Tomoki Toda
80
2
0
19 Sep 2025
Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation
Miseul Kim
Soo Jin Park
Kyungguen Byun
Hyeon-Kyeong Shin
Sunkuk Moon
Shuhua Zhang
Erik Visser
60
0
0
18 Sep 2025
DiTReducio: A Training-Free Acceleration for DiT-Based TTS via Progressive Calibration
Yanru Huo
Ziyue Jiang
Zuoli Tang
Q. Hong
Zhou Zhao
88
1
0
11 Sep 2025
Accelerating Diffusion Transformer-Based Text-to-Speech with Transformer Layer Caching
Siratish Sakpiboonchit
64
0
0
10 Sep 2025
Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System
Hashim Ali
Surya Subramani
Lekha Bollinani
Nithin Sai Adupa
Sali El-Loh
Hafiz Malik
101
0
0
28 Aug 2025
Preference Trajectory Modeling via Flow Matching for Sequential Recommendation
Li Li
Mingyue Cheng
Yuyang Ye
Zhiding Liu
Tong Xu
DiffM
AI4TS
84
1
0
25 Aug 2025
MGSC: A Multi-granularity Consistency Framework for Robust End-to-end ASR
Xuwen Yang
80
0
0
20 Aug 2025
Flow-SLM: Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling
Ju-Chieh Chou
Jiawei Zhou
Karen Livescu
184
3
0
12 Aug 2025
MahaTTS: A Unified Framework for Multilingual Text-to-Speech Synthesis
Jaskaran Singh
Amartya Roy Chowdhury
Raghav Prabhakar
Varshul C. W
52
0
0
05 Aug 2025
C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations
Chengqian Ma
Wei Tao
Yiwen Guo
AuLLM
161
3
0
30 Jul 2025
Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data
Qibing Bai
Sho Inoue
Shuai Wang
Zhongjie Jiang
Yannan Wang
Haizhou Li
114
1
0
23 Jul 2025
Audio-3DVG: Unified Audio -- Point Cloud Fusion for 3D Visual Grounding
Duc Cao-Dinh
Khai Le-Duc
Anh Dao
Bach Phan Tat
Chris Ngo
Duy M. H. Nguyen
Nguyen X. Khanh
Thanh Nguyen-Tang
145
0
0
01 Jul 2025
You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties
Paige Tuttosi
H. H. Yeung
Yue Wang
J. Aucouturier
Angelica Lim
VLM
105
0
0
29 Jun 2025
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching
Hyun Joon Park
Jeongmin Liu
Jin Sob Kim
Jeong Yeol Yang
Sung Won Han
Eunwoo Song
141
1
0
20 Jun 2025
EmojiVoice: Towards long-term controllable expressivity in robot speech
Paige Tuttosi
Shivam Mehta
Zachary Syvenky
Bermet Burkanova
G. Henter
Angelica Lim
186
1
0
18 Jun 2025
TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
206
2
0
18 Jun 2025
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Han Zhu
Wei Kang
Zengwei Yao
Liyong Guo
Fangjun Kuang
Zhaoqing Li
Weiji Zhuang
Long Lin
Daniel Povey
251
8
0
16 Jun 2025
UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching
Neta Glazer
Aviv Navon
Yael Segal
Aviv Shamsian
Hilit Segev
Asaf Buchnick
Menachem Pirchi
Gil Hetz
Joseph Keshet
219
2
0
11 Jun 2025
A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data
Cheng-Kang Chou
Chan-Jan Hsu
Ho-Lam Chung
Liang-Hsuan Tseng
H. Cheng
Yu-Kuan Fu
Kuan Po Huang
Hung-yi Lee
357
1
0
10 Jun 2025
Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
Reo Yoneyama
Masaya Kawamura
Ryo Terashima
Ryuichi Yamamoto
Tomoki Toda
209
0
0
04 Jun 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jialong Zuo
Shengpeng Ji
Minghui Fang
Mingze Li
Ziyue Jiang
Xize Cheng
Xiaoda Yang
Chen Feiyang
Xinyu Duan
Zhou Zhao
176
0
0
01 Jun 2025
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
Leying Zhang
Y. Qian
Xiaofei Wang
Manthan Thakker
Dongmei Wang
...
Haibin Wu
Yuxuan Hu
Jinyu Li
Yanmin Qian
Sheng Zhao
182
2
0
01 Jun 2025
AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion
Junqi Zhao
Jinzheng Zhao
Haohe Liu
Yun Chen
Lu Han
Xubo Liu
Mark D. Plumbley
Wenwu Wang
DiffM
206
2
0
28 May 2025
BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models
Susan Liang
Dejan Marković
I. D. Gebru
Steven Krenn
Todd Keebler
Jacob Sandakly
Frank Yu
Samuel Hassel
Chenliang Xu
Alexander Richard
179
4
0
28 May 2025
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
Jeongsoo Choi
Jaehun Kim
Joon Son Chung
142
0
0
27 May 2025
CloneShield: A Framework for Universal Perturbation Against Zero-Shot Voice Cloning
Renyuan Li
Zhibo Liang
Haichuan Zhang
Tianyu Shi
Zhiyuan Cheng
Jia Shi
Carl Yang
Mingjie Tang
AAML
249
2
0
25 May 2025
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
Zhihao Du
Changfeng Gao
Yuxuan Wang
Fan Yu
Tianyu Zhao
...
Mengzhe Chen
Yafeng Chen
Shiliang Zhang
Wen Wang
Jieping Ye
AuLLM
274
44
0
23 May 2025
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
Yutong Liu
Ziyue Zhang
Ban Ma-bao
Yuqing Cai
Yongbin Yu
Renzeng Duojie
Xiangxiang Wang
Fan Gao
Cheng Huang
Nyima Tashi
217
3
0
20 May 2025
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hieu-Nghia Huynh-Nguyen
Ngoc Son Nguyen
Huynh Nguyen Dang
Thieu Vo
Truong-Son Hy
Van Nguyen
251
3
0
19 May 2025
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
Bowen Zhang
Congchao Guo
Geng Yang
Hang Yu
Haozhe Zhang
...
Yichen Xiao
Yiying Zhou
Yujiao Shi
Yuan Lu
Yucen He
215
19
0
12 May 2025
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
Gaoxiang Cong
Liang-Sheng Li
Jiadong Pan
Zhedong Zhang
Amin Beheshti
Anton Van Den Hengel
Yuankai Qi
Qingming Huang
894
1
0
02 May 2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
361
1
0
29 Apr 2025
SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Kaidi Wang
Wenhao Guan
Shenghui Lu
Jianglong Yao
Lin Li
Q. Hong
371
3
0
10 Apr 2025
Serenade: A Singing Style Conversion Framework Based On Audio Infilling
Lester Phillip Violeta
Wen-Chin Huang
Tomoki Toda
168
1
0
16 Mar 2025
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho
J. Choi
Sungnyun Kim
Se-Young Yun
269
0
0
14 Mar 2025
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
AAAI Conference on Artificial Intelligence (AAAI), 2025
Wenxiang Guo
Yu Zhang
Changhao Pan
Rongjie Huang
Li Tang
Ruiqi Li
Zhiqing Hong
Yongqi Wang
Zhou Zhao
726
14
0
18 Feb 2025