Diff-TTS: A Denoising Diffusion Model for Text-to-Speech

Interspeech, 2021

3 April 2021

Hyeongju Kim

Papers citing "Diff-TTS: A Denoising Diffusion Model for Text-to-Speech"

Noise Aggregation Analysis Driven by Small-Noise Injection: Efficient Membership Inference for Diffusion Models

178

18 Oct 2025

An Octave-based Multi-Resolution CQT Architecture for Diffusion-based Audio Generation

Maurício do V. M. da Costa

Eloi Moliner

DiffM

229

20 Sep 2025

Length-Aware Rotary Position Embedding for Text-Speech Alignment

126

14 Sep 2025

DiTReducio: A Training-Free Acceleration for DiT-Based TTS via Progressive Calibration

179

11 Sep 2025

Navigating the Exploration-Exploitation Tradeoff in Inference-Time Scaling of Diffusion Models

239

17 Aug 2025

RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching

244

20 Jun 2025

Audio Generation Through Score-Based Generative Modeling: Design Principles and Implementation

326

10 Jun 2025

ZeroSep: Separate Anything in Audio with Zero Training

333

29 May 2025

CloneShield: A Framework for Universal Perturbation Against Zero-Shot Voice Cloning

423

25 May 2025

Constraint-Aware Diffusion Guidance for Robotics: Real-Time Obstacle Avoidance for Autonomous Racing

247

19 May 2025

VoiceCloak: A Multi-Dimensional Defense Framework against Unauthorized Diffusion-based Voice Cloning

378

18 May 2025

Language translation, and change of accent for speech-to-speech task using diffusion model

247

04 May 2025

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

...

423

14 Apr 2025

SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

497

10 Apr 2025

SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System

531

29 Mar 2025

Dual Audio-Centric Modality Coupling for Talking Head Generation

Ao Fu

Ziqi Ni

Yi Zhou

365

26 Mar 2025

MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation

406

14 Mar 2025

AudioX: A Unified Framework for Anything-to-Audio Generation

Yike Guo

575

13 Mar 2025

Survey on AI-Generated Media Detection: From Non-MLLM to MLLM

892

07 Feb 2025

UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models

Computer Vision and Pattern Recognition (CVPR), 2024

680

16 Dec 2024

DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles

International Conference on Computational Linguistics (COLING), 2024

295

04 Dec 2024

A roadmap for generative mapping: unlocking the power of generative AI for map-making

122

21 Oct 2024

Generative Co-Learners: Enhancing Cognitive and Social Presence of Students in Asynchronous Learning with Generative AI

Yan Chen

213

06 Oct 2024

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Tao Wang

...

Xiaopeng Wang

Yuankun Xie

Yukun Liu

Zhengqi Wen

Guanjun Li

DiffM

350

18 Sep 2024

Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

1.1K

14 Sep 2024

DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset

Spoken Language Technology Workshop (SLT), 2024

Jiawei Du

I-Ming Lin

I-Hsiang Chiu

Xuanjun Chen

Haibin Wu

Wenze Ren

Yu Tsao

Hung-yi Lee

Jyh-Shing Roger Jang

DiffM

305

13 Sep 2024

A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models

International Conference on Machine Learning (ICML), 2024

Moonseok Choi

Juho Lee

285

12 Aug 2024

Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training

Hawraz A. Ahmad

Tarik A. Rashid

293

06 Aug 2024

Piecewise deterministic generative models

Neural Information Processing Systems (NeurIPS), 2024

247

28 Jul 2024

SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow

Yuanzhi Zhu

Xingchao Liu

Qiang Liu

349

17 Jul 2024

LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis

Zhenxiong Tan

Xinyin Ma

Gongfan Fang

Xinchao Wang

312

15 Jul 2024

Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling

333

11 Jul 2024

DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

Sung Won Han

217

27 Jun 2024

Flow map matching with stochastic interpolants: A mathematical framework for consistency models

Nicholas M. Boffi

M. S. Albergo

Eric Vanden-Eijnden

277

11 Jun 2024

MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance

Interspeech, 2024

Nam Soo Kim

345

10 Jun 2024

Convergence of the denoising diffusion probabilistic models for general noise schedules

Yumiharu Nakano

DiffM

699

03 Jun 2024

A Survey of Deep Learning Audio Generation Methods

Matej Bozic

Marko Horvat

VLM MedIm

350

31 May 2024

Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech Conversion

318

11 May 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Zhen Ye

Xu Tan

...

Wei Xue

366

23 Apr 2024

An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization

Mengdi Wang

415

11 Apr 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Neural Information Processing Systems (NeurIPS), 2024

...

305

10 Apr 2024

CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models

181

31 Mar 2024

GetMesh: A Controllable Model for High-quality Mesh Generation and Manipulation

Ben Fei

Weidong Yang

219

18 Mar 2024

Towards Faster Training of Diffusion Models: An Inspiration of A Consistency Phenomenon

383

14 Mar 2024

EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech

International Conference on Computer Supported Cooperative Work in Design (CSCWD), 2024

313

13 Mar 2024

An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation

191

26 Feb 2024

On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models

Miri Varshavsky-Hassid

275

19 Feb 2024

Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model

Xiangyu Zhang

330

16 Feb 2024

Classification Diffusion Models: Revitalizing Density Ratio Estimation

314

15 Feb 2024

Diff-RNTraj: A Structure-aware Diffusion Model for Road Network-constrained Trajectory Generation

IEEE Transactions on Knowledge and Data Engineering (TKDE), 2024

Tonglong Wei

Yan Lin

225

12 Feb 2024