ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2207.06389
  4. Cited By
ProDiff: Progressive Fast Diffusion Model For High-Quality
  Text-to-Speech

ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech

13 July 2022
Rongjie Huang
Zhou Zhao
Huadai Liu
Jinglin Liu
Chenye Cui
Yi Ren
    DiffM
ArXivPDFHTML

Papers citing "ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech"

50 / 126 papers shown
Title
TriniMark: A Robust Generative Speech Watermarking Method for Trinity-Level Attribution
TriniMark: A Robust Generative Speech Watermarking Method for Trinity-Level Attribution
Yue Li
W. Liu
Dongdong Lin
39
0
0
29 Apr 2025
APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech
APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech
Zhicheng Lian
Lizhi Wang
Hua Huang
49
0
0
29 Apr 2025
SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow
SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow
K. Wang
Wenhao Guan
Shenghui Lu
Jianglong Yao
Lin Li
Q. Hong
22
0
0
10 Apr 2025
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
Hongwei Zheng
Han Li
Wenrui Dai
Ziyang Zheng
Chenglin Li
Junni Zou
Hongkai Xiong
3DH
53
0
0
30 Mar 2025
Dual Audio-Centric Modality Coupling for Talking Head Generation
Dual Audio-Centric Modality Coupling for Talking Head Generation
Ao Fu
Ziqi Ni
Yi Zhou
29
1
0
26 Mar 2025
SyncDiff: Diffusion-based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization
SyncDiff: Diffusion-based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization
Xulin Fan
Heting Gao
Ziyi Chen
Peng Chang
Mei Han
Mark Hasegawa-Johnson
DiffM
45
0
0
17 Mar 2025
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Zhedong Zhang
Liang-Sheng Li
C. Yan
Chunshan Liu
A. Hengel
Yuankai Qi
62
2
0
15 Mar 2025
Unleashing the Potential of Two-Tower Models: Diffusion-Based Cross-Interaction for Large-Scale Matching
Unleashing the Potential of Two-Tower Models: Diffusion-Based Cross-Interaction for Large-Scale Matching
Yihan Wang
Fei Xiong
Zhexin Han
Qi Song
Kaiqiao Zhan
Ben Wang
29
0
0
28 Feb 2025
PersGuard: Preventing Malicious Personalization via Backdoor Attacks on Pre-trained Text-to-Image Diffusion Models
PersGuard: Preventing Malicious Personalization via Backdoor Attacks on Pre-trained Text-to-Image Diffusion Models
Xinwei Liu
X. Jia
Yuan Xun
Hua Zhang
Xiaochun Cao
DiffM
AAML
47
0
0
22 Feb 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yingahao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
88
0
0
21 Feb 2025
FlashSR: One-step Versatile Audio Super-resolution via Diffusion Distillation
FlashSR: One-step Versatile Audio Super-resolution via Diffusion Distillation
Jaekwon Im
Juhan Nam
DiffM
38
0
0
18 Jan 2025
Text2Data: Low-Resource Data Generation with Textual Control
Text2Data: Low-Resource Data Generation with Textual Control
Shiyu Wang
Yihao Feng
Tian Lan
Ning Yu
Yu Bai
R. Xu
H. Wang
Caiming Xiong
S.
DiffM
80
0
0
03 Jan 2025
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech
Rui Liu
Shuwei He
Yifan Hu
H. Li
VLM
87
1
0
16 Dec 2024
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for
  Text-to-Speech with Diverse and Controllable Styles
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
Jiaxuan Liu
Zhaoci Liu
Y. Hu
Yingying Gao
Shilei Zhang
Zhenhua Ling
DiffM
73
1
0
04 Dec 2024
Deepfake Media Generation and Detection in the Generative AI Era: A
  Survey and Outlook
Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook
Florinel-Alin Croitoru
Andrei Iulian Hiji
Vlad Hondru
Nicolae-Cătălin Ristea
Paul Irofti
Marius Popescu
Cristian Rusu
Radu Tudor Ionescu
F. Khan
Mubarak Shah
79
2
0
29 Nov 2024
Multi-Source Spatial Knowledge Understanding for Immersive Visual
  Text-to-Speech
Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech
Shuwei He
Rui Liu
H. Li
19
4
0
18 Oct 2024
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset
Anton Firc
K. Malinka
P. Hanáček
DiffM
26
0
0
09 Oct 2024
Schrödinger bridge based deep conditional generative learning
Schrödinger bridge based deep conditional generative learning
Hanwen Huang
DiffM
21
1
0
25 Sep 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
Yu Zhang
Ziyue Jiang
Ruiqi Li
Changhao Pan
Jinzheng He
Rongjie Huang
Chuxin Wang
Zhou Zhao
DiffM
VLM
33
4
0
24 Sep 2024
DiffSSD: A Diffusion-Based Dataset For Speech Forensics
DiffSSD: A Diffusion-Based Dataset For Speech Forensics
Kratika Bhagtani
Amit Kumar Singh Yadav
Paolo Bestagini
Edward J. Delp
DiffM
16
1
0
19 Sep 2024
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style
  Temporal Modeling in Text-to-Speech
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Xin Qi
Ruibo Fu
Zhengqi Wen
Tao Wang
Chunyu Qiang
...
Xiaopeng Wang
Yuankun Xie
Yukun Liu
Xuefei Liu
Guanjun Li
DiffM
13
0
0
18 Sep 2024
Online Diffusion-Based 3D Occupancy Prediction at the Frontier with
  Probabilistic Map Reconciliation
Online Diffusion-Based 3D Occupancy Prediction at the Frontier with Probabilistic Map Reconciliation
Alec Reed
Lorin Achey
Brendan Crowe
Bradley Hayes
Christoffer Heckman
18
0
0
16 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis
  with Distilled Time-Varying Style Diffusion
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
29
4
0
16 Sep 2024
E1 TTS: Simple and Fast Non-Autoregressive TTS
E1 TTS: Simple and Fast Non-Autoregressive TTS
Zhijun Liu
Shuai Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
VLM
DiffM
27
3
0
14 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
43
0
0
14 Sep 2024
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Jiawei Du
I-Ming Lin
I-Hsiang Chiu
Xuanjun Chen
Haibin Wu
Wenze Ren
Yu Tsao
Hung-yi Lee
Jyh-Shing Roger Jang
DiffM
30
2
0
13 Sep 2024
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis
  Vocoders
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders
Yubing Cao
Yongming Li
Liejun Wang
Yinfeng Yu
21
0
0
13 Aug 2024
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio
  Synthesis
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis
Weizhi Liu
Yue Li
Dongdong Lin
Hui Tian
Haizhou Li
WIGM
24
8
0
15 Jul 2024
LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis
LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis
Zhenxiong Tan
Xinyin Ma
Gongfan Fang
Xinchao Wang
23
3
0
15 Jul 2024
An Unsupervised Domain Adaptation Method for Locating Manipulated Region
  in partially fake Audio
An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio
Siding Zeng
Jiangyan Yi
Jianhua Tao
Yujie Chen
Shan Liang
Yong Ren
Xiaohui Zhang
25
0
0
11 Jul 2024
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image
  Generation
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation
Mushui Liu
Yuhang Ma
Yang Zhen
Jun Dan
Yunlong Yu
Zeng Zhao
Zhipeng Hu
Bai Liu
Changjie Fan
VLM
DiffM
61
12
0
30 Jun 2024
DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis
  through Structure Guidance
DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance
Younghyun Kim
Geunmin Hwang
Junyu Zhang
Eunbyung Park
40
6
0
26 Jun 2024
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
Wenhao Guan
K. Wang
Wangjin Zhou
Yang Wang
Feng Deng
Hui Wang
Lin Li
Q. Hong
Yong Qin
DiffM
25
3
0
12 Jun 2024
Flow Map Matching
Flow Map Matching
Nicholas M. Boffi
M. S. Albergo
Eric Vanden-Eijnden
25
4
0
11 Jun 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
32
15
0
08 Jun 2024
CoBL-Diffusion: Diffusion-Based Conditional Robot Planning in Dynamic
  Environments Using Control Barrier and Lyapunov Functions
CoBL-Diffusion: Diffusion-Based Conditional Robot Planning in Dynamic Environments Using Control Barrier and Lyapunov Functions
Kazuki Mizuta
Karen Leung
26
8
0
08 Jun 2024
MeLFusion: Synthesizing Music from Image and Language Cues using
  Diffusion Models
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
35
7
0
07 Jun 2024
ED-SAM: An Efficient Diffusion Sampling Approach to Domain
  Generalization in Vision-Language Foundation Models
ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models
Thanh-Dat Truong
Xin Li
Bhiksha Raj
Jackson Cothren
Khoa Luu
DiffM
VLM
30
1
0
03 Jun 2024
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
Huadai Liu
Rongjie Huang
Yang Liu
Hengyuan Cao
Jialei Wang
Xize Cheng
Siqi Zheng
Zhou Zhao
57
8
0
01 Jun 2024
$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human
  Pose Estimation
Di2Pose\text{Di}^2\text{Pose}Di2Pose: Discrete Diffusion Model for Occluded 3D Human Pose Estimation
Weiquan Wang
Jun Xiao
Chunping Wang
Wei Liu
Zhao Wang
Long Chen
DiffM
29
1
0
27 May 2024
CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators
CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators
Harry Zhang
Luca Carlone
3DH
64
1
0
27 May 2024
Diff-ETS: Learning a Diffusion Probabilistic Model for
  Electromyography-to-Speech Conversion
Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech Conversion
Zhao Ren
Kevin Scheck
Qinhan Hou
Stefano van Gogh
Michael Wand
Tanja Schultz
DiffM
26
0
0
11 May 2024
CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness
  Condition
CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition
Jianzong Wang
Pengcheng Li
Xulong Zhang
Ning Cheng
Jing Xiao
23
0
0
30 Apr 2024
CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with
  Contrastive Learning
CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning
Hao Wu
Jing Chen
Ruiying Du
Cong Wu
Kun He
Xingcan Shang
Hao Ren
Guowen Xu
AAML
29
7
0
24 Apr 2024
FairSSD: Understanding Bias in Synthetic Speech Detectors
FairSSD: Understanding Bias in Synthetic Speech Detectors
Amit Kumar Singh Yadav
Kratika Bhagtani
Davide Salvi
Paolo Bestagini
Edward J.Delp
16
5
0
17 Apr 2024
An Overview of Diffusion Models: Applications, Guided Generation,
  Statistical Rates and Optimization
An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization
Minshuo Chen
Song Mei
Jianqing Fan
Mengdi Wang
VLM
MedIm
DiffM
32
48
0
11 Apr 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like
  Multi-talker Conversations
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang
Yao Qian
Long Zhou
Shujie Liu
Dongmei Wang
...
Yanmin Qian
Jinyu Li
Lei He
Sheng Zhao
Michael Zeng
23
1
0
10 Apr 2024
SceneSense: Diffusion Models for 3D Occupancy Synthesis from Partial
  Observation
SceneSense: Diffusion Models for 3D Occupancy Synthesis from Partial Observation
Alec Reed
Brendan Crowe
Doncey Albin
Lorin Achey
Bradley Hayes
Christoffer Heckman
DiffM
18
1
0
18 Mar 2024
On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models
On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models
Miri Varshavsky-Hassid
Roy Hirsch
Regev Cohen
Tomer Golany
Daniel Freedman
Ehud Rivlin
18
3
0
19 Feb 2024
Diffusion Models for Audio Restoration
Diffusion Models for Audio Restoration
Jean-Marie Lemercier
Julius Richter
Simon Welker
Eloi Moliner
Vesa Valimaki
Timo Gerkmann
20
13
0
15 Feb 2024
123
Next