2305.12708
Cited By
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
22 May 2023
Huadai Liu
Rongjie Huang
Xuan Lin
Wenqiang Xu
Maozong Zheng
Hong Chen
Jinzheng He
Zhou Zhao
DiffM
Papers citing
"ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer"
22 / 22 papers shown
Title
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
X. Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
51
0
0
21 Apr 2025
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech
Rui Liu
Shuwei He
Yifan Hu
H. Li
VLM
87
1
0
16 Dec 2024
Video Diffusion Transformers are In-Context Learners
Zhengcong Fei
Di Qiu
Changqian Yu
Debang Li
Mingyuan Fan
VGen
DiffM
130
2
0
14 Dec 2024
Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising
Gongfan Fang
Xinyin Ma
Xinchao Wang
DiffM
MoE
104
0
0
07 Dec 2024
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
Joseph Liu
Joshua Geddes
Ziyu Guo
Haomiao Jiang
Mahesh Kumar Nandwana
44
0
0
15 Nov 2024
The Zeno's Paradox of 'Low-Resource' Languages
H. Nigatu
A. Tonja
Benjamin Rosman
Thamar Solorio
Monojit Choudhury
63
5
0
28 Oct 2024
Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech
Shuwei He
Rui Liu
H. Li
27
4
0
18 Oct 2024
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
Huadai Liu
Jialei Wang
Rongjie Huang
Yang Liu
H. Lu
Wei Xue
Zhou Zhao
11
3
0
16 Oct 2024
FLUX that Plays Music
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Junshi Huang
78
7
0
01 Sep 2024
MEDIC: Zero-shot Music Editing with Disentangled Inversion Control
Huadai Liu
Jialei Wang
Rongjie Huang
Yang Liu
Jiayang Xu
Zhou Zhao
21
4
0
18 Jul 2024
MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation
Zihao Wang
Haoxuan Liu
Jiaxing Yu
Tao Zhang
Yan Liu
K. Zhang
55
1
0
03 Jul 2024
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
Kai Wang
Shijian Deng
Jing Shi
Dimitrios Hatzinakos
Yapeng Tian
VGen
69
8
0
11 Jun 2024
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
Xinyin Ma
Gongfan Fang
Michael Bi Mi
Xinchao Wang
53
30
0
03 Jun 2024
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
Huadai Liu
Rongjie Huang
Yang Liu
Hengyuan Cao
Jialei Wang
Xize Cheng
Siqi Zheng
Zhou Zhao
63
8
0
01 Jun 2024
QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
Chang Li
Ruoyu Wang
Lijuan Liu
Jun Du
Yixuan Sun
Zilu Guo
Zhenrong Zhang
Yuan Jiang
J. Gao
Feng Ma
41
1
0
24 May 2024
On the Design Fundamentals of Diffusion Models: A Survey
Ziyi Chang
G. Koulieris
Hubert P. H. Shum
DiffM
27
50
0
07 Jun 2023
Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
Huadai Liu
Rongjie Huang
Jinzheng He
Gang Sun
Ran Shen
Xize Cheng
Zhou Zhao
19
3
0
21 May 2023
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
Fan Bao
Shen Nie
Kaiwen Xue
Chongxuan Li
Shiliang Pu
Yaole Wang
Gang Yue
Yue Cao
Hang Su
Jun Zhu
DiffM
199
147
0
12 Mar 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
140
304
0
30 Jan 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,010
0
28 Jan 2022
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation
Rongjie Huang
Chenye Cui
Feiyang Chen
Yi Ren
Jinglin Liu
Zhou Zhao
Baoxing Huai
N. Yuan
GAN
89
62
0
14 Oct 2021
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras
S. Laine
Timo Aila
262
10,183
0
12 Dec 2018