ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
22 May 2023
Huadai Liu
Rongjie Huang
Xuan Lin
Wenqiang Xu
Maozong Zheng
Hong Chen
Jinzheng He
Zhou Zhao
    DiffM

Papers citing "ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer"

12 / 12 papers shown
OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models
Huanpeng Chu
Wei Wu
Guanyu Fen
Yutao Zhang
DiffM
136
43
0
22 Aug 2025
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing
Huadai Liu
Kaicheng Luo
Jialei Wang
Wen Wang
Qian Chen
Zhou Zhao
Wei Xue
VGen
LRM
297
13
0
26 Jun 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
Xin Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
372
10
0
21 Apr 2025
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting
Guanrou Yang
Chen Yang
Qian Chen
Ziyang Ma
Wenxi Chen
...
Fan Yu
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
446
20
0
17 Apr 2025
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech
AAAI Conference on Artificial Intelligence (AAAI), 2024
Rui Liu
Shuwei He
Yifan Hu
Hong Li
VLM
383
5
0
16 Dec 2024
Video Diffusion Transformers are In-Context Learners
Zhengcong Fei
Di Qiu
Changqian Yu
Debang Li
Mingyuan Fan
VGen
DiffM
775
7
0
14 Dec 2024
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
Joseph Liu
Joshua Geddes
Ziyu Guo
Haomiao Jiang
Xiao Yu
300
4
0
15 Nov 2024
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Huadai Liu
Jialei Wang
Rongjie Huang
Yang Liu
H. Lu
Zhou Zhao
Wei Xue
265
11
0
16 Oct 2024
MEDIC: Zero-shot Music Editing with Disentangled Inversion Control
Huadai Liu
Jialei Wang
X. Li
Wen Wang
Qian Chen
Rongjie Huang
Yang Liu
Jiayang Xu
Zhou Zhao
182
9
0
18 Jul 2024
Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
Chang Li
Ruoyu Wang
Lijuan Liu
Jun Du
Yixuan Sun
Zilu Guo
Zhenrong Zhang
Yuan Jiang
J. Gao
Feng Ma
311
10
0
24 May 2024
On the Design Fundamentals of Diffusion Models: A Survey
Pattern Recognition (Pattern Recogn.), 2023
Ziyi Chang
George Alex Koulieris
Hyung Jin Chang
Hubert P. H. Shum
DiffM
485
76
0
07 Jun 2023
Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Huadai Liu
Rongjie Huang
Jinzheng He
Gang Sun
Ran Shen
Xize Cheng
Zhou Zhao
198
5
0
21 May 2023