ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
22 May 2023
Huadai Liu
Rongjie Huang
Xuan Lin
Wenqiang Xu
Maozong Zheng
Hong Chen
Jinzheng He
Zhou Zhao
    DiffM
arXiv: 2305.12708

Papers citing "ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer"

12 / 12 papers shown
OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models
Huanpeng Chu
Wei Wu
Guanyu Fen
Yutao Zhang
DiffM
240
6
0
22 Aug 2025
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing
Huadai Liu
Kaicheng Luo
Jialei Wang
Wen Wang
Qian Chen
Zhou Zhao
Wei Xue
VGen LRM
498
17
0
26 Jun 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
Xin Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
534
13
0
21 Apr 2025
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting
Guanrou Yang
Chen Yang
Qian Chen
Ziyang Ma
Wenxi Chen
...
Fan Yu
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
532
32
0
17 Apr 2025
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech
AAAI Conference on Artificial Intelligence (AAAI), 2024
Rui Liu
Shuwei He
Yifan Hu
Hong Li
VLM
479
6
0
16 Dec 2024
Video Diffusion Transformers are In-Context Learners
Zhengcong Fei
Di Qiu
Changqian Yu
Debang Li
Mingyuan Fan
VGen DiffM
925
7
0
14 Dec 2024
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
Joseph Liu
Joshua Geddes
Ziyu Guo
Haomiao Jiang
Xiao Yu
358
9
0
15 Nov 2024
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Huadai Liu
Jialei Wang
Rongjie Huang
Yang Liu
H. Lu
Zhou Zhao
Wei Xue
352
12
0
16 Oct 2024
MEDIC: Zero-shot Music Editing with Disentangled Inversion Control
Huadai Liu
Jialei Wang
X. Li
Wen Wang
Qian Chen
Rongjie Huang
Yang Liu
Jiayang Xu
Zhou Zhao
305
9
0
18 Jul 2024
Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
Chang Li
Ruoyu Wang
Lijuan Liu
Jun Du
Yixuan Sun
Zilu Guo
Zhenrong Zhang
Yuan Jiang
J. Gao
Feng Ma
424
8
0
24 May 2024
On the Design Fundamentals of Diffusion Models: A Survey
Pattern Recognition (Pattern Recogn.), 2023
Ziyi Chang
George Alex Koulieris
Hyung Jin Chang
Hubert P. H. Shum
DiffM
671
84
0
07 Jun 2023
Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Huadai Liu
Rongjie Huang
Jinzheng He
Gang Sun
Ran Shen
Xize Cheng
Zhou Zhao
223
6
0
21 May 2023