STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

13 September 2024

Yong Ren, Chenxing Li, Yu Gu, Rilin Chen, Dong Yu

Papers citing "STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment"

DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation. Haomin Zhang, Chang Liu, Junjie Zheng, Zihao Chen, Chaofan Ding, Xinhan Di. 28 Mar 2025.
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis. Ho Kei Cheng, Masato Ishii, Akio Hayakawa, Takashi Shibuya, A. Schwing, Yuki Mitsufuji. 19 Dec 2024.
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization. Rohan Choudhury, Guanglei Zhu, Sihan Liu, Koichiro Niinuma, Kris M. Kitani, László A. Jeni. 07 Nov 2024.
Video and Text Matching with Conditioned Embeddings. Ameen Ali, Idan Schwartz, Tamir Hazan, Lior Wolf. 21 Oct 2021.