Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2212.09478
Cited By
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
19 December 2022
Ludan Ruan
Y. Ma
Huan Yang
Huiguo He
Bei Liu
Jianlong Fu
Nicholas Jing Yuan
Qin Jin
B. Guo
DiffM
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation"
22 / 22 papers shown
Title
PAHA: Parts-Aware Audio-Driven Human Animation with Diffusion Model
Y.B. Wang
S.Z. Zhou
J.F. Wu
T. Hu
J.N. Zhang
Z. Li
Yanzhe Liu
DiffM
VGen
49
0
0
06 May 2025
DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment
X. Li
Chenming Wu
Zhao Yang
Zhihao Xu
Dingkang Liang
Y. Zhang
Ji Wan
J. Wang
VGen
67
1
0
22 Apr 2025
COP-GEN-Beta: Unified Generative Modelling of COPernicus Imagery Thumbnails
Miguel Espinosa
V. Marsocci
Yuru Jia
Elliot J. Crowley
Mikolaj Czerkawski
DiffM
47
0
0
11 Apr 2025
Nested Annealed Training Scheme for Generative Adversarial Networks
Chang Wan
Ming-Hsuan Yang
Minglu Li
Yunliang Jiang
Zhonglong Zheng
GAN
33
0
0
20 Jan 2025
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
A. Schwing
Yuki Mitsufuji
VGen
126
12
0
19 Dec 2024
X-Drive: Cross-modality consistent multi-sensor data synthesis for driving scenarios
Yichen Xie
Chenfeng Xu
C-T.John Peng
Shuqi Zhao
Nhat Ho
Alexander T. Pham
Mingyu Ding
M. Tomizuka
W. Zhan
DiffM
31
2
0
02 Nov 2024
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Yuki Mitsufuji
VGen
DiffM
63
4
0
26 Sep 2024
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Yong Ren
Chenxing Li
Manjie Xu
Wei Liang
Yu Gu
Rilin Chen
Dong Yu
VGen
DiffM
43
6
0
13 Sep 2024
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Yu Gu
Dong Yu
Dong Yu
DiffM
VGen
43
11
0
10 Jul 2024
Sequential Contrastive Audio-Visual Learning
Ioannis Tsiamas
Santiago Pascual
Chunghsin Yeh
Joan Serra
28
2
0
08 Jul 2024
Context-aware Talking Face Video Generation
Meidai Xuanyuan
Yuwang Wang
Honglei Guo
Qionghai Dai
DiffM
27
0
0
28 Feb 2024
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
13
24
0
08 Nov 2023
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
19
5
0
13 Oct 2023
EDMP: Ensemble-of-costs-guided Diffusion for Motion Planning
Kallol Saha
V. Mandadi
Jayaram Reddy
Ajit Srikanth
Aditya Agarwal
Bipasha Sen
Arun Singh
Madhava Krishna
DiffM
9
25
0
20 Sep 2023
MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text
Junchen Zhu
Huan Yang
Wenjing Wang
Huiguo He
Zixi Tuo
...
Wen-Huang Cheng
Lianli Gao
Jingkuan Song
Jianlong Fu
Jiebo Luo
DiffM
20
6
0
31 Jul 2023
VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet
Zhihao Hu
Dong Xu
DiffM
VGen
6
64
0
26 Jul 2023
Soundini: Sound-Guided Diffusion for Natural Video Editing
Seung Hyun Lee
Si-Yeol Kim
Innfarn Yoo
Feng Yang
Donghyeon Cho
Youngseo Kim
Huiwen Chang
Jinkyu Kim
Sangpil Kim
VGen
DiffM
27
15
0
13 Apr 2023
Denoising Diffusion Restoration Models
Bahjat Kawar
Michael Elad
Stefano Ermon
Jiaming Song
DiffM
204
770
0
27 Jan 2022
RePaint: Inpainting using Denoising Diffusion Probabilistic Models
Andreas Lugmayr
Martin Danelljan
Andrés Romero
F. I. F. Richard Yu
Radu Timofte
Luc Van Gool
DiffM
211
1,330
0
24 Jan 2022
Palette: Image-to-Image Diffusion Models
Chitwan Saharia
William Chan
Huiwen Chang
Chris A. Lee
Jonathan Ho
Tim Salimans
David J. Fleet
Mohammad Norouzi
DiffM
VLM
325
1,570
0
10 Nov 2021
AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
Ruilong Li
Sha Yang
David A. Ross
Angjoo Kanazawa
ViT
201
467
0
21 Jan 2021
Sound2Sight: Generating Visual Dynamics from Sound and Context
A. Cherian
Moitreya Chatterjee
N. Ahuja
VGen
64
35
0
23 Jul 2020
1