2408.06072
Cited By
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
12 August 2024
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
Jiazheng Xu
Yuanming Yang
Wenyi Hong
Xiaohan Zhang
Guanyu Feng
Da Yin
Yuxuan Zhang
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
DiffM
VGen
Papers citing "CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer"
50 / 297 papers shown
Title
Wonderland: Navigating 3D Scenes from a Single Image
Hanwen Liang
Junli Cao
Vidit Goel
Guocheng Qian
Sergei Korolev
Demetri Terzopoulos
Konstantinos N. Plataniotis
Sergey Tulyakov
Jian Ren
VGen
125
11
0
16 Dec 2024
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Rick Akkerman
Haiwen Feng
M. Black
Dimitrios Tzionas
Victoria Fernandez-Abrevaya
VGen
AI4CE
88
3
0
16 Dec 2024
AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
Wenhao Sun
Rong-Cheng Tu
Jingyi Liao
Zhao Jin
Dacheng Tao
VGen
97
1
0
16 Dec 2024
Video Diffusion Transformers are In-Context Learners
Zhengcong Fei
Di Qiu
Changqian Yu
Debang Li
Mingyuan Fan
VGen
DiffM
95
2
0
14 Dec 2024
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
Yushu Wu
Zhixing Zhang
Yanyu Li
Yanwu Xu
Anil Kag
...
Ju Hu
Dimitris N. Metaxas
Yanzhi Wang
Sergey Tulyakov
Jian Ren
DiffM
VGen
87
2
0
13 Dec 2024
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang
Chih-Yao Ma
Yen-Cheng Liu
Ji Hou
Tao Xu
...
Peizhao Zhang
Tingbo Hou
Peter Vajda
N. Jha
Xiaoliang Dai
LMTD
DiffM
VGen
VLM
81
5
0
13 Dec 2024
Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism
Jun Zheng
Jing Wang
Fuwei Zhao
Xujie Zhang
Xiaodan Liang
DiffM
VGen
70
0
0
13 Dec 2024
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Haonan Qiu
Shiwei Zhang
Yujie Wei
Ruihang Chu
Hangjie Yuan
X. Wang
Y. Zhang
Ziwei Liu
89
4
0
12 Dec 2024
Owl-1: Omni World Model for Consistent Long Video Generation
Yuanhui Huang
Wenzhao Zheng
Yuan Gao
Xin Tao
Pengfei Wan
Di Zhang
Jie Zhou
Jiwen Lu
VGen
VLM
75
0
0
12 Dec 2024
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip H. S. Torr
VLM
ObjD
100
0
0
12 Dec 2024
Mojito: Motion Trajectory and Intensity Control for Video Generation
Xuehai He
Shuohang Wang
Jianwei Yang
Xiaoxia Wu
Y. Wang
Kuan-Chieh Jackson Wang
Z. Zhan
Olatunji Ruwase
Yelong Shen
X. Wang
VGen
79
1
0
12 Dec 2024
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
Xi Chen
Zhifei Zhang
He Zhang
Yuqian Zhou
S. Kim
...
Nanxuan Zhao
Yilin Wang
Hui Ding
Zhe Lin
Hengshuang Zhao
VGen
DiffM
121
10
0
10 Dec 2024
From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Tianwei Yin
Qiang Zhang
Richard Zhang
William T. Freeman
F. Durand
Eli Shechtman
Xun Huang
VGen
DiffM
74
4
0
10 Dec 2024
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
Chaoyang Wang
Peiye Zhuang
Tuan Duc Ngo
Willi Menapace
Aliaksandr Siarohin
Michael Vasilkovsky
Ivan Skorokhodov
Sergey Tulyakov
Peter Wonka
Hsin-Ying Lee
DiffM
VGen
87
3
0
05 Dec 2024
DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models
Yizhuo Li
Yuying Ge
Yixiao Ge
Ping Luo
Ying Shan
DiffM
VGen
90
0
0
05 Dec 2024
PaintScene4D: Consistent 4D Scene Generation from Text Prompts
Vinayak Gupta
Yunze Man
Yu-Xiong Wang
VGen
77
0
0
05 Dec 2024
MV-Adapter: Multi-view Consistent Image Generation Made Easy
Zehuan Huang
Y. Guo
Haoran Wang
Ran Yi
Lizhuang Ma
Yan-Pei Cao
Lu Sheng
107
5
0
04 Dec 2024
Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention
Hannan Lu
Xiaohe Wu
Shudong Wang
Xiameng Qin
Xinyu Zhang
Junyu Han
W. Zuo
Ji Tao
76
0
0
04 Dec 2024
SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text
Haohe Liu
Gaël Le Lan
Xinhao Mei
Zhaoheng Ni
Anurag Kumar
Varun K. Nagaraja
Wenwu Wang
Mark D. Plumbley
Yangyang Shi
Vikas Chandra
VGen
59
1
0
03 Dec 2024
FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation
Kefan Chen
Chaerin Min
Linguang Zhang
Shreyas Hampali
Cem Keskin
Srinath Sridhar
75
0
0
03 Dec 2024
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance
Viet-Anh Nguyen
A. Nguyen
T. Dao
K. Nguyen
Cuong Pham
Toan M. Tran
Anh Tran
DiffM
63
0
0
03 Dec 2024
CPA: Camera-pose-awareness Diffusion Transformer for Video Generation
Yuelei Wang
Jian Zhang
Pengtao Jiang
H. Zhang
Jinwei Chen
Bo Li
VGen
DiffM
105
2
0
02 Dec 2024
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
Jiahao Cui
Hui Li
Yun Zhan
Hanlin Shang
K. Cheng
Yuqi Ma
Shan Mu
Hang Zhou
Jingdong Wang
Siyu Zhu
ViT
VGen
87
6
0
01 Dec 2024
Human Action CLIPS: Detecting AI-generated Human Motion
Matyáš Boháček
Hany Farid
63
1
0
30 Nov 2024
DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
Yatian Pang
Bin Zhu
Bin Lin
Mingzhe Zheng
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
VGen
3DH
73
2
0
30 Nov 2024
Motion Dreamer: Boundary Conditional Motion Reasoning for Physically Coherent Video Generation
Tianshuo Xu
Zhifei Chen
Leyi Wu
Hao Lu
Yuying Chen
Lihui Jiang
Bingbing Liu
Yingcong Chen
VGen
70
0
0
30 Nov 2024
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
Qiyao Xue
Xiangyu Yin
Boyuan Yang
Wei Gao
DiffM
VGen
72
9
0
30 Nov 2024
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration
Chaojun Ni
Guosheng Zhao
Xiaofeng Wang
Zheng Hua Zhu
Wenkang Qin
...
Kun Zhan
Peng Jia
Xianpeng Lang
Xingang Wang
Wenjun Mei
VGen
86
6
0
29 Nov 2024
Open-Sora Plan: Open-Source Large Video Generation Model
Bin Lin
Yunyang Ge
Xinhua Cheng
Zongjian Li
Bin Zhu
...
Zhang Pan
Xing Zhou
Shaoling Dong
Yonghong Tian
Li-xin Yuan
VLM
VGen
113
58
0
28 Nov 2024
SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing
Rong-Cheng Tu
Wenhao Sun
Zhao Jin
Jingyi Liao
Jiaxing Huang
Dacheng Tao
VGen
DiffM
92
3
0
28 Nov 2024
OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
Hui Li
Mingwang Xu
Yun Zhan
Shan Mu
Jiaye Li
...
Y. Chen
Tan Chen
Mao Ye
Jingdong Wang
Siyu Zhu
VGen
99
2
0
28 Nov 2024
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Feng Liu
Shiwei Zhang
Xiaofeng Wang
Yujie Wei
Haonan Qiu
Yuzhong Zhao
Yingya Zhang
Qixiang Ye
Fang Wan
VGen
AI4TS
90
11
0
28 Nov 2024
MatchDiffusion: Training-free Generation of Match-cuts
Alejandro Pardo
Fabio Pizzati
Tong Zhang
Alexander Pondaven
Philip H. S. Torr
Juan C. Pérez
Bernard Ghanem
DiffM
VGen
68
1
0
27 Nov 2024
Diffusion Self-Distillation for Zero-Shot Customized Image Generation
Shengqu Cai
Eric Ryan Chan
Yunzhi Zhang
Leonidas J. Guibas
Jiajun Wu
Gordon Wetzstein
69
8
0
27 Nov 2024
Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models
Yiming Wu
Huan Wang
Zhenghao Chen
Dong Xu
DiffM
VGen
64
1
0
27 Nov 2024
MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation
Haopeng Fang
Di Qiu
Binjie Mao
Pengfei Yan
He Tang
VGen
DiffM
68
4
0
27 Nov 2024
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Zongjian Li
Bin Lin
Yang Ye
Liuhan Chen
Xinhua Cheng
Shenghai Yuan
Li-xin Yuan
VGen
DiffM
104
16
0
26 Nov 2024
Generative Omnimatte: Learning to Decompose Video into Layers
Yao-Chih Lee
Erika Lu
Sarah Rumbley
Michal Geyer
Jia-Bin Huang
Tali Dekel
Forrester Cole
DiffM
VGen
86
4
0
25 Nov 2024
VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing
Jiahao Hu
Tianxiong Zhong
Xuebo Wang
Boyuan Jiang
Xingye Tian
Fei Yang
Pengfei Wan
Di Zhang
VGen
62
2
0
22 Nov 2024
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu
Mingyu Liu
Zeyu Zhu
Xi Xia
Haoen Feng
Wen Wang
Kevin Qinghong Lin
Chunhua Shen
Mike Zheng Shou
DiffM
VGen
111
1
0
22 Nov 2024
Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification
Sundar Sripada V. S.
Minkyu Choi
Sahil Shah
Harsh Goel
Mohammad Omama
Sandeep P. Chinchali
EGVM
105
2
0
22 Nov 2024
PhysFlow: Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation
Zhuoman Liu
Weicai Ye
Yan Luximon
Pengfei Wan
Di Zhang
VGen
AI4CE
89
2
0
21 Nov 2024
Generating 3D-Consistent Videos from Unposed Internet Photos
Gene Chou
Kai Zhang
Sai Bi
Hao Tan
Zexiang Xu
Fujun Luan
Bharath Hariharan
Noah Snavely
3DGS
VGen
66
3
0
20 Nov 2024
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang
Xiaobo Xia
Runnan Chen
Dongdong Yu
Changhu Wang
M. Gong
Tongliang Liu
92
6
0
18 Nov 2024
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization
Jintao Zhang
Haofeng Huang
Pengle Zhang
Jia Wei
Jun-Jie Zhu
Jianfei Chen
VLM
MQ
47
15
0
17 Nov 2024
Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey
Xuannan Liu
Xing Cui
Peipei Li
Zekun Li
Huaibo Huang
Shuhan Xia
Miaoxuan Zhang
Yueying Zou
Ran He
AAML
51
4
0
14 Nov 2024
Artificial Intelligence for Biomedical Video Generation
Linyuan Li
Jianing Qiu
Anujit Saha
Lin Li
Poyuan Li
Mengxian He
Ziyu Guo
Wu Yuan
VGen
55
1
0
12 Nov 2024
Grounding Video Models to Actions through Goal Conditioned Exploration
Yunhao Luo
Yilun Du
LM&Ro
VGen
71
1
0
11 Nov 2024
Improved Video VAE for Latent Video Diffusion Model
Pingyu Wu
Kai Zhu
Yu Liu
Liming Zhao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
VGen
DiffM
47
4
0
10 Nov 2024
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
David Junhao Zhang
Roni Paiss
Shiran Zada
Nikhil Karnad
David E. Jacobs
Yael Pritch
Inbar Mosseri
Mike Zheng Shou
Neal Wadhwa
Nataniel Ruiz
DiffM
VGen
66
14
0
07 Nov 2024