ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2408.06072
  4. Cited By
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

12 August 2024
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
Jiazheng Xu
Yuanming Yang
Wenyi Hong
Xiaohan Zhang
Guanyu Feng
Da Yin
Yuxuan Zhang
Weihan Wang
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
    DiffM
    VGen
ArXivPDFHTML

Papers citing "CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer"

47 / 297 papers shown
Title
DimensionX: Create Any 3D and 4D Scenes from a Single Image with
  Controllable Video Diffusion
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
Wenqiang Sun
Shuo Chen
F. Liu
Zilong Chen
Yueqi Duan
Jun Zhang
Yikai Wang
VGen
41
31
0
07 Nov 2024
Taming Rectified Flow for Inversion and Editing
Taming Rectified Flow for Inversion and Editing
Jiangshan Wang
Junfu Pu
Zhongang Qi
Jiayi Guo
Yue Ma
Nisha Huang
Yuxin Chen
Xiu Li
Ying Shan
36
22
0
07 Nov 2024
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for
  Image-to-Video Generation
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Wenhao Wang
Y. Yang
VGen
26
3
0
05 Nov 2024
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive
  Parallelism
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Jiarui Fang
Jinzhe Pan
Xibo Sun
Aoyu Li
Jiannan Wang
51
4
0
04 Nov 2024
Optical Flow Representation Alignment Mamba Diffusion Model for Medical
  Video Generation
Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation
Zhenbin Wang
Lei Zhang
Lituan Wang
Minjuan Zhu
Zhenwei Zhang
VGen
MedIm
54
1
0
03 Nov 2024
GameGen-X: Interactive Open-world Game Video Generation
GameGen-X: Interactive Open-world Game Video Generation
Haoxuan Che
Xuanhua He
Quande Liu
C. Jin
Hao Chen
VGen
59
13
0
01 Nov 2024
MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis
MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis
Di Qiu
Zheng Chen
Rui Wang
Mingyuan Fan
Changqian Yu
Junshi Huan
Xiang Wen
VGen
19
6
0
28 Oct 2024
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Zongyi Li
Shujie Hu
Shujie Liu
Long Zhou
Jeongsoo Choi
Lingwei Meng
Xun Guo
J. Li
H. Ling
Furu Wei
VGen
DiffM
63
5
0
27 Oct 2024
Allegro: Open the Black Box of Commercial-Level Video Generation Model
Allegro: Open the Black Box of Commercial-Level Video Generation Model
Yuan Zhou
Qiuyue Wang
Yuxuan Cai
Huan Yang
VGen
VLM
69
23
0
20 Oct 2024
FrameBridge: Improving Image-to-Video Generation with Bridge Models
FrameBridge: Improving Image-to-Video Generation with Bridge Models
Yuji Wang
Zehua Chen
Xiaoyu Chen
Jun-Jie Zhu
Jianfei Chen
DiffM
VGen
69
1
0
20 Oct 2024
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise
  Motion Control
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control
Yujie Wei
Shiwei Zhang
Hangjie Yuan
Xiang Wang
Haonan Qiu
...
F. Liu
Zhizhong Huang
Jiaxin Ye
Yingya Zhang
Hongming Shan
DiffM
VGen
67
14
0
17 Oct 2024
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving
  Scene Representation
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
Guosheng Zhao
Chaojun Ni
Xiaofeng Wang
Zheng Zhu
X. Zhang
...
Xinze Chen
Boyuan Wang
Youyi Zhang
Wenjun Mei
Xingang Wang
VGen
66
24
0
17 Oct 2024
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
Jaehong Yoon
Shoubin Yu
Vaidehi Patil
Huaxiu Yao
Mohit Bansal
59
14
0
16 Oct 2024
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Ziyue Li
Tianyi Zhou
MoE
39
5
0
14 Oct 2024
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
Aakanksha
Arash Ahmadian
Seraphina Goldfarb-Tarrant
B. Ermiş
Marzieh Fadaee
Sara Hooker
MoMe
55
4
0
14 Oct 2024
Enhancing JEPAs with Spatial Conditioning: Robust and Efficient
  Representation Learning
Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning
Etai Littwin
Vimal Thilak
Anand Gopalakrishnan
32
0
0
14 Oct 2024
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image
  Animation
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
Jiahao Cui
Hui Li
Yao Yao
Hao Zhu
Hanlin Shang
Kaihui Cheng
Hang Zhou
Siyu Zhu
Jingdong Wang
DiffM
VGen
27
22
0
10 Oct 2024
Pyramidal Flow Matching for Efficient Video Generative Modeling
Pyramidal Flow Matching for Efficient Video Generative Modeling
Yang Jin
Zhicheng Sun
Ningyuan Li
Kun Xu
K. Xu
...
Nan Zhuang
Quzhe Huang
Yang Song
Yadong Mu
Zhouchen Lin
VGen
63
63
0
08 Oct 2024
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark
  for Video Generation
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
Fanqing Meng
Jiaqi Liao
Xinyu Tan
Wenqi Shao
Quanfeng Lu
Kaipeng Zhang
Yu Cheng
Dianqi Li
Yu Qiao
Ping Luo
VGen
EGVM
19
23
0
07 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video
  Representation Learning
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
Han Lin
Tushar Nagarajan
Nicolas Ballas
Mido Assran
Mojtaba Komeili
Mohit Bansal
Koustuv Sinha
AI4TS
49
2
0
04 Oct 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang
Jia wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM
MQ
66
18
0
03 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture
Denoising with a Joint-Embedding Predictive Architecture
Dengsheng Chen
Jie Hu
Xiaoming Wei
Enhua Wu
DiffM
47
2
0
02 Oct 2024
Replace Anyone in Videos
Replace Anyone in Videos
Xiang Wang
Shiwei Zhang
Haonan Qiu
Ruihang Chu
Zekun Li
Y. Zhang
Changxin Gao
Yuehuan Wang
Chunhua Shen
Nong Sang
VGen
DiffM
58
1
0
30 Sep 2024
Emu3: Next-Token Prediction is All You Need
Emu3: Next-Token Prediction is All You Need
Xinlong Wang
Xiaosong Zhang
Zhengxiong Luo
Quan-Sen Sun
Yufeng Cui
...
Xi Yang
Jingjing Liu
Yonghua Lin
Tiejun Huang
Zhongyuan Wang
MLLM
21
147
0
27 Sep 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
44
7
0
23 Sep 2024
Video-to-Audio Generation with Fine-grained Temporal Semantics
Video-to-Audio Generation with Fine-grained Temporal Semantics
Yuchen Hu
Yu Gu
Chenxing Li
Rilin Chen
Dong Yu
VGen
DiffM
16
1
0
23 Sep 2024
The Art of Storytelling: Multi-Agent Generative AI for Dynamic
  Multimodal Narratives
The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives
Samee Arif
Taimoor Arif
Muhammad Saad Haroon
Aamina Jamal Khan
Agha Ali Raza
Awais Athar
14
0
0
17 Sep 2024
DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes
DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes
Jianbiao Mei
Yukai Ma
Xuemeng Yang
Licheng Wen
Tiantian Wei
Min Dou
Yukai Ma
Min Dou
Botian Shi
Yong Liu
DiffM
VGen
42
3
0
06 Sep 2024
CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention
CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention
Gaojie Lin
Jianwen Jiang
Chao Liang
Tianyun Zhong
Jiaqi Yang
Yanbo Zheng
VGen
DiffM
38
13
0
03 Sep 2024
FLUX that Plays Music
FLUX that Plays Music
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Junshi Huang
74
7
0
01 Sep 2024
SurGen: Text-Guided Diffusion Model for Surgical Video Generation
SurGen: Text-Guided Diffusion Model for Surgical Video Generation
Joseph Cho
Samuel Schmidgall
C. Zakka
Mrudang Mathur
Dhamanpreet Kaur
R. Shad
W. Hiesinger
VGen
MedIm
24
5
0
26 Aug 2024
Real-Time Video Generation with Pyramid Attention Broadcast
Real-Time Video Generation with Pyramid Attention Broadcast
Xuanlei Zhao
Xiaolong Jin
Kai Wang
Yang You
VGen
DiffM
66
31
0
22 Aug 2024
Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation
Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation
Liu He
Yizhi Song
Hejun Huang
Pinxin Liu
Yunlong Tang
Daniel G. Aliaga
Xin Zhou
DiffM
VGen
87
3
0
19 Aug 2024
SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for
  Short Drama
SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama
Jing Tang
Quanlu Jia
Yuqiang Xie
Zeyu Gong
Xiang Wen
Jiayi Zhang
Yalong Guo
Guibin Chen
Jiangping Yang
VGen
22
1
0
18 Aug 2024
Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Zhenghao Zhang
Junchao Liao
Menghao Li
Zuozhuo Dai
Bingxue Qiu
Hao Hu
Shaowei Cai
Weizhi Wang
VGen
29
41
0
31 Jul 2024
Diffusion Feedback Helps CLIP See Better
Diffusion Feedback Helps CLIP See Better
Wenxuan Wang
Quan-Sen Sun
Fan Zhang
Yepeng Tang
Jing Liu
Xinlong Wang
VLM
35
6
0
29 Jul 2024
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Sherwin Bahmani
Ivan Skorokhodov
Aliaksandr Siarohin
Willi Menapace
Guocheng Qian
...
Chaoyang Wang
Jiaxu Zou
Andrea Tagliasacchi
David B. Lindell
Sergey Tulyakov
VGen
DiffM
62
41
0
17 Jul 2024
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of
  Text-to-Time-lapse Video Generation
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
Shenghai Yuan
Jinfa Huang
Yongqi Xu
Yaoyang Liu
Shaofeng Zhang
Yujun Shi
Ruijie Zhu
Xinhua Cheng
Jiebo Luo
Li Yuan
EGVM
VGen
66
1
0
26 Jun 2024
Training-free Camera Control for Video Generation
Training-free Camera Control for Video Generation
Chen Hou
Guoqiang Wei
VGen
DiffM
45
29
0
14 Jun 2024
GenAI Arena: An Open Evaluation Platform for Generative Models
GenAI Arena: An Open Evaluation Platform for Generative Models
Dongfu Jiang
Max W.F. Ku
Tianle Li
Yuansheng Ni
Shizhuo Sun
Rongqi Fan
Wenhu Chen
EGVM
31
15
0
06 Jun 2024
VideoPhy: Evaluating Physical Commonsense for Video Generation
VideoPhy: Evaluating Physical Commonsense for Video Generation
Hritik Bansal
Zongyu Lin
Tianyi Xie
Zeshun Zong
Michal Yarom
Yonatan Bitton
Chenfanfu Jiang
Yizhou Sun
Kai-Wei Chang
Aditya Grover
EGVM
VGen
21
36
0
05 Jun 2024
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
Shenghai Yuan
Jinfa Huang
Yujun Shi
Yongqi Xu
Ruijie Zhu
Bin Lin
Xinhua Cheng
Li-xin Yuan
Jiebo Luo
VGen
67
31
0
07 Apr 2024
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video
  Diffusion Models
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
Wenhao Wang
Yi Yang
VGen
DiffM
24
30
0
10 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
67
177
0
29 Feb 2024
VideoCrafter2: Overcoming Data Limitations for High-Quality Video
  Diffusion Models
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen
Yong Zhang
Xiaodong Cun
Menghan Xia
Xintao Wang
Chao-Liang Weng
Ying Shan
VGen
DiffM
115
269
0
17 Jan 2024
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large
  Datasets
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
150
985
0
25 Nov 2023
CogVideo: Large-scale Pretraining for Text-to-Video Generation via
  Transformers
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
235
556
0
29 May 2022
Previous
123456