ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.03048
  4. Cited By
Latte: Latent Diffusion Transformer for Video Generation

Latte: Latent Diffusion Transformer for Video Generation

5 January 2024
Xin Ma
Yaohui Wang
Gengyun Jia
Xinyuan Chen
Z. Liu
Yuan-Fang Li
Cunjian Chen
Yu Qiao
    DiffM
    VGen
ArXivPDFHTML

Papers citing "Latte: Latent Diffusion Transformer for Video Generation"

50 / 186 papers shown
Title
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise
  Motion Control
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control
Yujie Wei
Shiwei Zhang
Hangjie Yuan
Xiang Wang
Haonan Qiu
...
F. Liu
Zhizhong Huang
Jiaxin Ye
Yingya Zhang
Hongming Shan
DiffM
VGen
35
1
0
17 Oct 2024
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving
  Scene Representation
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
Guosheng Zhao
Chaojun Ni
Xiaofeng Wang
Zheng Zhu
X. Zhang
...
Xinze Chen
Boyuan Wang
Youyi Zhang
Wenjun Mei
Xingang Wang
VGen
35
20
0
17 Oct 2024
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
Aakanksha
Arash Ahmadian
Seraphina Goldfarb-Tarrant
B. Ermiş
Marzieh Fadaee
Sara Hooker
MoMe
29
10
0
14 Oct 2024
FasterDiT: Towards Faster Diffusion Transformers Training without
  Architecture Modification
FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification
J. Yao
Wang Cheng
Wenyu Liu
Xinggang Wang
20
1
0
14 Oct 2024
Progressive Autoregressive Video Diffusion Models
Progressive Autoregressive Video Diffusion Models
Desai Xie
Zhan Xu
Yicong Hong
Hao Tan
Difan Liu
Feng Liu
Arie E. Kaufman
Yang Zhou
VGen
DiffM
27
1
0
10 Oct 2024
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Qiuheng Wang
Yukai Shi
Jiarong Ou
R. J. Chen
Ke Lin
...
Mingwu Zheng
Xin Tao
Fei Yang
Pengfei Wan
Di Zhang
VGen
60
16
0
10 Oct 2024
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large
  Vision-Language Models
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models
Rui Zhao
Hangjie Yuan
Yujie Wei
Shiwei Zhang
Yuchao Gu
...
Xiang Wang
Zhangjie Wu
Junhao Zhang
Yingya Zhang
Mike Zheng Shou
DiffM
VLM
23
2
0
09 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
34
52
0
09 Oct 2024
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge
  for Robot Manipulation
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
Chi-Lam Cheang
Guangzeng Chen
Ya Jing
Tao Kong
Hang Li
...
Hongtao Wu
Jiafeng Xu
Yichu Yang
Hanbo Zhang
Minzhao Zhu
VGen
LM&Ro
28
1
0
08 Oct 2024
Pyramidal Flow Matching for Efficient Video Generative Modeling
Pyramidal Flow Matching for Efficient Video Generative Modeling
Yang Jin
Zhicheng Sun
Ningyuan Li
Kun Xu
K. Xu
...
Nan Zhuang
Quzhe Huang
Yang Song
Yadong Mu
Zhouchen Lin
VGen
31
1
0
08 Oct 2024
GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting
GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting
Yukang Cao
Masoud Hadi
Liang Pan
Ziwei Liu
3DGS
DiffM
35
1
0
07 Oct 2024
IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video
  Synthesis
IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis
Shitong Shao
Zikai Zhou
Lichen Bai
Haoyi Xiong
Zeke Xie
VGen
20
1
0
05 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video
  Representation Learning
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
Han Lin
Tushar Nagarajan
Nicolas Ballas
Mido Assran
Mojtaba Komeili
Mohit Bansal
Koustuv Sinha
AI4TS
34
2
0
04 Oct 2024
Dynamic Diffusion Transformer
Dynamic Diffusion Transformer
Wangbo Zhao
Yizeng Han
Jiasheng Tang
Kai Wang
Yibing Song
Gao Huang
Fan Wang
Yang You
30
1
0
04 Oct 2024
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep
  Approach
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach
Yaofang Liu
Y. Ren
Xiaodong Cun
Aitor Artola
Yang Liu
Tieyong Zeng
Raymond H. Chan
Jean-Michel Morel
VGen
DiffM
23
1
0
04 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture
Denoising with a Joint-Embedding Predictive Architecture
Dengsheng Chen
Jie Hu
Xiaoming Wei
Enhua Wu
DiffM
29
1
0
02 Oct 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
25
1
0
23 Sep 2024
FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video
  Dataset
FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset
Donglin Di
H. Feng
Wenzhang Sun
Yongjia Ma
Hao Li
Wei Chen
Xiaofei Gou
Tonghua Su
Xun Yang
CVBM
25
1
0
23 Sep 2024
LVCD: Reference-based Lineart Video Colorization with Diffusion Models
LVCD: Reference-based Lineart Video Colorization with Diffusion Models
Zhitong Huang
Mohan Zhang
Jing Liao
DiffM
VGen
31
1
0
19 Sep 2024
AMG: Avatar Motion Guided Video Generation
AMG: Avatar Motion Guided Video Generation
Zhangsihao Yang
Mengyi Shan
Mohammad Farazi
Wenhui Zhu
Yanxi Chen
Xuanzhao Dong
Yalin Wang
VGen
DiffM
37
0
0
02 Sep 2024
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video
  Diffusion Model
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
Liuhan Chen
Zongjian Li
Bin Lin
Bin Zhu
Qian Wang
Shenghai Yuan
X. Zhou
Xinhua Cheng
Li Yuan
DiffM
52
1
0
02 Sep 2024
FLUX that Plays Music
FLUX that Plays Music
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Junshi Huang
43
6
0
01 Sep 2024
Training-free Long Video Generation with Chain of Diffusion Model
  Experts
Training-free Long Video Generation with Chain of Diffusion Model Experts
Wenhao Li
Yichao Cao
Xiu Su
Xi Lin
Shan You
Mingkai Zheng
Yi Chen
Chang Xu
VGen
DiffM
28
1
0
24 Aug 2024
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed
  Representations
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Can Qin
Congying Xia
Krithika Ramakrishnan
Michael S Ryoo
Lifu Tu
...
Silvio Savarese
Juan Carlos Niebles
Zeyuan Chen
Ran Xu
Caiming Xiong
VGen
DiffM
42
1
0
22 Aug 2024
Real-Time Video Generation with Pyramid Attention Broadcast
Real-Time Video Generation with Pyramid Attention Broadcast
Xuanlei Zhao
Xiaolong Jin
Kai Wang
Yang You
VGen
DiffM
39
1
0
22 Aug 2024
Factorized-Dreamer: Training A High-Quality Video Generator with Limited
  and Low-Quality Data
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data
Tao Yang
Yangming Shi
Yunwen Huang
Feng Chen
Yin Zheng
Lei Zhang
DiffM
VGen
33
1
0
19 Aug 2024
Efficient Diffusion Transformer with Step-wise Dynamic Attention
  Mediators
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
Yifan Pu
Zhuofan Xia
Jiayi Guo
Dongchen Han
Qixiu Li
...
Ji Li
Yizeng Han
Shiji Song
Gao Huang
Xiu Li
29
1
0
11 Aug 2024
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture
  Generation
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation
Xiaofeng Mao
Zhengkai Jiang
Qilin Wang
Chencan Fu
Jiangning Zhang
Jiafu Wu
Yabiao Wang
Chengjie Wang
Wei Li
Mingmin Chi
41
1
0
06 Aug 2024
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation
Zhiyu Tan
Xiaomeng Yang
Luozheng Qin
Hao Li
VGen
29
16
0
05 Aug 2024
Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's
  Impact on Spatio-Temporal Cross-Attentions
Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions
Ashkan Taghipour
Morteza Ghahremani
Bennamoun
Aref Miri Rekavandi
Zinuo Li
Hamid Laga
F. Boussaïd
VGen
40
0
0
27 Jul 2024
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Sherwin Bahmani
Ivan Skorokhodov
Aliaksandr Siarohin
Willi Menapace
Guocheng Qian
...
Chaoyang Wang
Jiaxu Zou
Andrea Tagliasacchi
David B. Lindell
Sergey Tulyakov
VGen
DiffM
31
1
0
17 Jul 2024
Scaling Diffusion Transformers to 16 Billion Parameters
Scaling Diffusion Transformers to 16 Billion Parameters
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Junshi Huang
DiffM
MoE
38
1
0
16 Jul 2024
A Comprehensive Survey on Human Video Generation: Challenges, Methods,
  and Insights
A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights
Wentao Lei
Jinting Wang
Fengji Ma
Guanjie Huang
Li Liu
VGen
EGVM
31
2
0
11 Jul 2024
MiraData: A Large-Scale Video Dataset with Long Durations and Structured
  Captions
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
Xuan Ju
Yiming Gao
Zhaoyang Zhang
Ziyang Yuan
Xintao Wang
Ailing Zeng
Yu Xiong
Qiang Xu
Ying Shan
VGen
35
1
0
08 Jul 2024
VIMI: Grounding Video Generation through Multi-modal Instruction
VIMI: Grounding Video Generation through Multi-modal Instruction
Yuwei Fang
Willi Menapace
Aliaksandr Siarohin
Tsai-Shien Chen
Kuan-Chien Wang
Ivan Skorokhodov
Graham Neubig
Sergey Tulyakov
VGen
31
2
0
08 Jul 2024
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Kepan Nan
Rui Xie
Penghao Zhou
Tiehan Fan
Zhenheng Yang
Zhijie Chen
Xiang Li
Jian Yang
Ying Tai
44
1
0
02 Jul 2024
MimicMotion: High-Quality Human Motion Video Generation with
  Confidence-aware Pose Guidance
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
Yuang Zhang
Jiaxi Gu
Li-Wen Wang
Han Wang
Junqi Cheng
Yuefeng Zhu
Fangyuan Zou
VGen
31
1
0
28 Jun 2024
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of
  Text-to-Time-lapse Video Generation
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
Shenghai Yuan
Jinfa Huang
Yongqi Xu
Yaoyang Liu
Shaofeng Zhang
Yujun Shi
Ruijie Zhu
Xinhua Cheng
Jiebo Luo
Li Yuan
EGVM
VGen
38
1
0
26 Jun 2024
Diffusion Model-Based Video Editing: A Survey
Diffusion Model-Based Video Editing: A Survey
Wenhao Sun
Rong-Cheng Tu
Jingyi Liao
Dacheng Tao
VGen
34
1
0
26 Jun 2024
MotionBooth: Motion-Aware Customized Text-to-Video Generation
MotionBooth: Motion-Aware Customized Text-to-Video Generation
Jianzong Wu
Xiangtai Li
Yanhong Zeng
J. J. Zhang
Qianyu Zhou
Yining Li
Yunhai Tong
Kai Chen
DiffM
VGen
40
1
0
25 Jun 2024
MVOC: a training-free multiple video object composition method with
  diffusion models
MVOC: a training-free multiple video object composition method with diffusion models
Wei Wang
Yaosen Chen
Yuegen Liu
Qi Yuan
Shubin Yang
Yanru Zhang
DiffM
37
1
0
22 Jun 2024
Identifying and Solving Conditional Image Leakage in Image-to-Video
  Diffusion Model
Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
Min Zhao
Hongzhou Zhu
Chendong Xiang
Kaiwen Zheng
Chongxuan Li
Jun Zhu
32
7
0
22 Jun 2024
IRASim: Learning Interactive Real-Robot Action Simulators
IRASim: Learning Interactive Real-Robot Action Simulators
Fangqi Zhu
Hongtao Wu
Song Guo
Yuxiao Liu
Chilam Cheang
Tao Kong
40
2
0
20 Jun 2024
Neural Residual Diffusion Models for Deep Scalable Vision Generation
Neural Residual Diffusion Models for Deep Scalable Vision Generation
Zhiyuan Ma
Liangliang Zhao
Biqing Qi
Bowen Zhou
DiffM
32
1
0
19 Jun 2024
ViD-GPT: Introducing GPT-style Autoregressive Generation in Video
  Diffusion Models
ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models
Kaifeng Gao
Jiaxin Shi
Hanwang Zhang
Chunping Wang
Jun Xiao
DiffM
VGen
36
1
0
16 Jun 2024
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
Junke Wang
Yi-Xin Jiang
Zehuan Yuan
Binyue Peng
Zuxuan Wu
Yu-Gang Jiang
ViT
VGen
45
1
0
13 Jun 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing
  Reliability,Reproducibility, and Practicality
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability,Reproducibility, and Practicality
Tianle Zhang
Langtian Ma
Yuchen Yan
Yuchen Zhang
Kai Wang
...
Wenqi Shao
Yang You
Yu Qiao
Ping Luo
Kaipeng Zhang
VGen
34
1
0
13 Jun 2024
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and
  Video Generation
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
Kai Wang
Shijian Deng
Jing Shi
Dimitrios Hatzinakos
Yapeng Tian
VGen
33
8
0
11 Jun 2024
Compositional Video Generation as Flow Equalization
Compositional Video Generation as Flow Equalization
Xingyi Yang
Xinchao Wang
DiffM
VGen
27
1
0
10 Jun 2024
Vript: A Video Is Worth Thousands of Words
Vript: A Video Is Worth Thousands of Words
Dongjie Yang
Suyuan Huang
Chengqiang Lu
Xiaodong Han
Haoxin Zhang
Yan Gao
Yao Hu
Hai Zhao
VGen
31
1
0
10 Jun 2024
Previous
1234
Next