ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.06304
  4. Cited By
VIMI: Grounding Video Generation through Multi-modal Instruction

VIMI: Grounding Video Generation through Multi-modal Instruction

8 July 2024
Yuwei Fang
Willi Menapace
Aliaksandr Siarohin
Tsai-Shien Chen
Kuan-Chien Wang
Ivan Skorokhodov
Graham Neubig
Sergey Tulyakov
    VGen
ArXivPDFHTML

Papers citing "VIMI: Grounding Video Generation through Multi-modal Instruction"

9 / 9 papers shown
Title
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
52
49
0
29 Feb 2024
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
Weiming Ren
Harry Yang
Ge Zhang
Cong Wei
Xinrun Du
Stephen W. Huang
Wenhu Chen
DiffM
VGen
54
21
0
06 Feb 2024
Lumiere: A Space-Time Diffusion Model for Video Generation
Lumiere: A Space-Time Diffusion Model for Video Generation
Omer Bar-Tal
Hila Chefer
Omer Tov
Charles Herrmann
Roni Paiss
...
T. Michaeli
Oliver Wang
Deqing Sun
Tali Dekel
Inbar Mosseri
VGen
82
90
0
23 Jan 2024
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large
  Datasets
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
142
321
0
25 Nov 2023
Control-A-Video: Controllable Text-to-Video Generation with Diffusion
  Models
Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models
Weifeng Chen
Yatai Ji
Jie Wu
Hefeng Wu
Pan Xie
Jiashi Li
Xin Xia
Xuefeng Xiao
Liang Lin
VGen
102
90
0
23 May 2023
VideoFusion: Decomposed Diffusion Models for High-Quality Video
  Generation
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
Zhengxiong Luo
Dayou Chen
Yingya Zhang
Yan Huang
Liangsheng Wang
Yujun Shen
Deli Zhao
Jinren Zhou
Tien-Ping Tan
DiffM
VGen
111
200
0
15 Mar 2023
Re-Imagen: Retrieval-Augmented Text-to-Image Generator
Re-Imagen: Retrieval-Augmented Text-to-Image Generator
Wenhu Chen
Hexiang Hu
Chitwan Saharia
William W. Cohen
VLM
106
117
0
29 Sep 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via
  Transformers
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
235
328
0
29 May 2022
U-Net: Convolutional Networks for Biomedical Image Segmentation
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
211
9,999
0
18 May 2015
1