ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.01422
  4. Cited By
DreamFrame: Enhancing Video Understanding via Automatically Generated QA and Style-Consistent Keyframes
v1v2v3v4v5 (latest)

DreamFrame: Enhancing Video Understanding via Automatically Generated QA and Style-Consistent Keyframes

3 March 2024
Zhende Song
Chenchen Wang
Jiamu Sheng
C. Zhang
Gang Yu
Jiayuan Fan
Tao Chen
    VGen
ArXiv (abs)PDFHTMLHuggingFace (30 upvotes)Github

Papers citing "DreamFrame: Enhancing Video Understanding via Automatically Generated QA and Style-Consistent Keyframes"

13 / 13 papers shown
Generative AI for Film Creation: A Survey of Recent Advances
Generative AI for Film Creation: A Survey of Recent Advances
Ruihan Zhang
Borou Yu
Jiajian Min
Yetong Xin
Zheng Wei
...
Sijia Jiang
Peiwen Huang
Na Chen
Xuanxuan Liu
Anyi Rao
VGen
314
27
0
11 Apr 2025
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development
Minghan Wang
Ye Bai
Yanjie Wang
Thuy-Trang Vu
Ehsan Shareghi
Gholamreza Haffari
408
2
0
31 Mar 2025
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision TokenInternational Conference on Learning Representations (ICLR), 2025
Shaolei Zhang
Qingkai Fang
Zhe Yang
Yang Feng
MLLMVLM
558
140
0
07 Jan 2025
MLVU: Benchmarking Multi-task Long Video Understanding
MLVU: Benchmarking Multi-task Long Video UnderstandingComputer Vision and Pattern Recognition (CVPR), 2024
Yueze Wang
Yan Shu
Bo Zhao
Boya Wu
Junjie Zhou
...
Xi Yang
Y. Xiong
Bo Zhang
Tiejun Huang
Zheng Liu
VLM
664
11
0
03 Jan 2025
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
Xiao Wang
Qingyi Si
Yue Yu
Shiyu Zhu
Zheng Lin
Liqiang Nie
VLM
797
34
0
29 Dec 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for
  Long-term Streaming Video and Audio Interactions
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
Xinsong Zhang
Kai Chen
Yu Qiao
Dahua Lin
Jiaqi Wang
KELM
399
39
0
12 Dec 2024
From Seconds to Hours: Reviewing MultiModal Large Language Models on
  Comprehensive Long Video Understanding
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
Heqing Zou
Tianze Luo
Guiyang Xie
Victor
Zhang
...
Guangcong Wang
Juanyang Chen
Zhuochen Wang
Hansheng Zhang
Huaijian Zhang
VLM
354
26
0
27 Sep 2024
AutoDirector: Online Auto-scheduling Agents for Multi-sensory
  Composition
AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition
Minheng Ni
Chenfei Wu
Huaying Yuan
Zhengyuan Yang
Ming Gong
Lijuan Wang
Zicheng Liu
Wangmeng Zuo
Nan Duan
VGen
204
2
0
21 Aug 2024
Streaming Long Video Understanding with Large Language Models
Streaming Long Video Understanding with Large Language Models
Rui Qian
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Shuangrui Ding
Dahua Lin
Yuan Liu
VLM
363
147
0
25 May 2024
Movie101v2: Improved Movie Narration Benchmark
Movie101v2: Improved Movie Narration Benchmark
Zihao Yue
Yepeng Zhang
Ziheng Wang
Qin Jin
VGen
341
4
0
20 Apr 2024
The Revolution of Multimodal Large Language Models: A Survey
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRMVLM
481
154
0
19 Feb 2024
Video Understanding with Large Language Models: A Survey
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
Chenliang Xu
VLM
908
216
0
29 Dec 2023
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding,
  Reasoning, and Planning
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and PlanningComputer Vision and Pattern Recognition (CVPR), 2023
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Erik Cambria
Jiayuan Fan
Tao Chen
MLLM
445
208
0
30 Nov 2023
1
Page 1 of 1