DreamFrame: Enhancing Video Understanding via Automatically Generated QA and Style-Consistent Keyframes

3 March 2024

ArXiv (abs) | PDF | HTML | HuggingFace (30 upvotes) | GitHub

Papers citing "DreamFrame: Enhancing Video Understanding via Automatically Generated QA and Style-Consistent Keyframes"

Generative AI for Film Creation: A Survey of Recent Advances

...

314

11 Apr 2025

SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development

408

31 Mar 2025

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

International Conference on Learning Representations (ICLR), 2025

558

140

07 Jan 2025

MLVU: Benchmarking Multi-task Long Video Understanding

Computer Vision and Pattern Recognition (CVPR), 2024

...

664

03 Jan 2025

ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding

797

29 Dec 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

...

399

12 Dec 2024

From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding

...

Huaijian Zhang

354

27 Sep 2024

AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition

Zhengyuan Yang

Wangmeng Zuo

204

21 Aug 2024

Streaming Long Video Understanding with Large Language Models

Dahua Lin

363

147

25 May 2024

Movie101v2: Improved Movie Narration Benchmark

Qin Jin

341

20 Apr 2024

The Revolution of Multimodal Large Language Models: A Survey

Lorenzo Baraldi

481

154

19 Feb 2024

Video Understanding with Large Language Models: A Survey

...

908

216

29 Dec 2023

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

Computer Vision and Pattern Recognition (CVPR), 2023

445

208

30 Nov 2023