StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification

11 November 2024

Papers citing "StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification"

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

363

13 Aug 2025

From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding

...

129

03 Jul 2025

Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark

402

20 Apr 2025

VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

358

18 Feb 2025