2411.07076
Cited By
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
11 November 2024
Yichen He
Yuan Lin
Jianchao Wu
Hanchong Zhang
Yuchen Zhang
Ruicheng Le
VGen
VLM
Papers citing "StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification"
4 / 4 papers shown
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Lin Long
Yexiao He
Wentao Ye
Yiyuan Pan
Yuan Lin
Hang Li
Junbo Zhao
Wei Li
363
9
0
13 Aug 2025
From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding
Xiangfeng Wang
Xiao Li
Yadong Wei
Xueyu Song
Yang Song
...
Fangrui Zeng
Zaiyi Chen
Liu Liu
Gu Xu
Tong Xu
VGen
129
0
0
03 Jul 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
402
23
0
20 Apr 2025
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xinlong Chen
Yuanxing Zhang
Chongling Rao
Yushuo Guan
Qingbin Liu
Fuzheng Zhang
Chengru Song
Qiang Liu
Di Zhang
Tieniu Tan
358
14
0
18 Feb 2025