Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2007.10937
Cited By
MovieNet: A Holistic Dataset for Movie Understanding
21 July 2020
Qingqiu Huang
Yu Xiong
Anyi Rao
Jiaze Wang
Dahua Lin
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MovieNet: A Holistic Dataset for Movie Understanding"
33 / 33 papers shown
Title
CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition
Quynh Phung
Long Mai
Fabian Caba Heilbron
Feng Liu
Jia-Bin Huang
Cusuh Ham
DiffM
VGen
CoGe
101
0
0
28 Apr 2025
VEU-Bench: Towards Comprehensive Understanding of Video Editing
Bozheng Li
Y. Wu
Yi Lu
Jiashuo Yu
Licheng Tang
Jiawang Cao
Wenqing Zhu
Yuyang Sun
Jay Wu
Wenbo Zhu
34
0
0
24 Apr 2025
Towards Understanding Camera Motions in Any Video
Zhiqiu Lin
Siyuan Cen
Daniel Jiang
Jay Karhade
Hewei Wang
...
Rushikesh Zawar
Xue Bai
Yilun Du
Chuang Gan
Deva Ramanan
VGen
23
0
0
21 Apr 2025
FocusedAD: Character-centric Movie Audio Description
Xiaojun Ye
C. Wang
Yiren Song
Sheng Zhou
Liangcheng Li
Jiajun Bu
VGen
53
0
0
16 Apr 2025
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
Shangzhe Di
Zhelun Yu
Guanghao Zhang
Haoyuan Li
Tao Zhong
Hao Cheng
Bolin Li
Wanggui He
Fangxun Shu
Hao Jiang
68
4
0
01 Mar 2025
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu
Mingyu Liu
Zeyu Zhu
Xi Xia
Haoen Feng
Wen Wang
Kevin Qinghong Lin
Chunhua Shen
Mike Zheng Shou
DiffM
VGen
114
1
0
22 Nov 2024
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
Yichen He
Yuan Lin
Jianchao Wu
Hanchong Zhang
Yuchen Zhang
Ruicheng Le
VGen
VLM
83
2
0
11 Nov 2024
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Zuyan Liu
Yuhao Dong
Ziwei Liu
Winston Hu
Jiwen Lu
Yongming Rao
ObjD
74
54
0
19 Sep 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
41
5
0
31 Jul 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
43
3
0
24 May 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
49
7
0
21 Mar 2024
Contextual AD Narration with Interleaved Multimodal Sequence
Hanlin Wang
Zhan Tong
Kecheng Zheng
Yujun Shen
Limin Wang
VGen
47
4
0
19 Mar 2024
Find the Cliffhanger: Multi-Modal Trailerness in Soap Operas
Carlo Bretti
Pascal Mettes
Hendrik Vincent Koops
Daan Odijk
N. V. Noord
27
4
0
29 Jan 2024
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Yanwei Li
Chengyao Wang
Jiaya Jia
VLM
MLLM
36
259
0
28 Nov 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
19
36
0
10 Oct 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
23
2
0
12 Apr 2023
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation
Jie Jiang
Zhimin Li
Jiangfeng Xiong
Rongwei Quan
Qinglin Lu
Wei Liu
16
2
0
09 Dec 2022
The One Where They Reconstructed 3D Humans and Environments in TV Shows
Georgios Pavlakos
Ethan Weber
Matthew Tancik
Angjoo Kanazawa
3DH
43
23
0
28 Jul 2022
The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing
Dawit Mureja Argaw
Fabian Caba Heilbron
Joon-Young Lee
Markus Woodson
In So Kweon
VGen
37
22
0
20 Jul 2022
Scene Consistency Representation Learning for Video Scene Segmentation
Haoqian Wu
Keyu Chen
Yanan Luo
Ruizhi Qiao
Bo Ren
Haozhe Liu
Weicheng Xie
Linlin Shen
SSL
31
16
0
11 May 2022
Hierarchical Self-supervised Representation Learning for Movie Understanding
Fanyi Xiao
Kaustav Kundu
Joseph Tighe
Davide Modolo
SSL
37
24
0
06 Apr 2022
Long Movie Clip Classification with State-Space Video Models
Md. Mohaiminul Islam
Gedas Bertasius
VLM
36
101
0
04 Apr 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
20
19
0
23 Mar 2022
Learning-by-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension
Chao Zhao
Wenlin Yao
Dian Yu
Kaiqiang Song
Dong Yu
Jianshu Chen
15
4
0
19 Mar 2022
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
Jing Tan
Yuhong Wang
Gangshan Wu
Limin Wang
39
14
0
01 Mar 2022
Boundary-aware Self-supervised Learning for Video Scene Segmentation
Jonghwan Mun
Minchul Shin
Gunsoo Han
Sangho Lee
S. Ha
Joonseok Lee
Eun-Sol Kim
SSL
38
20
0
14 Jan 2022
Head and Body: Unified Detector and Graph Network for Person Search in Media
Xiujun Shu
Yusheng Tao
Ruizhi Qiao
Bo Ke
Wei Wen
Bo Ren
20
2
0
27 Nov 2021
Multi-shot Temporal Event Localization: a Benchmark
Xiaolong Liu
Yao Hu
S. Bai
Fei Ding
X. Bai
Philip H. S. Torr
36
81
0
17 Dec 2020
A Unified Framework for Shot Type Classification Based on Subject Centric Lens
Anyi Rao
Jiaze Wang
Linning Xu
Xuekun Jiang
Qingqiu Huang
Bolei Zhou
Dahua Lin
18
60
0
08 Aug 2020
Online Multi-modal Person Search in Videos
J. Xia
Anyi Rao
Qingqiu Huang
Linning Xu
Jiangtao Wen
Dahua Lin
20
28
0
08 Aug 2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain
Arsha Nagrani
A. Brown
Andrew Zisserman
17
100
0
08 May 2020
From Trailers to Storylines: An Efficient Way to Learn from Movies
Qingqiu Huang
Yuanjun Xiong
Yu Xiong
Yuqi Zhang
Dahua Lin
28
26
0
14 Jun 2018
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
261
10,196
0
16 Nov 2016
1