1605.03705
Cited By
Movie Description
12 May 2016
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
C. Pal
Hugo Larochelle
Aaron Courville
Bernt Schiele
3DV
VGen
Papers citing "Movie Description"
50 / 211 papers shown
Title
More than a Moment: Towards Coherent Sequences of Audio Descriptions
Eshika Khandelwal
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Andrew Zisserman
Gül Varol
Makarand Tapaswi
DiffM
68
0
0
29 Oct 2025
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
Zhenxin Lei
Zhangwei Gao
Changyao Tian
Erfei Cui
Guanzhou Chen
...
Xiangyu Zhao
Jiayi Ji
Yu Qiao
Wenhai Wang
Gen Luo
VLM
177
0
0
14 Oct 2025
What You See is What You Ask: Evaluating Audio Descriptions
Divy Kala
Eshika Khandelwal
Makarand Tapaswi
DiffM
90
1
0
01 Oct 2025
Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark
Nisarg A. Shah
Amir Ziai
Chaitanya Ekanadham
Vishal M. Patel
VGen
CoGe
ELM
105
0
0
17 Sep 2025
Video Understanding by Design: How Datasets Shape Architectures and Insights
Lei Wang
Piotr Koniusz
Yongsheng Gao
3DV
VGen
AI4TS
205
0
0
11 Sep 2025
Representation Shift: Unifying Token Compression with FlashAttention
Joonmyung Choi
S. Lee
Byungoh Ko
Eunseo Kim
Jihyung Kil
Hyunwoo J. Kim
148
0
0
01 Aug 2025
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko
Ji Soo Lee
M. Choi
Zihang Meng
Hyunwoo J. Kim
236
1
0
31 Jul 2025
Principled Multimodal Representation Learning
Xiaohao Liu
Xiaobo Xia
See-Kiong Ng
Tat-Seng Chua
175
6
0
23 Jul 2025
Can Vision Language Models Understand Mimed Actions?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hyundong Justin Cho
Spencer Lin
Tejas Srinivasan
Michael Saxon
Deuksin Kwon
Natali T. Chavez
Jonathan May
140
3
0
17 Jun 2025
Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Linhao Yu
Xinguang Ji
Yahui Liu
Fanheng Kong
Chenxi Sun
Jingyuan Zhang
Hongzhi Zhang
Victoria A. Webster-Wood
Fuzheng Zhang
Deyi Xiong
153
2
0
11 Jun 2025
BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element Guidance
Huy Le
Nhat Chung
Tung Kieu
A. Nguyen
Ngan Le
312
1
0
04 Jun 2025
VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
AAAI Conference on Artificial Intelligence (AAAI), 2025
Baoyu Liang
Qile Su
Shoutai Zhu
Yuchen Liang
Chao Tong
VGen
187
2
0
03 Jun 2025
CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition
Quynh Phung
Long Mai
Fabian Caba Heilbron
Feng Liu
Jia-Bin Huang
Cusuh Ham
DiffM
VGen
CoGe
247
3
0
28 Apr 2025
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
Computer Vision and Pattern Recognition (CVPR), 2025
C. Kim
Jihwan Moon
Sangwoo Moon
Heeseung Yun
Sihaeng Lee
Aniruddha Kembhavi
Soonyoung Lee
Gunhee Kim
Sangho Lee
Christopher Clark
263
0
0
21 Apr 2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Yang Shi
Jiaheng Liu
Yushuo Guan
Zhikai Wu
Yujiao Shi
...
Bohan Zeng
Wei Zhang
Fuzheng Zhang
Wenjing Yang
Di Zhang
VGen
VLM
311
10
0
14 Apr 2025
Multimodal Lengthy Videos Retrieval Framework and Evaluation Metric
Mohamed Eltahir
Osamah Sarraj
Mohammed Bremoo
Mohammed Khurd
Abdulrahman Alfrihidi
Taha Alshatiri
Mohammad Almatrafi
Tanveer Hussain
124
1
0
06 Apr 2025
Generative Modeling of Class Probability for Multi-Modal Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Jungkyoo Shin
Bumsoo Kim
Eunwoo Kim
311
2
0
21 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
228
4
0
17 Mar 2025
FilmComposer: LLM-Driven Music Production for Silent Film Clips
Computer Vision and Pattern Recognition (CVPR), 2025
Zhifeng Xie
Qile He
Youjia Zhu
Qiwei He
Mengtian Li
VGen
304
2
0
11 Mar 2025
Audio-Language Datasets of Scenes and Events: A Survey
IEEE Access (IEEE Access), 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
382
6
0
10 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Jiayi Zhang
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
414
32
0
06 Jan 2025
Do Language Models Understand Time?
The Web Conference (WWW), 2024
Xi Ding
Lei Wang
692
9
0
18 Dec 2024
NowYouSee Me: Context-Aware Automatic Audio Description
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Seon-Ho Lee
Jue Wang
D. Fan
Zhikang Zhang
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
258
2
0
13 Dec 2024
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Yanjie Wang
Zhikang Zhang
Jue Wang
D. Fan
Zhenlin Xu
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
VLM
231
1
0
10 Dec 2024
Artificial Intelligence for Biomedical Video Generation
Linyuan Li
Jianing Qiu
Anujit Saha
Lin Li
Poyuan Li
Mengxian He
Ziyu Guo
Wu Yuan
VGen
334
3
0
12 Nov 2024
Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yingqiang Gao
Lukas Fischer
Alexa Lintner
Sarah Ebling
178
4
0
11 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Computer Vision and Pattern Recognition (CVPR), 2024
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
321
64
0
10 Oct 2024
TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Bingqing Zhang
Zhuo Cao
Heming Du
Xin Yu
Xue Li
Jiajun Liu
Sen Wang
VGen
154
5
0
30 Sep 2024
SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama
Jing Tang
Quanlu Jia
Yuqiang Xie
Zeyu Gong
Xiang Wen
Jiayi Zhang
Yalong Guo
Guibin Chen
Jiangping Yang
VGen
197
2
0
18 Aug 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Yatian Wang
Aosong Cheng
Pengjun Fang
Zeyue Tian
...
Wenhan Luo
Qifeng Chen
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
257
8
0
30 Jul 2024
AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
162
17
0
22 Jul 2024
VideoClusterNet: Self-Supervised and Adaptive Clustering For Videos
Devesh Walawalkar
Pablo Garrido
CVBM
156
0
0
16 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
271
111
0
30 Jun 2024
GUI Action Narrator: Where and When Did That Action Take Place?
Qinchen Wu
Difei Gao
Kevin Qinghong Lin
Zhuoyu Wu
Xiangwu Guo
Peiran Li
Weichen Zhang
Hengxu Wang
Mike Zheng Shou
183
5
0
19 Jun 2024
Multilingual Synopses of Movie Narratives: A Dataset for Story Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yidan Sun
Jianfei Yu
Boyang Li
194
0
0
18 Jun 2024
Long Story Short: Story-level Video Understanding from 20K Short Films
Ridouane Ghermi
Xi Wang
Vicky Kalogeiton
Ivan Laptev
VGen
108
2
0
14 Jun 2024
Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hui Liu
Wenya Wang
Hao Sun
Chris Xing Tian
Chenqi Kong
Xin Dong
Haoliang Li
138
10
0
14 Jun 2024
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLM
LRM
172
1
0
13 Jun 2024
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges
Daniel A. P. Oliveira
Eugénio Ribeiro
David Martins de Matos
VGen
176
4
0
04 Jun 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
295
60
0
26 May 2024
"Previously on ..." From Recaps to Story Summarization
Computer Vision and Pattern Recognition (CVPR), 2024
Aditya Kumar Singh
Dhruv Srivastava
Makarand Tapaswi
193
3
0
19 May 2024
MICap: A Unified Model for Identity-aware Movie Descriptions
Computer Vision and Pattern Recognition (CVPR), 2024
Haran Raajesh
Naveen Reddy Desanur
Zeeshan Khan
Makarand Tapaswi
208
7
0
19 May 2024
From Sora What We Can See: A Survey of Text-to-Video Generation
Rui Sun
Yumin Zhang
Tejal Shah
Jiahao Sun
Shuoying Zhang
Wenqi Li
Haoran Duan
Bo Wei
R. Ranjan
EGVM
227
37
0
17 May 2024
Learning Long-form Video Prior via Generative Pre-Training
Jinheng Xie
Jiajun Feng
Zhaoxu Tian
Kevin Qinghong Lin
Yawen Huang
...
Nanxu Gong
Xu Zuo
Jiaqi Yang
Yefeng Zheng
Mike Zheng Shou
147
8
0
24 Apr 2024
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
263
33
0
22 Apr 2024
EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning
Mingjie Ma
Zhihuan Yu
Yichao Ma
Guohui Li
LRM
162
2
0
22 Apr 2024
Movie101v2: Improved Movie Narration Benchmark
Zihao Yue
Yepeng Zhang
Ziheng Wang
Qin Jin
VGen
257
3
0
20 Apr 2024
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval
Han Fang
Xianghao Zang
Chao Ban
Zerun Feng
Lanxiang Zhou
Zhongjiang He
Yongxiang Li
Hao Sun
254
3
0
18 Apr 2024
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi
Sanghyeok Lee
Jaewon Chu
Minhyuk Choi
Hyunwoo J. Kim
MoMe
ViT
227
36
0
20 Mar 2024
A Survey on Quality Metrics for Text-to-Image Generation
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2024
Sebastian Hartwig
Dominik Engel
Leon Sick
H. Kniesel
Tristan Payer
Poonam Poonam
Michael Glockler
Alex Bauerle
Timo Ropinski
EGVM
177
0
0
18 Mar 2024