arXiv:2303.16899
AutoAD: Movie Description in Context
Computer Vision and Pattern Recognition (CVPR), 2023
29 March 2023
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
Papers citing "AutoAD: Movie Description in Context"
38 / 38 papers shown
HieraMamba: Video Temporal Grounding via Hierarchical Anchor-Mamba Pooling
Joungbin An
Kristen Grauman
Mamba
201
0
0
27 Oct 2025
Addressing the ID-Matching Challenge in Long Video Captioning
Zhantao Yang
Huangji Wang
Ruili Feng
Han Zhang
Yuting Hu
Shangwen Zhu
Junyan Li
Yu Liu
Fan Cheng
72
0
0
08 Oct 2025
Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark
Nisarg A. Shah
Amir Ziai
Chaitanya Ekanadham
Vishal M. Patel
VGen
CoGe
ELM
101
0
0
17 Sep 2025
ResidualViT for Efficient Temporally Dense Video Encoding
Mattia Soldan
Fabian Caba Heilbron
Bernard Ghanem
Josef Sivic
Bryan C. Russell
137
0
0
16 Sep 2025
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Lin Long
Yexiao He
Wentao Ye
Yiyuan Pan
Yuan Lin
Hang Li
Junbo Zhao
Wei Li
198
7
0
13 Aug 2025
SV3.3B: A Sports Video Understanding Model for Action Recognition
Sai Varun Kodathala
Yashwanth Reddy Vutukoori
Rakesh Vunnam
192
2
0
23 Jul 2025
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Eshika Khandelwal
Gül Varol
Weidi Xie
Andrew Zisserman
DiffM
VGen
335
3
0
01 Apr 2025
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Computer Vision and Pattern Recognition (CVPR), 2025
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
222
1
0
31 Mar 2025
Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning
International Symposium on Modeling and Optimization in Mobile, Ad-Hoc and Wireless Networks (WiOpt), 2025
Yubo Zhang
Pedro Botelho
Trevor Gordon
Gil Zussman
I. Kadota
221
1
0
31 Mar 2025
Learning to Generate Long-term Future Narrations Describing Activities of Daily Living
Ramanathan Rajendiran
Debaditya Roy
Basura Fernando
VGen
269
0
0
03 Mar 2025
NowYouSee Me: Context-Aware Automatic Audio Description
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Seon-Ho Lee
Jue Wang
D. Fan
Zhikang Zhang
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
258
2
0
13 Dec 2024
Progress-Aware Video Frame Captioning
Computer Vision and Pattern Recognition (CVPR), 2024
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
528
5
0
03 Dec 2024
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Weijia Wu
Mingyu Liu
Zeyu Zhu
Xi Xia
Haoen Feng
Wen Wang
Kevin Qinghong Lin
Chunhua Shen
Mike Zheng Shou
DiffM
VGen
367
11
0
22 Nov 2024
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
Yichen He
Yuan Lin
Jianchao Wu
Hanchong Zhang
Yuchen Zhang
Ruicheng Le
VGen
VLM
654
4
0
11 Nov 2024
It's Just Another Day: Unique Video Captioning by Discriminative Prompting
Asian Conference on Computer Vision (ACCV), 2024
Toby Perrett
Tengda Han
Dima Damen
Andrew Zisserman
188
3
0
15 Oct 2024
Character-aware audio-visual subtitling in context
Asian Conference on Computer Vision (ACCV), 2024
Jaesung Huh
Andrew Zisserman
256
0
0
14 Oct 2024
Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM
Yuanjie Lyu
Tong Xu
Zihan Niu
Bo Peng
Jing Ke
Enhong Chen
189
0
0
14 Sep 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
211
11
0
31 Jul 2024
AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
162
17
0
22 Jul 2024
Long Story Short: Story-level Video Understanding from 20K Short Films
Ridouane Ghermi
Xi Wang
Vicky Kalogeiton
Ivan Laptev
VGen
104
2
0
14 Jun 2024
Multi-layer Learnable Attention Mask for Multimodal Tasks
Wayner Barrios
SouYoung Jin
143
1
0
04 Jun 2024
MICap: A Unified Model for Identity-aware Movie Descriptions
Computer Vision and Pattern Recognition (CVPR), 2024
Haran Raajesh
Naveen Reddy Desanur
Zeeshan Khan
Makarand Tapaswi
208
7
0
19 May 2024
CinePile: A Long Video Question Answering Dataset and Benchmark
Ruchit Rawal
Khalid Saifullah
Ronen Basri
David Jacobs
Gowthami Somepalli
Tom Goldstein
217
87
0
14 May 2024
LLM-AD: Large Language Model based Audio Description System
Peng Chu
Jiang Wang
Andre Abrantes
136
10
0
02 May 2024
Learning Long-form Video Prior via Generative Pre-Training
Jinheng Xie
Jiajun Feng
Zhaoxu Tian
Kevin Qinghong Lin
Yawen Huang
...
Nanxu Gong
Xu Zuo
Jiaqi Yang
Yefeng Zheng
Mike Zheng Shou
139
8
0
24 Apr 2024
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
259
33
0
22 Apr 2024
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
221
72
0
01 Apr 2024
Contextual AD Narration with Interleaved Multimodal Sequence
Computer Vision and Pattern Recognition (CVPR), 2024
Hanlin Wang
Zhan Tong
Kecheng Zheng
Yujun Shen
Limin Wang
VGen
374
7
0
19 Mar 2024
Visual Objectification in Films: Towards a New AI Task for Video Interpretation
Computer Vision and Pattern Recognition (CVPR), 2024
Julie Tores
L. Sassatelli
Hui-Yin Wu
Clement Bergman
Lea Andolfi
...
F. Precioso
Thierry Devars
Magali Guaresi
Virginie Julliard
Sarah Lecossais
183
5
0
24 Jan 2024
Video Summarization: Towards Entity-Aware Captions
Hammad A. Ayyubi
Tianqi Liu
Arsha Nagrani
Xudong Lin
Ruotong Wang
Anurag Arnab
Feng Han
Yukun Zhu
Jialu Liu
Shih-Fu Chang
112
1
0
01 Dec 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Chaoyi Zhang
Kevin Qinghong Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
214
48
0
29 Nov 2023
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Leonard Salewski
Stefan Fauth
A. Sophia Koepke
Zeynep Akata
143
14
0
14 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
195
84
0
30 Oct 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
IEEE International Conference on Computer Vision (ICCV), 2023
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
186
48
0
10 Oct 2023
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Bipin Rajendran
Bashir M. Al-Hashimi
MLLM
VLM
205
7
0
27 Sep 2023
A Large-scale Dataset for Audio-Language Representation Learning
ACM Multimedia (ACM MM), 2023
Luoyi Sun
Xuenan Xu
Mengyue Wu
Weidi Xie
291
44
0
20 Sep 2023
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding
Yidan Sun
Qin Chao
Yangfeng Ji
Boyang Albert Li
VGen
342
11
0
11 Mar 2022
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
International Journal of Computer Vision (IJCV), 2021
Shiyang Feng
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Zelong Li
Jiaming Song
Yu Qiao
VLM
CLIP
1.0K
1,385
0
09 Oct 2021