ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.16899
  4. Cited By
AutoAD: Movie Description in Context

AutoAD: Movie Description in Context

Computer Vision and Pattern Recognition (CVPR), 2023
29 March 2023
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
    VGen
ArXiv (abs)PDFHTML

Papers citing "AutoAD: Movie Description in Context"

38 / 38 papers shown
Title
HieraMamba: Video Temporal Grounding via Hierarchical Anchor-Mamba Pooling
HieraMamba: Video Temporal Grounding via Hierarchical Anchor-Mamba Pooling
Joungbin An
Kristen Grauman
Mamba
201
0
0
27 Oct 2025
Addressing the ID-Matching Challenge in Long Video Captioning
Addressing the ID-Matching Challenge in Long Video Captioning
Zhantao Yang
Huangji Wang
Ruili Feng
Han Zhang
Yuting Hu
Shangwen Zhu
Junyan Li
Yu Liu
Fan Cheng
72
0
0
08 Oct 2025
Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark
Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark
Nisarg A. Shah
Amir Ziai
Chaitanya Ekanadham
Vishal M. Patel
VGenCoGeELM
101
0
0
17 Sep 2025
ResidualViT for Efficient Temporally Dense Video Encoding
ResidualViT for Efficient Temporally Dense Video Encoding
Mattia Soldan
Fabian Caba Heilbron
Bernard Ghanem
Josef Sivic
Bryan C. Russell
137
0
0
16 Sep 2025
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Lin Long
Yexiao He
Wentao Ye
Yiyuan Pan
Yuan Lin
Hang Li
Junbo Zhao
Wei Li
198
7
0
13 Aug 2025
SV3.3B: A Sports Video Understanding Model for Action Recognition
SV3.3B: A Sports Video Understanding Model for Action Recognition
Sai Varun Kodathala
Yashwanth Reddy Vutukoori
Rakesh Vunnam
192
2
0
23 Jul 2025
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Eshika Khandelwal
Gül Varol
Weidi Xie
Andrew Zisserman
DiffMVGen
335
3
0
01 Apr 2025
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMsComputer Vision and Pattern Recognition (CVPR), 2025
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
222
1
0
31 Mar 2025
Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning
Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement LearningInternational Symposium on Modeling and Optimization in Mobile, Ad-Hoc and Wireless Networks (WiOpt), 2025
Yubo Zhang
Pedro Botelho
Trevor Gordon
Gil Zussman
I. Kadota
221
1
0
31 Mar 2025
Learning to Generate Long-term Future Narrations Describing Activities of Daily Living
Ramanathan Rajendiran
Debaditya Roy
Basura Fernando
VGen
269
0
0
03 Mar 2025
NowYouSee Me: Context-Aware Automatic Audio Description
NowYouSee Me: Context-Aware Automatic Audio DescriptionIEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Seon-Ho Lee
Jue Wang
D. Fan
Zhikang Zhang
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
258
2
0
13 Dec 2024
Progress-Aware Video Frame Captioning
Progress-Aware Video Frame CaptioningComputer Vision and Pattern Recognition (CVPR), 2024
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
528
5
0
03 Dec 2024
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
MovieBench: A Hierarchical Movie Level Dataset for Long Video GenerationComputer Vision and Pattern Recognition (CVPR), 2024
Weijia Wu
Mingyu Liu
Zeyu Zhu
Xi Xia
Haoen Feng
Wen Wang
Kevin Qinghong Lin
Chunhua Shen
Mike Zheng Shou
DiffMVGen
367
11
0
22 Nov 2024
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
Yichen He
Yuan Lin
Jianchao Wu
Hanchong Zhang
Yuchen Zhang
Ruicheng Le
VGenVLM
654
4
0
11 Nov 2024
It's Just Another Day: Unique Video Captioning by Discriminative
  Prompting
It's Just Another Day: Unique Video Captioning by Discriminative PromptingAsian Conference on Computer Vision (ACCV), 2024
Toby Perrett
Tengda Han
Dima Damen
Andrew Zisserman
188
3
0
15 Oct 2024
Character-aware audio-visual subtitling in context
Character-aware audio-visual subtitling in contextAsian Conference on Computer Vision (ACCV), 2024
Jaesung Huh
Andrew Zisserman
256
0
0
14 Oct 2024
Generating Event-oriented Attribution for Movies via Two-Stage
  Prefix-Enhanced Multimodal LLM
Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM
Yuanjie Lyu
Tong Xu
Zihan Niu
Bo Peng
Jing Ke
Enhong Chen
189
0
0
14 Sep 2024
Learning Video Context as Interleaved Multimodal Sequences
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
211
11
0
31 Jul 2024
AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description
AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
162
17
0
22 Jul 2024
Long Story Short: Story-level Video Understanding from 20K Short Films
Long Story Short: Story-level Video Understanding from 20K Short Films
Ridouane Ghermi
Xi Wang
Vicky Kalogeiton
Ivan Laptev
VGen
104
2
0
14 Jun 2024
Multi-layer Learnable Attention Mask for Multimodal Tasks
Multi-layer Learnable Attention Mask for Multimodal Tasks
Wayner Barrios
SouYoung Jin
143
1
0
04 Jun 2024
MICap: A Unified Model for Identity-aware Movie Descriptions
MICap: A Unified Model for Identity-aware Movie DescriptionsComputer Vision and Pattern Recognition (CVPR), 2024
Haran Raajesh
Naveen Reddy Desanur
Zeeshan Khan
Makarand Tapaswi
208
7
0
19 May 2024
CinePile: A Long Video Question Answering Dataset and Benchmark
CinePile: A Long Video Question Answering Dataset and Benchmark
Ruchit Rawal
Khalid Saifullah
Ronen Basri
David Jacobs
Gowthami Somepalli
Tom Goldstein
217
87
0
14 May 2024
LLM-AD: Large Language Model based Audio Description System
LLM-AD: Large Language Model based Audio Description System
Peng Chu
Jiang Wang
Andre Abrantes
136
10
0
02 May 2024
Learning Long-form Video Prior via Generative Pre-Training
Learning Long-form Video Prior via Generative Pre-Training
Jinheng Xie
Jiajun Feng
Zhaoxu Tian
Kevin Qinghong Lin
Yawen Huang
...
Nanxu Gong
Xu Zuo
Jiaqi Yang
Yefeng Zheng
Mike Zheng Shou
139
8
0
24 Apr 2024
AutoAD III: The Prequel -- Back to the Pixels
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGenDiffM
259
33
0
22 Apr 2024
Streaming Dense Video Captioning
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
221
72
0
01 Apr 2024
Contextual AD Narration with Interleaved Multimodal Sequence
Contextual AD Narration with Interleaved Multimodal SequenceComputer Vision and Pattern Recognition (CVPR), 2024
Hanlin Wang
Zhan Tong
Kecheng Zheng
Yujun Shen
Limin Wang
VGen
374
7
0
19 Mar 2024
Visual Objectification in Films: Towards a New AI Task for Video
  Interpretation
Visual Objectification in Films: Towards a New AI Task for Video InterpretationComputer Vision and Pattern Recognition (CVPR), 2024
Julie Tores
L. Sassatelli
Hui-Yin Wu
Clement Bergman
Lea Andolfi
...
F. Precioso
Thierry Devars
Magali Guaresi
Virginie Julliard
Sarah Lecossais
183
5
0
24 Jan 2024
Video Summarization: Towards Entity-Aware Captions
Video Summarization: Towards Entity-Aware Captions
Hammad A. Ayyubi
Tianqi Liu
Arsha Nagrani
Xudong Lin
Ruotong Wang
Anurag Arnab
Feng Han
Yukun Zhu
Jialu Liu
Shih-Fu Chang
112
1
0
01 Dec 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context
  Learning
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context LearningComputer Vision and Pattern Recognition (CVPR), 2023
Chaoyi Zhang
Kevin Qinghong Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
214
48
0
29 Nov 2023
Zero-shot audio captioning with audio-language model guidance and audio
  context keywords
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Leonard Salewski
Stefan Fauth
A. Sophia Koepke
Zeynep Akata
143
14
0
14 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
195
84
0
30 Oct 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
AutoAD II: The Sequel -- Who, When, and What in Movie Audio DescriptionIEEE International Conference on Computer Vision (ICCV), 2023
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGenDiffM
186
48
0
10 Oct 2023
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Bipin Rajendran
Bashir M. Al-Hashimi
MLLMVLM
205
7
0
27 Sep 2023
A Large-scale Dataset for Audio-Language Representation Learning
A Large-scale Dataset for Audio-Language Representation LearningACM Multimedia (ACM MM), 2023
Luoyi Sun
Xuenan Xu
Mengyue Wu
Weidi Xie
291
44
0
20 Sep 2023
Synopses of Movie Narratives: a Video-Language Dataset for Story
  Understanding
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding
Yidan Sun
Qin Chao
Yangfeng Ji
Boyang Albert Li
VGen
342
11
0
11 Mar 2022
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
CLIP-Adapter: Better Vision-Language Models with Feature AdaptersInternational Journal of Computer Vision (IJCV), 2021
Shiyang Feng
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Zelong Li
Jiaming Song
Yu Qiao
VLMCLIP
1.0K
1,385
0
09 Oct 2021
1