1605.03705
Cited By
Movie Description
12 May 2016
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
C. Pal
Hugo Larochelle
Aaron Courville
Bernt Schiele
3DV
VGen
Papers citing "Movie Description"
50 / 211 papers shown
Title
VideoMamba: State Space Model for Efficient Video Understanding
European Conference on Computer Vision (ECCV), 2024
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
201
360
0
11 Mar 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Md. Mohaiminul Islam
Ngan Ho
Xitong Yang
Tushar Nagarajan
Lorenzo Torresani
Gedas Bertasius
VGen
VLM
570
78
0
20 Feb 2024
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval
Xingning Dong
Zipeng Feng
Chunluan Zhou
Xuzheng Yu
Ming Yang
Qingpei Guo
VLM
202
5
0
31 Jan 2024
Visual Objectification in Films: Towards a New AI Task for Video Interpretation
Computer Vision and Pattern Recognition (CVPR), 2024
Julie Tores
L. Sassatelli
Hui-Yin Wu
Clement Bergman
Lea Andolfi
...
F. Precioso
Thierry Devars
Magali Guaresi
Virginie Julliard
Sarah Lecossais
175
5
0
24 Jan 2024
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
Jiaming Zhou
Junwei Liang
Kun-Yu Lin
Jinrui Yang
Wei-Shi Zheng
VLM
226
12
0
22 Jan 2024
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
S. DarshanSingh
Zeeshan Khan
Makarand Tapaswi
VLM
CLIP
146
6
0
15 Jan 2024
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Chaoyi Zhang
Kevin Qinghong Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
210
48
0
29 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
205
4
0
25 Nov 2023
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
Jiwan Chung
Youngjae Yu
338
7
0
02 Nov 2023
A Survey on Video Diffusion Models
ACM Computing Surveys (ACM Comput. Surv.), 2023
Zhen Xing
Qijun Feng
Haoran Chen
Jingdong Sun
Hang-Rui Hu
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
EGVM
VGen
326
205
0
16 Oct 2023
Encoding and Decoding Narratives: Datafication and Alternative Access Models for Audiovisual Archives
ACM Multimedia (ACM MM), 2023
Yuchen Yang
152
1
0
10 Oct 2023
Write What You Want: Applying Text-to-video Retrieval to Audiovisual Archives
ACM Journal on Computing and Cultural Heritage (JOCCH), 2023
Yuchen Yang
VGen
152
9
0
09 Oct 2023
Deep Variational Multivariate Information Bottleneck -- A Framework for Variational Losses
Eslam Abdelaleem
I. Nemenman
K. M. Martini
402
7
0
05 Oct 2023
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2023
Ruyang Liu
Chen Li
Yixiao Ge
Ying Shan
Thomas H. Li
Ge Li
171
39
0
27 Sep 2023
Fine-grained Text-Video Retrieval with Frozen Image Encoders
Zuozhuo Dai
Fang Shao
Qingkun Su
Zilong Dong
Siyu Zhu
384
1
0
14 Jul 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
International Conference on Learning Representations (ICLR), 2023
Yi Wang
Yinan He
Yizhuo Li
Kunchang Li
Jiashuo Yu
...
Ping Luo
Ziwei Liu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
304
383
0
13 Jul 2023
PTVD: A Large-Scale Plot-Oriented Multimodal Dataset Based on Television Dramas
Chen Li
Xutan Peng
Teng Wang
Yixiao Ge
Mengyang Liu
Xuyuan Xu
Yexin Wang
Ying Shan
VGen
152
2
0
26 Jun 2023
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
International Conference on Learning Representations (ICLR), 2023
Sihan Chen
Xingjian He
Handong Li
Xiaojie Jin
Jiashi Feng
Qingbin Liu
VLM
CLIP
161
11
0
15 Jun 2023
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yuxuan Wang
Zilong Zheng
Xueliang Zhao
Jinpeng Li
Yueqian Wang
Dongyan Zhao
VGen
149
13
0
30 May 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Neural Information Processing Systems (NeurIPS), 2023
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
434
167
0
29 May 2023
Movie101: A New Movie Understanding Benchmark
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zihao Yue
Tao Gui
Anwen Hu
Liang Zhang
Ziheng Wang
Qin Jin
VGen
201
25
0
20 May 2023
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Neural Information Processing Systems (NeurIPS), 2023
Zhenhailong Wang
Ansel Blume
Sha Li
Genglin Liu
Jaemin Cho
Zineng Tang
Joey Tianyi Zhou
Heng Ji
KELM
VGen
209
40
0
18 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
ACM Multimedia (ACM MM), 2023
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
192
5
0
13 May 2023
A Review of Deep Learning for Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Xiaoshi Zhong
Fatih Porikli
3DV
181
35
0
22 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
VLM
294
145
0
17 Apr 2023
Verbs in Action: Improving verb understanding in video-language models
IEEE International Conference on Computer Vision (ICCV), 2023
Liliane Momeni
Mathilde Caron
Arsha Nagrani
Andrew Zisserman
Cordelia Schmid
277
86
0
13 Apr 2023
How you feelin'? Learning Emotions and Mental States in Movie Scenes
Computer Vision and Pattern Recognition (CVPR), 2023
D. Srivastava
A. Singh
Makarand Tapaswi
206
11
0
12 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Computer Vision and Pattern Recognition (CVPR), 2023
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM
VPVLM
187
106
0
06 Apr 2023
AutoAD: Movie Description in Context
Computer Vision and Pattern Recognition (CVPR), 2023
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
209
49
0
29 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
IEEE International Conference on Computer Vision (ICCV), 2023
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
230
231
0
28 Mar 2023
Deep Learning for Video-Text Retrieval: a Review
International Journal of Multimedia Information Retrieval (IJMIR), 2023
Cunjuan Zhu
Qi Jia
Wei Chen
Yanming Guo
Yu Liu
198
26
0
24 Feb 2023
Compositional Exemplars for In-context Learning
International Conference on Machine Learning (ICML), 2023
Jiacheng Ye
Zhiyong Wu
Jiangtao Feng
Tao Yu
Lingpeng Kong
266
162
0
11 Feb 2023
Benchmarks for Automated Commonsense Reasoning: A Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
E. Davis
ELM
LRM
228
79
0
09 Feb 2023
Learning to Agree on Vision Attention for Visual Commonsense Reasoning
IEEE transactions on multimedia (IEEE TMM), 2023
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Fan Liu
Liqiang Nie
Mohan S. Kankanhalli
218
12
0
04 Feb 2023
Multimodal Event Transformer for Image-guided Story Ending Generation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Yucheng Zhou
Guodong Long
186
24
0
26 Jan 2023
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
Computer Vision and Pattern Recognition (CVPR), 2023
Ruyang Liu
Jingjia Huang
Ge Li
Jiashi Feng
Xing Wu
Thomas H. Li
AI4TS
CLIP
VLM
219
69
0
26 Jan 2023
Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering
Findings (Findings), 2023
Chenxi Whitehouse
Tillman Weyde
Pranava Madhyastha
LRM
227
4
0
25 Jan 2023
NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory
Computer Vision and Pattern Recognition (CVPR), 2023
Santhosh Kumar Ramakrishnan
Ziad Al-Halah
Kristen Grauman
325
46
0
02 Jan 2023
TeViS: Translating Text Synopses to Video Storyboards
ACM Multimedia (ACM MM), 2022
Xu Gu
Yuchong Sun
Feiyue Ni
Shizhe Chen
Xihua Wang
Ruihua Song
Yangqiu Song
Xiang Cao
DiffM
263
4
0
31 Dec 2022
MAViC: Multimodal Active Learning for Video Captioning
Gyanendra Das
Xavier Thomas
Anant Raj
Vikram Gupta
123
3
0
11 Dec 2022
Learning Video Representations from Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2022
Yue Zhao
Ishan Misra
Philipp Krahenbuhl
Rohit Girdhar
VLM
AI4TS
232
225
0
08 Dec 2022
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations
Björn Plüster
Jakob Ambsdorf
Lukas Braach
Jae Hee Lee
S. Wermter
178
6
0
08 Dec 2022
Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhecan Wang
Haoxuan You
Yicheng He
Wenhao Li
Kai-Wei Chang
Shih-Fu Chang
175
6
0
10 Nov 2022
Going for GOAL: A Resource for Grounded Football Commentaries
Alessandro Suglia
José Lopes
E. Bastianelli
Andrea Vanzo
Shubham Agarwal
Malvina Nikandrou
Lu Yu
Ioannis Konstas
Verena Rieser
112
8
0
08 Nov 2022
Unsupervised Audio-Visual Lecture Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Darshan Singh
Anchit Gupta
C. V. Jawahar
Makarand Tapaswi
VOS
193
7
0
29 Oct 2022
Learning Joint Representation of Human Motion and Language
Jihoon Kim
Youngjae Yu
Seungyoung Shin
Taehyun Byun
Sungjoon Choi
154
5
0
27 Oct 2022
Grounded Video Situation Recognition
Neural Information Processing Systems (NeurIPS), 2022
Zeeshan Khan
C. V. Jawahar
Makarand Tapaswi
174
15
0
19 Oct 2022
VTC: Improving Video-Text Retrieval with User Comments
European Conference on Computer Vision (ECCV), 2022
Laura Hanu
James Thewlis
Yuki M. Asano
Christian Rupprecht
VGen
167
8
0
19 Oct 2022
Hierarchical3D Adapters for Long Video-to-text Summarization
Findings (Findings), 2022
Pinelopi Papalampidi
Mirella Lapata
VGen
173
14
0
10 Oct 2022
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
British Machine Vision Conference (BMVC), 2022
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
169
2
0
08 Oct 2022