ResearchTrend.AI
Movie Description

12 May 2016
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
C. Pal
Hugo Larochelle
Aaron Courville
Bernt Schiele
    3DV, VGen

Papers citing "Movie Description"

50 / 211 papers shown
VideoMamba: State Space Model for Efficient Video Understanding
European Conference on Computer Vision (ECCV), 2024
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
201
360
0
11 Mar 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Md. Mohaiminul Islam
Ngan Ho
Xitong Yang
Tushar Nagarajan
Lorenzo Torresani
Gedas Bertasius
VGen, VLM
570
78
0
20 Feb 2024
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval
Xingning Dong
Zipeng Feng
Chunluan Zhou
Xuzheng Yu
Ming Yang
Qingpei Guo
VLM
202
5
0
31 Jan 2024
Visual Objectification in Films: Towards a New AI Task for Video Interpretation
Computer Vision and Pattern Recognition (CVPR), 2024
Julie Tores
L. Sassatelli
Hui-Yin Wu
Clement Bergman
Lea Andolfi
...
F. Precioso
Thierry Devars
Magali Guaresi
Virginie Julliard
Sarah Lecossais
175
5
0
24 Jan 2024
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
Jiaming Zhou
Junwei Liang
Kun-Yu Lin
Jinrui Yang
Wei-Shi Zheng
VLM
226
12
0
22 Jan 2024
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
S. Darshan Singh
Zeeshan Khan
Makarand Tapaswi
VLM, CLIP
146
6
0
15 Jan 2024
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Chaoyi Zhang
Kevin Qinghong Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
210
48
0
29 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
205
4
0
25 Nov 2023
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
Jiwan Chung
Youngjae Yu
338
7
0
02 Nov 2023
A Survey on Video Diffusion Models
ACM Computing Surveys (ACM Comput. Surv.), 2023
Zhen Xing
Qijun Feng
Haoran Chen
Jingdong Sun
Hang-Rui Hu
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
EGVM, VGen
326
205
0
16 Oct 2023
Encoding and Decoding Narratives: Datafication and Alternative Access Models for Audiovisual Archives
ACM Multimedia (ACM MM), 2023
Yuchen Yang
152
1
0
10 Oct 2023
Write What You Want: Applying Text-to-video Retrieval to Audiovisual Archives
ACM Journal on Computing and Cultural Heritage (JOCCH), 2023
Yuchen Yang
VGen
152
9
0
09 Oct 2023
Deep Variational Multivariate Information Bottleneck -- A Framework for Variational Losses
Eslam Abdelaleem
I. Nemenman
K. M. Martini
402
7
0
05 Oct 2023
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2023
Ruyang Liu
Chen Li
Yixiao Ge
Ying Shan
Thomas H. Li
Ge Li
171
39
0
27 Sep 2023
Fine-grained Text-Video Retrieval with Frozen Image Encoders
Zuozhuo Dai
Fang Shao
Qingkun Su
Zilong Dong
Siyu Zhu
384
1
0
14 Jul 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
International Conference on Learning Representations (ICLR), 2023
Yi Wang
Yinan He
Yizhuo Li
Kunchang Li
Jiashuo Yu
...
Ping Luo
Ziwei Liu
Yali Wang
Limin Wang
Yu Qiao
VLM, VGen
304
383
0
13 Jul 2023
PTVD: A Large-Scale Plot-Oriented Multimodal Dataset Based on Television Dramas
Chen Li
Xutan Peng
Teng Wang
Yixiao Ge
Mengyang Liu
Xuyuan Xu
Yexin Wang
Ying Shan
VGen
152
2
0
26 Jun 2023
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
International Conference on Learning Representations (ICLR), 2023
Sihan Chen
Xingjian He
Handong Li
Xiaojie Jin
Jiashi Feng
Qingbin Liu
VLM, CLIP
161
11
0
15 Jun 2023
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yuxuan Wang
Zilong Zheng
Xueliang Zhao
Jinpeng Li
Yueqian Wang
Dongyan Zhao
VGen
149
13
0
30 May 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Neural Information Processing Systems (NeurIPS), 2023
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
434
167
0
29 May 2023
Movie101: A New Movie Understanding Benchmark
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zihao Yue
Tao Gui
Anwen Hu
Liang Zhang
Ziheng Wang
Qin Jin
VGen
201
25
0
20 May 2023
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Neural Information Processing Systems (NeurIPS), 2023
Zhenhailong Wang
Ansel Blume
Sha Li
Genglin Liu
Jaemin Cho
Zineng Tang
Joey Tianyi Zhou
Heng Ji
KELM, VGen
209
40
0
18 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
ACM Multimedia (ACM MM), 2023
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
192
5
0
13 May 2023
A Review of Deep Learning for Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Xiaoshi Zhong
Fatih Porikli
3DV
181
35
0
22 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
VLM
294
145
0
17 Apr 2023
Verbs in Action: Improving verb understanding in video-language models
IEEE International Conference on Computer Vision (ICCV), 2023
Liliane Momeni
Mathilde Caron
Arsha Nagrani
Andrew Zisserman
Cordelia Schmid
277
86
0
13 Apr 2023
How you feelin'? Learning Emotions and Mental States in Movie Scenes
Computer Vision and Pattern Recognition (CVPR), 2023
D. Srivastava
A. Singh
Makarand Tapaswi
206
11
0
12 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Computer Vision and Pattern Recognition (CVPR), 2023
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM, VPVLM
187
106
0
06 Apr 2023
AutoAD: Movie Description in Context
Computer Vision and Pattern Recognition (CVPR), 2023
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
209
49
0
29 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
IEEE International Conference on Computer Vision (ICCV), 2023
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
230
231
0
28 Mar 2023
Deep Learning for Video-Text Retrieval: a Review
International Journal of Multimedia Information Retrieval (IJMIR), 2023
Cunjuan Zhu
Qi Jia
Wei Chen
Yanming Guo
Yu Liu
198
26
0
24 Feb 2023
Compositional Exemplars for In-context Learning
International Conference on Machine Learning (ICML), 2023
Jiacheng Ye
Zhiyong Wu
Jiangtao Feng
Tao Yu
Lingpeng Kong
266
162
0
11 Feb 2023
Benchmarks for Automated Commonsense Reasoning: A Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
E. Davis
ELM, LRM
228
79
0
09 Feb 2023
Learning to Agree on Vision Attention for Visual Commonsense Reasoning
IEEE Transactions on Multimedia (IEEE TMM), 2023
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Fan Liu
Liqiang Nie
Mohan S. Kankanhalli
218
12
0
04 Feb 2023
Multimodal Event Transformer for Image-guided Story Ending Generation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Yucheng Zhou
Guodong Long
186
24
0
26 Jan 2023
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
Computer Vision and Pattern Recognition (CVPR), 2023
Ruyang Liu
Jingjia Huang
Ge Li
Jiashi Feng
Xing Wu
Thomas H. Li
AI4TS, CLIP, VLM
219
69
0
26 Jan 2023
Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering
Findings (Findings), 2023
Chenxi Whitehouse
Tillman Weyde
Pranava Madhyastha
LRM
227
4
0
25 Jan 2023
NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory
Computer Vision and Pattern Recognition (CVPR), 2023
Santhosh Kumar Ramakrishnan
Ziad Al-Halah
Kristen Grauman
325
46
0
02 Jan 2023
TeViS: Translating Text Synopses to Video Storyboards
ACM Multimedia (ACM MM), 2022
Xu Gu
Yuchong Sun
Feiyue Ni
Shizhe Chen
Xihua Wang
Ruihua Song
Yangqiu Song
Xiang Cao
DiffM
263
4
0
31 Dec 2022
MAViC: Multimodal Active Learning for Video Captioning
Gyanendra Das
Xavier Thomas
Anant Raj
Vikram Gupta
123
3
0
11 Dec 2022
Learning Video Representations from Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2022
Yue Zhao
Ishan Misra
Philipp Krahenbuhl
Rohit Girdhar
VLM, AI4TS
232
225
0
08 Dec 2022
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations
Björn Plüster
Jakob Ambsdorf
Lukas Braach
Jae Hee Lee
S. Wermter
178
6
0
08 Dec 2022
Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhecan Wang
Haoxuan You
Yicheng He
Wenhao Li
Kai-Wei Chang
Shih-Fu Chang
175
6
0
10 Nov 2022
Going for GOAL: A Resource for Grounded Football Commentaries
Alessandro Suglia
José Lopes
E. Bastianelli
Andrea Vanzo
Shubham Agarwal
Malvina Nikandrou
Lu Yu
Ioannis Konstas
Verena Rieser
112
8
0
08 Nov 2022
Unsupervised Audio-Visual Lecture Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Darshan Singh
Anchit Gupta
C. V. Jawahar
Makarand Tapaswi
VOS
193
7
0
29 Oct 2022
Learning Joint Representation of Human Motion and Language
Jihoon Kim
Youngjae Yu
Seungyoung Shin
Taehyun Byun
Sungjoon Choi
154
5
0
27 Oct 2022
Grounded Video Situation Recognition
Neural Information Processing Systems (NeurIPS), 2022
Zeeshan Khan
C. V. Jawahar
Makarand Tapaswi
174
15
0
19 Oct 2022
VTC: Improving Video-Text Retrieval with User Comments
European Conference on Computer Vision (ECCV), 2022
Laura Hanu
James Thewlis
Yuki M. Asano
Christian Rupprecht
VGen
167
8
0
19 Oct 2022
Hierarchical3D Adapters for Long Video-to-text Summarization
Findings (Findings), 2022
Pinelopi Papalampidi
Mirella Lapata
VGen
173
14
0
10 Oct 2022
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
British Machine Vision Conference (BMVC), 2022
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
169
2
0
08 Oct 2022