1605.03705
Movie Description
12 May 2016
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
C. Pal
Hugo Larochelle
Aaron Courville
Bernt Schiele
3DV
VGen
Papers citing "Movie Description"
50 / 211 papers shown
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval
Neural Information Processing Systems (NeurIPS), 2022
Che-Hsien Lin
Ancong Wu
Junwei Liang
Jun Zhang
Wenhang Ge
Wei Zheng
Chunhua Shen
201
37
0
27 Sep 2022
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Zhijian Hou
Wanjun Zhong
Lei Ji
Difei Gao
Kun Yan
W. Chan
Chong-Wah Ngo
Zheng Shou
Nan Duan
AI4TS
193
33
0
22 Sep 2022
Distribution Aware Metrics for Conditional Natural Language Generation
International Conference on Language Resources and Evaluation (LREC), 2022
David M. Chan
Yiming Ni
David A. Ross
Sudheendra Vijayanarasimhan
Austin Myers
John F. Canny
274
4
0
15 Sep 2022
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
International Conference on Learning Representations (ICLR), 2022
Hongwei Xue
Yuchong Sun
Bei Liu
Jianlong Fu
Rui Song
Houqiang Li
Jiebo Luo
CLIP
VLM
340
92
0
14 Sep 2022
Self-Contained Entity Discovery from Captioned Videos
M. Ayoughi
P. Mettes
Paul T. Groth
128
3
0
13 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
European Conference on Computer Vision (ECCV), 2022
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
201
20
0
01 Aug 2022
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
European Conference on Computer Vision (ECCV), 2022
Yuqi Liu
Pengfei Xiong
Luhui Xu
Shengming Cao
Qin Jin
195
168
0
16 Jul 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Neural Information Processing Systems (NeurIPS), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
375
275
0
16 Jun 2022
Structured Two-stream Attention Network for Video Question Answering
AAAI Conference on Artificial Intelligence (AAAI), 2019
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
171
70
0
02 Jun 2022
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Shruti Palaskar
Akshita Bhagia
Yonatan Bisk
Florian Metze
A. Black
Ana Marasović
223
4
0
24 May 2022
Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information
Zhipeng Zhang
Xinglin Hou
K. Niu
Zhongzhen Huang
Bo Xiao
Yuning Jiang
Qi Wu
Peifeng Wang
120
5
0
07 May 2022
Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness
Paul Pu Liang
282
6
0
14 Apr 2022
Video Captioning: a comparative review of where we are and which could be the route
Computer Vision and Image Understanding (CVIU), 2022
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
165
14
0
12 Apr 2022
Learning Audio-Video Modalities from Image Captions
European Conference on Computer Vision (ECCV), 2022
Arsha Nagrani
Paul Hongsuck Seo
Bryan Seybold
Anja Hauth
Santiago Manén
Chen Sun
Cordelia Schmid
CLIP
158
94
0
01 Apr 2022
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
Computer Vision and Pattern Recognition (CVPR), 2022
S. Gorti
Noël Vouitsis
Junwei Ma
Keyvan Golestan
Anthony L. Caterini
Animesh Garg
Guangwei Yu
183
223
0
28 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Computer Vision and Pattern Recognition (CVPR), 2022
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
190
43
0
22 Mar 2022
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Alexander Kunitsyn
M. Kalashnikov
Maksim Dzabraev
Andrei Ivaniuta
138
18
0
14 Mar 2022
Taking an Emotional Look at Video Paragraph Captioning
Qinyu Li
Tengpeng Li
Hanli Wang
Changan Chen
150
7
0
12 Mar 2022
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding
Yidan Sun
Qin Chao
Yangfeng Ji
Boyang Albert Li
VGen
342
11
0
11 Mar 2022
Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Tengpeng Li
Hanli Wang
Bin He
Changan Chen
DiffM
182
15
0
10 Mar 2022
Joint Answering and Explanation for Visual Commonsense Reasoning
IEEE Transactions on Image Processing (IEEE TIP), 2022
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Yin-wei Wei
Liqiang Nie
Mohan S. Kankanhalli
218
25
0
25 Feb 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Computer Vision and Pattern Recognition (CVPR), 2022
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
400
236
0
07 Jan 2022
Cross Modal Retrieval with Querybank Normalisation
Computer Vision and Pattern Recognition (CVPR), 2021
Simion-Vlad Bogolin
Ioana Croitoru
Hailin Jin
Yang Liu
Samuel Albanie
194
113
0
23 Dec 2021
Prompting Visual-Language Models for Efficient Video Understanding
Chen Ju
Tengda Han
Kunhao Zheng
Ya Zhang
Weidi Xie
VPVLM
VLM
304
454
0
08 Dec 2021
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
228
26
0
02 Dec 2021
Object-aware Video-language Pre-training for Retrieval
Alex Jinpeng Wang
Yixiao Ge
Guanyu Cai
Rui Yan
Xudong Lin
Ying Shan
Xiaohu Qie
Mike Zheng Shou
ViT
VLM
248
86
0
01 Dec 2021
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
Mattia Soldan
Alejandro Pardo
Juan Carlos León Alcázar
Fabian Caba Heilbron
Chen Zhao
Silvio Giancola
Guohao Li
VGen
281
127
0
01 Dec 2021
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Computer Vision and Pattern Recognition (CVPR), 2021
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TS
VLM
182
246
0
19 Nov 2021
Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks
Computer Vision and Image Understanding (CVIU), 2021
Arulkumar Subramaniam
Jayesh Vaidya
Muhammed Ameen
Athira M. Nambiar
Anurag Mittal
284
7
0
14 Nov 2021
Aesthetic Photo Collage with Deep Reinforcement Learning
Mingrui Zhang
Mading Li
Li Chen
Jiahao Yu
93
3
0
19 Oct 2021
More Than Reading Comprehension: A Survey on Datasets and Metrics of Textual Question Answering
Yang Bai
D. Wang
253
13
0
25 Sep 2021
MovieCuts: A New Dataset and Benchmark for Cut Type Recognition
Alejandro Pardo
Fabian Caba Heilbron
Juan Carlos León Alcázar
Ali K. Thabet
Guohao Li
VGen
272
32
0
12 Sep 2021
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss
Xingyi Cheng
Hezheng Lin
Xiangyu Wu
Fan Yang
Dong Shen
221
167
0
09 Sep 2021
MERLOT: Multimodal Neural Script Knowledge Models
Neural Information Processing Systems (NeurIPS), 2021
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
252
423
0
04 Jun 2021
Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Qingxiu Dong
Ziwei Qin
Heming Xia
Tian Feng
Shoujie Tong
...
Weidong Zhan
Sujian Li
Zhongyu Wei
Tianyu Liu
Zuifang Sui
LRM
156
14
0
15 May 2021
Conversational AI Systems for Social Good: Opportunities and Challenges
Peng Qi
Jing Huang
Youzheng Wu
Xiaodong He
Bowen Zhou
215
5
0
13 May 2021
Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
Computer Vision and Pattern Recognition (CVPR), 2021
Mathew Monfort
SouYoung Jin
Alexander H. Liu
David Harwath
Rogerio Feris
James Glass
Aude Oliva
135
68
0
10 May 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
779
2,180
0
18 Apr 2021
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
Ioana Croitoru
Simion-Vlad Bogolin
Marius Leordeanu
Hailin Jin
Andrew Zisserman
Samuel Albanie
Yang Liu
VGen
148
140
0
16 Apr 2021
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Santiago Castro
Ruoyao Wang
Pingxuan Huang
Ian Stewart
Oana Ignat
Nan Liu
Jonathan C. Stroud
Amélie Reymond
AIMat
219
12
0
09 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
677
1,414
0
01 Apr 2021
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
Luowei Zhou
Jingjing Liu
Yu Cheng
Zhe Gan
Lei Zhang
161
7
0
01 Apr 2021
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Computer Vision and Pattern Recognition (CVPR), 2021
Kepeng Xu
He Huang
Jun Liu
ViT
LRM
293
109
0
29 Mar 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
151
147
0
19 Mar 2021
The Role of the Input in Natural Language Video Description
IEEE Transactions on Multimedia (TMM), 2020
S. Cascianelli
G. Costante
Alessandro Devo
Thomas Alessandro Ciarfuglia
P. Valigi
M. L. Fravolini
112
5
0
09 Feb 2021
Learning Temporal Dynamics from Cycles in Narrated Video
IEEE International Conference on Computer Vision (ICCV), 2021
Dave Epstein
Jiajun Wu
Cordelia Schmid
Chen Sun
AI4TS
220
15
0
07 Jan 2021
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish
Machine Translation (MT), 2020
Begum Citamak
Ozan Caglayan
Menekse Kuyu
Erkut Erdem
Aykut Erdem
Pranava Madhyastha
Lucia Specia
170
9
0
13 Dec 2020
Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Jianan Wang
Boyang Albert Li
Xiangyu Fan
Jing-Hua Lin
Yanwei Fu
114
3
0
15 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations
Computer Vision and Pattern Recognition (CVPR), 2020
Linchao Zhu
Yi Yang
ViT
262
447
0
14 Nov 2020
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
172
78
0
15 Oct 2020