ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1605.03705
  4. Cited By
Movie Description

Movie Description

12 May 2016
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
C. Pal
Hugo Larochelle
Aaron Courville
Bernt Schiele
    3DVVGen
ArXiv (abs)PDFHTML

Papers citing "Movie Description"

50 / 211 papers shown
Title
Text-Adaptive Multiple Visual Prototype Matching for Video-Text
  Retrieval
Text-Adaptive Multiple Visual Prototype Matching for Video-Text RetrievalNeural Information Processing Systems (NeurIPS), 2022
Che-Hsien Lin
Ancong Wu
Junwei Liang
Jun Zhang
Wenhang Ge
Wei Zheng
Chunhua Shen
201
37
0
27 Sep 2022
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video
  Temporal Grounding
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal GroundingAnnual Meeting of the Association for Computational Linguistics (ACL), 2022
Zhijian Hou
Wanjun Zhong
Lei Ji
Difei Gao
Kun Yan
W. Chan
Chong-Wah Ngo
Zheng Shou
Nan Duan
AI4TS
193
33
0
22 Sep 2022
Distribution Aware Metrics for Conditional Natural Language Generation
Distribution Aware Metrics for Conditional Natural Language GenerationInternational Conference on Language Resources and Evaluation (LREC), 2022
David M. Chan
Yiming Ni
David A. Ross
Sudheendra Vijayanarasimhan
Austin Myers
John F. Canny
274
4
0
15 Sep 2022
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language
  Representation Alignment
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation AlignmentInternational Conference on Learning Representations (ICLR), 2022
Hongwei Xue
Yuchong Sun
Bei Liu
Jianlong Fu
Rui Song
Houqiang Li
Jiebo Luo
CLIPVLM
340
92
0
14 Sep 2022
Self-Contained Entity Discovery from Captioned Videos
Self-Contained Entity Discovery from Captioned Videos
M. Ayoughi
P. Mettes
Paul T. Groth
128
3
0
13 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
Video Question Answering with Iterative Video-Text Co-TokenizationEuropean Conference on Computer Vision (ECCV), 2022
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
201
20
0
01 Aug 2022
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
TS2-Net: Token Shift and Selection Transformer for Text-Video RetrievalEuropean Conference on Computer Vision (ECCV), 2022
Yuqi Liu
Pengfei Xiong
Luhui Xu
Shengming Cao
Qin Jin
195
168
0
16 Jul 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language
  Models
Zero-Shot Video Question Answering via Frozen Bidirectional Language ModelsNeural Information Processing Systems (NeurIPS), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
375
275
0
16 Jun 2022
Structured Two-stream Attention Network for Video Question Answering
Structured Two-stream Attention Network for Video Question AnsweringAAAI Conference on Artificial Intelligence (AAAI), 2019
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
171
70
0
02 Jun 2022
On Advances in Text Generation from Images Beyond Captioning: A Case
  Study in Self-Rationalization
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-RationalizationConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Shruti Palaskar
Akshita Bhagia
Yonatan Bisk
Florian Metze
A. Black
Ana Marasović
223
4
0
24 May 2022
Attract me to Buy: Advertisement Copywriting Generation with Multimodal
  Multi-structured Information
Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information
Zhipeng Zhang
Xinglin Hou
K. Niu
Zhongzhen Huang
Bo Xiao
Yuning Jiang
Qi Wu
Peifeng Wang
120
5
0
07 May 2022
Brainish: Formalizing A Multimodal Language for Intelligence and
  Consciousness
Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness
Paul Pu Liang
282
6
0
14 Apr 2022
Video Captioning: a comparative review of where we are and which could
  be the route
Video Captioning: a comparative review of where we are and which could be the routeComputer Vision and Image Understanding (CVIU), 2022
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
165
14
0
12 Apr 2022
Learning Audio-Video Modalities from Image Captions
Learning Audio-Video Modalities from Image CaptionsEuropean Conference on Computer Vision (ECCV), 2022
Arsha Nagrani
Paul Hongsuck Seo
Bryan Seybold
Anja Hauth
Santiago Manén
Chen Sun
Cordelia Schmid
CLIP
158
94
0
01 Apr 2022
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
X-Pool: Cross-Modal Language-Video Attention for Text-Video RetrievalComputer Vision and Pattern Recognition (CVPR), 2022
S. Gorti
Noël Vouitsis
Junwei Ma
Keyvan Golestan
Anthony L. Caterini
Animesh Garg
Guangwei Yu
183
223
0
28 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions
  from Untrimmed Web Videos
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web VideosComputer Vision and Pattern Recognition (CVPR), 2022
Tomávs Souvcek
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
190
43
0
22 Mar 2022
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One
  More Step Towards Generalization
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Alexander Kunitsyn
M. Kalashnikov
Maksim Dzabraev
Andrei Ivaniuta
138
18
0
14 Mar 2022
Taking an Emotional Look at Video Paragraph Captioning
Taking an Emotional Look at Video Paragraph Captioning
Qinyu Li
Tengpeng Li
Hanli Wang
Changan Chen
150
7
0
12 Mar 2022
Synopses of Movie Narratives: a Video-Language Dataset for Story
  Understanding
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding
Yidan Sun
Qin Chao
Yangfeng Ji
Boyang Albert Li
VGen
342
11
0
11 Mar 2022
Knowledge-enriched Attention Network with Group-wise Semantic for Visual
  Storytelling
Knowledge-enriched Attention Network with Group-wise Semantic for Visual StorytellingIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Tengpeng Li
Hanli Wang
Bin He
Changan Chen
DiffM
182
15
0
10 Mar 2022
Joint Answering and Explanation for Visual Commonsense Reasoning
Joint Answering and Explanation for Visual Commonsense ReasoningIEEE Transactions on Image Processing (IEEE TIP), 2022
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Yin-wei Wei
Liqiang Nie
Mohan S. Kankanhalli
218
25
0
25 Feb 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and
  Sound
MERLOT Reserve: Neural Script Knowledge through Vision and Language and SoundComputer Vision and Pattern Recognition (CVPR), 2022
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
400
236
0
07 Jan 2022
Cross Modal Retrieval with Querybank Normalisation
Cross Modal Retrieval with Querybank NormalisationComputer Vision and Pattern Recognition (CVPR), 2021
Simion-Vlad Bogolin
Ioana Croitoru
Hailin Jin
Yang Liu
Samuel Albanie
194
113
0
23 Dec 2021
Prompting Visual-Language Models for Efficient Video Understanding
Prompting Visual-Language Models for Efficient Video Understanding
Chen Ju
Tengda Han
Kunhao Zheng
Ya Zhang
Weidi Xie
VPVLMVLM
304
454
0
08 Dec 2021
Video-Text Pre-training with Learned Regions
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
228
26
0
02 Dec 2021
Object-aware Video-language Pre-training for Retrieval
Object-aware Video-language Pre-training for Retrieval
Alex Jinpeng Wang
Yixiao Ge
Guanyu Cai
Rui Yan
Xudong Lin
Ying Shan
Xiaohu Qie
Mike Zheng Shou
ViTVLM
248
86
0
01 Dec 2021
MAD: A Scalable Dataset for Language Grounding in Videos from Movie
  Audio Descriptions
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
Mattia Soldan
Alejandro Pardo
Juan Carlos León Alcázar
Fabian Caba Heilbron
Chen Zhao
Silvio Giancola
Guohao Li
VGen
281
127
0
01 Dec 2021
Advancing High-Resolution Video-Language Representation with Large-Scale
  Video Transcriptions
Advancing High-Resolution Video-Language Representation with Large-Scale Video TranscriptionsComputer Vision and Pattern Recognition (CVPR), 2021
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TSVLM
182
246
0
19 Nov 2021
Co-segmentation Inspired Attention Module for Video-based Computer
  Vision Tasks
Co-segmentation Inspired Attention Module for Video-based Computer Vision TasksComputer Vision and Image Understanding (CVIU), 2021
Arulkumar Subramaniam
Jayesh Vaidya
Muhammed Ameen
Athira M. Nambiar
Anurag Mittal
284
7
0
14 Nov 2021
Aesthetic Photo Collage with Deep Reinforcement Learning
Aesthetic Photo Collage with Deep Reinforcement Learning
Mingrui Zhang
Mading Li
Li Chen
Jiahao Yu
93
3
0
19 Oct 2021
More Than Reading Comprehension: A Survey on Datasets and Metrics of
  Textual Question Answering
More Than Reading Comprehension: A Survey on Datasets and Metrics of Textual Question Answering
Yang Bai
D. Wang
253
13
0
25 Sep 2021
MovieCuts: A New Dataset and Benchmark for Cut Type Recognition
MovieCuts: A New Dataset and Benchmark for Cut Type Recognition
Alejandro Pardo
Fabian Caba Heilbron
Juan Carlos León Alcázar
Ali K. Thabet
Guohao Li
VGen
272
32
0
12 Sep 2021
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual
  Softmax Loss
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss
Xingyi Cheng
Hezheng Lin
Xiangyu Wu
Fan Yang
Dong Shen
221
167
0
09 Sep 2021
MERLOT: Multimodal Neural Script Knowledge Models
MERLOT: Multimodal Neural Script Knowledge ModelsNeural Information Processing Systems (NeurIPS), 2021
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLMLRM
252
423
0
04 Jun 2021
Premise-based Multimodal Reasoning: Conditional Inference on Joint
  Textual and Visual Clues
Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual CluesAnnual Meeting of the Association for Computational Linguistics (ACL), 2021
Qingxiu Dong
Ziwei Qin
Heming Xia
Tian Feng
Shoujie Tong
...
Weidong Zhan
Sujian Li
Zhongyu Wei
Tianyu Liu
Zuifang Sui
LRM
156
14
0
15 May 2021
Conversational AI Systems for Social Good: Opportunities and Challenges
Conversational AI Systems for Social Good: Opportunities and Challenges
Peng Qi
Jing Huang
Youzheng Wu
Xiaodong He
Bowen Zhou
215
5
0
13 May 2021
Spoken Moments: Learning Joint Audio-Visual Representations from Video
  Descriptions
Spoken Moments: Learning Joint Audio-Visual Representations from Video DescriptionsComputer Vision and Pattern Recognition (CVPR), 2021
Mathew Monfort
SouYoung Jin
Alexander H. Liu
David Harwath
Rogerio Feris
James Glass
Aude Oliva
135
68
0
10 May 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
CLIPScore: A Reference-free Evaluation Metric for Image CaptioningConference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
779
2,180
0
18 Apr 2021
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
TEACHTEXT: CrossModal Generalized Distillation for Text-Video RetrievalIEEE International Conference on Computer Vision (ICCV), 2021
Ioana Croitoru
Simion-Vlad Bogolin
Marius Leordeanu
Hailin Jin
Andrew Zisserman
Samuel Albanie
Yang Liu
VGen
148
140
0
16 Apr 2021
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding
  Evaluation Framework
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation FrameworkAnnual Meeting of the Association for Computational Linguistics (ACL), 2021
Santiago Castro
Ruoyao Wang
Pingxuan Huang
Ian Stewart
Oana Ignat
Nan Liu
Jonathan C. Stroud
Amélie Reymond
AIMat
219
12
0
09 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Frozen in Time: A Joint Video and Image Encoder for End-to-End RetrievalIEEE International Conference on Computer Vision (ICCV), 2021
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
677
1,414
0
01 Apr 2021
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language
  Representation Learning
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
Luowei Zhou
Jingjing Liu
Yu Cheng
Zhe Gan
Lei Zhang
161
7
0
01 Apr 2021
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network
  for Video Reasoning over Traffic Events
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic EventsComputer Vision and Pattern Recognition (CVPR), 2021
Kepeng Xu
He Huang
Jun Liu
ViTLRM
293
109
0
29 Mar 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
151
147
0
19 Mar 2021
The Role of the Input in Natural Language Video Description
The Role of the Input in Natural Language Video DescriptionIEEE transactions on multimedia (TMM), 2020
S. Cascianelli
G. Costante
Alessandro Devo
Thomas Alessandro Ciarfuglia
P. Valigi
M. L. Fravolini
112
5
0
09 Feb 2021
Learning Temporal Dynamics from Cycles in Narrated Video
Learning Temporal Dynamics from Cycles in Narrated VideoIEEE International Conference on Computer Vision (ICCV), 2021
Dave Epstein
Jiajun Wu
Cordelia Schmid
Chen Sun
AI4TS
220
15
0
07 Jan 2021
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision
  and Language Research in Turkish
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in TurkishMachine Translation (MT), 2020
Begum Citamak
Ozan Caglayan
Menekse Kuyu
Erkut Erdem
Aykut Erdem
Pranava Madhyastha
Lucia Specia
170
9
0
13 Dec 2020
Data-efficient Alignment of Multimodal Sequences by Aligning Gradient
  Updates and Internal Feature Distributions
Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature DistributionsIEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Jianan Wang
Boyang Albert Li
Xiangyu Fan
Jing-Hua Lin
Yanwei Fu
114
3
0
15 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations
ActBERT: Learning Global-Local Video-Text RepresentationsComputer Vision and Pattern Recognition (CVPR), 2020
Linchao Zhu
Yi Yang
ViT
262
447
0
14 Nov 2020
What is More Likely to Happen Next? Video-and-Language Future Event
  Prediction
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
172
78
0
15 Oct 2020
Previous
12345
Next