1412.4729
Cited By
v1
v2
v3 (latest)
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
North American Chapter of the Association for Computational Linguistics (NAACL), 2014
15 December 2014
Subhashini Venugopalan
Huijuan Xu
Jeff Donahue
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
Papers citing
"Translating Videos to Natural Language Using Deep Recurrent Neural Networks"
50 / 334 papers shown
Title
Global2Local: A Joint-Hierarchical Attention for Video Captioning
Chengpeng Dai
Fuhai Chen
Xiaoshuai Sun
Rongrong Ji
QiXiang Ye
Yongjian Wu
162
1
0
13 Mar 2022
Taking an Emotional Look at Video Paragraph Captioning
Qinyu Li
Tengpeng Li
Hanli Wang
Changan Chen
150
7
0
12 Mar 2022
Attacks as Defenses: Designing Robust Audio CAPTCHAs Using Attacks on Automatic Speech Recognition Systems
Network and Distributed System Security Symposium (NDSS), 2022
H. Abdullah
Aditya Karlekar
S. Prasad
Muhammad Sajidur Rahman
Logan Blue
L. A. Bauer
Vincent Bindschaedler
Patrick Traynor
AAML
119
4
0
10 Mar 2022
Exploiting long-term temporal dynamics for video captioning
World wide web (Bussum) (WWW), 2018
Yuyu Guo
Jingqiu Zhang
Lianli Gao
110
18
0
22 Feb 2022
Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation
Ahmad Hammoudeh
Bastien Vanderplaetse
Stéphane Dupont
ViT
186
8
0
11 Feb 2022
Variational Stacked Local Attention Networks for Diverse Video Captioning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tonmoay Deb
Akib Sadmanee
Kishor Kumar
Ahsan Ali
M. Ashraful
Mahbubur Rahman
101
10
0
04 Jan 2022
Human-AI Collaboration for UX Evaluation: Effects of Explanation and Synchronization
Mingming Fan
Xianyou Yang
Tsz Tung Yu
Vera Q. Liao
J. Zhao
86
1
0
23 Dec 2021
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Computer Vision and Pattern Recognition (CVPR), 2021
Dongxu Li
Junnan Li
Hongdong Li
Juan Carlos Niebles
Guosheng Lin
264
213
0
17 Dec 2021
Dense Video Captioning Using Unsupervised Semantic Information
Valter Estevam
Rayson Laroca
Hélio Pedrini
David Menotti
172
10
0
15 Dec 2021
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
228
26
0
02 Dec 2021
Controllable Video Captioning with an Exemplar Sentence
Yitian Yuan
Lin Ma
Jingwen Wang
Wenwu Zhu
139
21
0
02 Dec 2021
Syntax Customized Video Captioning by Imitating Exemplar Sentences
Yitian Yuan
Lin Ma
Wenwu Zhu
148
7
0
02 Dec 2021
Object-aware Video-language Pre-training for Retrieval
Alex Jinpeng Wang
Yixiao Ge
Guanyu Cai
Rui Yan
Xudong Lin
Ying Shan
Xiaohu Qie
Mike Zheng Shou
ViT
VLM
248
86
0
01 Dec 2021
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter
Bang-ju Yang
Tong Zhang
Yuexian Zou
CLIP
108
24
0
30 Nov 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2021
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
238
292
0
25 Nov 2021
Hierarchical Modular Network for Video Captioning
Hanhua Ye
Guorong Li
Yuankai Qi
Shuhui Wang
Qingming Huang
Ming-Hsuan Yang
210
87
0
24 Nov 2021
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Computer Vision and Pattern Recognition (CVPR), 2021
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TS
VLM
182
246
0
19 Nov 2021
Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks
Computer Vision and Image Understanding (CVIU), 2021
Arulkumar Subramaniam
Jayesh Vaidya
Muhammed Ameen
Athira M. Nambiar
Anurag Mittal
284
7
0
14 Nov 2021
Video and Text Matching with Conditioned Embeddings
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Ameen Ali
Idan Schwartz
Tamir Hazan
Lior Wolf
238
15
0
21 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
653
676
0
28 Sep 2021
EVOQUER: Enhancing Temporal Grounding with Video-Pivoted BackQuery Generation
Yanjun Gao
Lulu Liu
Jason Wang
Xin Chen
Huayan Wang
Rui Zhang
135
1
0
10 Sep 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
ACM Multimedia (ACM MM), 2021
Katsuyuki Nakamura
Hiroki Ohashi
Mitsuhiro Okada
EgoV
192
14
0
07 Sep 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
166
36
0
18 Aug 2021
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
203
218
0
17 Aug 2021
O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning
Findings of the Association for Computational Linguistics (ACL Findings), 2021
Fenglin Liu
Xuancheng Ren
Xian Wu
Bang-ju Yang
Shen Ge
Yuexian Zou
Xu Sun
187
36
0
05 Aug 2021
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers
Interspeech (Interspeech), 2021
Chiori Hori
Takaaki Hori
Jonathan Le Roux
90
4
0
04 Aug 2021
Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions
Information Fusion (Inf. Fusion), 2021
Anil Rahate
Rahee Walambe
S. Ramanna
K. Kotecha
305
169
0
29 Jul 2021
Transcript to Video: Efficient Clip Sequencing from Texts
ACM Multimedia (ACM MM), 2021
Yu Xiong
Fabian Caba Heilbron
Dahua Lin
CLIP
185
13
0
25 Jul 2021
Boosting Video Captioning with Dynamic Loss Network
Nasib Ullah
Partha Pratim Mohanta
158
4
0
25 Jul 2021
Contrastive Attention for Automatic Chest X-ray Report Generation
Findings of the Association for Computational Linguistics (ACL Findings), 2021
Fenglin Liu
Changchang Yin
Xian Wu
Shen Ge
Yuexian Zou
Ping Zhang
Xu Sun
MedIm
182
186
0
13 Jun 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
1.3K
979
0
18 Apr 2021
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
Ioana Croitoru
Simion-Vlad Bogolin
Marius Leordeanu
Hailin Jin
Andrew Zisserman
Samuel Albanie
Yang Liu
VGen
136
140
0
16 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
657
1,414
0
01 Apr 2021
A Comprehensive Review of the Video-to-Text Problem
Artificial Intelligence Review (AIR), 2021
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
209
18
0
27 Mar 2021
PGT: A Progressive Method for Training Models on Long Videos
Computer Vision and Pattern Recognition (CVPR), 2021
Bo Pang
Gao Peng
Yizhuo Li
Cewu Lu
VLM
102
13
0
21 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
International Journal of Computer Vision (IJCV), 2021
Andrew Shin
Masato Ishii
T. Narihira
209
46
0
06 Mar 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Computer Vision and Pattern Recognition (CVPR), 2021
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
360
741
0
11 Feb 2021
The Role of the Input in Natural Language Video Description
IEEE transactions on multimedia (TMM), 2020
S. Cascianelli
G. Costante
Alessandro Devo
Thomas Alessandro Ciarfuglia
P. Valigi
M. L. Fravolini
112
5
0
09 Feb 2021
AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
IEEE International Conference on Computer Vision (ICCV), 2021
Ruilong Li
Sha Yang
David A. Ross
Angjoo Kanazawa
ViT
619
614
0
21 Jan 2021
Video Captioning in Compressed Video
International Conference on Image, Vision and Computing (ICIVC), 2021
Mingjian Zhu
Chenrui Duan
Changbin (Brad) Yu
91
5
0
02 Jan 2021
Searching a Raw Video Database using Natural Language Queries
Sriram Krishna
Siddarth Vinay
Srinivas K. S.
65
0
0
31 Dec 2020
BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling
Computer Speech and Language (CSL), 2020
Jing Su
Qingyun Dai
Frank Guerin
Mian Zhou
142
28
0
03 Dec 2020
A Comprehensive Review on Recent Methods and Challenges of Video Description
Ashutosh Kumar Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
3DV
VLM
165
5
0
30 Nov 2020
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
153
3
0
18 Nov 2020
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus
Bowen Zhang
Hexiang Hu
Joonseok Lee
Mingde Zhao
Sheide Chammas
Vihan Jain
Eugene Ie
Fei Sha
155
39
0
18 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Neural Information Processing Systems (NeurIPS), 2020
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
168
178
0
01 Nov 2020
Personalized Multimodal Feedback Generation in Education
International Conference on Computational Linguistics (COLING), 2020
Haochen Liu
Zitao Liu
Zhongqin Wu
Shucheng Zhou
108
13
0
31 Oct 2020
Improved Actor Relation Graph based Group Activity Recognition
International Conference on Smart Multimedia (ICSM), 2020
Zijian Kuang
Xinran Tie
71
5
0
24 Oct 2020
Video Captioning Using Weak Annotation
Jingyi Hou
Yunde Jia
Xinxiao Wu
Yayun Qi
115
2
0
02 Sep 2020
Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning
Knowledge Discovery and Data Mining (KDD), 2020
Yinghua Zhang
Yangqiu Song
Jian Liang
Kun Bai
Qiang Yang
AAML
109
30
0
25 Aug 2020