Translating Videos to Natural Language Using Deep Recurrent Neural Networks

North American Chapter of the Association for Computational Linguistics (NAACL), 2014
15 December 2014
Subhashini Venugopalan
Huijuan Xu
Jeff Donahue
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko

Papers citing "Translating Videos to Natural Language Using Deep Recurrent Neural Networks"

50 / 334 papers shown
Global2Local: A Joint-Hierarchical Attention for Video Captioning
Chengpeng Dai
Fuhai Chen
Xiaoshuai Sun
Rongrong Ji
QiXiang Ye
Yongjian Wu
162
1
0
13 Mar 2022
Taking an Emotional Look at Video Paragraph Captioning
Qinyu Li
Tengpeng Li
Hanli Wang
Changan Chen
150
7
0
12 Mar 2022
Attacks as Defenses: Designing Robust Audio CAPTCHAs Using Attacks on Automatic Speech Recognition Systems
Network and Distributed System Security Symposium (NDSS), 2022
H. Abdullah
Aditya Karlekar
S. Prasad
Muhammad Sajidur Rahman
Logan Blue
L. A. Bauer
Vincent Bindschaedler
Patrick Traynor
AAML
119
4
0
10 Mar 2022
Exploiting long-term temporal dynamics for video captioning
World wide web (Bussum) (WWW), 2018
Yuyu Guo
Jingqiu Zhang
Lianli Gao
110
18
0
22 Feb 2022
Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation
Ahmad Hammoudeh
Bastein Vanderplaetse
Stéphane Dupont
ViT
186
8
0
11 Feb 2022
Variational Stacked Local Attention Networks for Diverse Video Captioning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tonmoay Deb
Akib Sadmanee
Kishor Kumar
Ahsan Ali
M. Ashraful
Mahbubur Rahman
101
10
0
04 Jan 2022
Human-AI Collaboration for UX Evaluation: Effects of Explanation and Synchronization
Mingming Fan
Xianyou Yang
Tsz Tung Yu
Vera Q. Liao
J. Zhao
86
1
0
23 Dec 2021
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Computer Vision and Pattern Recognition (CVPR), 2021
Dongxu Li
Junnan Li
Hongdong Li
Juan Carlos Niebles
Guosheng Lin
264
213
0
17 Dec 2021
Dense Video Captioning Using Unsupervised Semantic Information
Valter Estevam
Rayson Laroca
Hélio Pedrini
David Menotti
172
10
0
15 Dec 2021
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
228
26
0
02 Dec 2021
Controllable Video Captioning with an Exemplar Sentence
Yitian Yuan
Lin Ma
Jingwen Wang
Wenwu Zhu
139
21
0
02 Dec 2021
Syntax Customized Video Captioning by Imitating Exemplar Sentences
Yitian Yuan
Lin Ma
Wenwu Zhu
148
7
0
02 Dec 2021
Object-aware Video-language Pre-training for Retrieval
Alex Jinpeng Wang
Yixiao Ge
Guanyu Cai
Rui Yan
Xudong Lin
Ying Shan
Xiaohu Qie
Mike Zheng Shou
ViT
VLM
248
86
0
01 Dec 2021
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter
Bang-ju Yang
Tong Zhang
Yuexian Zou
CLIP
108
24
0
30 Nov 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2021
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
238
292
0
25 Nov 2021
Hierarchical Modular Network for Video Captioning
Hanhua Ye
Guorong Li
Yuankai Qi
Shuhui Wang
Qingming Huang
Ming-Hsuan Yang
210
87
0
24 Nov 2021
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Computer Vision and Pattern Recognition (CVPR), 2021
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TS
VLM
182
246
0
19 Nov 2021
Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks
Computer Vision and Image Understanding (CVIU), 2021
Arulkumar Subramaniam
Jayesh Vaidya
Muhammed Ameen
Athira M. Nambiar
Anurag Mittal
284
7
0
14 Nov 2021
Video and Text Matching with Conditioned Embeddings
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Ameen Ali
Idan Schwartz
Tamir Hazan
Lior Wolf
238
15
0
21 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
653
676
0
28 Sep 2021
EVOQUER: Enhancing Temporal Grounding with Video-Pivoted BackQuery Generation
Yanjun Gao
Lulu Liu
Jason Wang
Xin Chen
Huayan Wang
Rui Zhang
135
1
0
10 Sep 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
ACM Multimedia (ACM MM), 2021
Katsuyuki Nakamura
Hiroki Ohashi
Mitsuhiro Okada
EgoV
192
14
0
07 Sep 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
166
36
0
18 Aug 2021
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
203
218
0
17 Aug 2021
O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning
Findings (Findings), 2021
Fenglin Liu
Xuancheng Ren
Xian Wu
Bang-ju Yang
Shen Ge
Yuexian Zou
Xu Sun
187
36
0
05 Aug 2021
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers
Interspeech (Interspeech), 2021
Chiori Hori
Takaaki Hori
Jonathan Le Roux
90
4
0
04 Aug 2021
Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions
Information Fusion (Inf. Fusion), 2021
Anil Rahate
Rahee Walambe
S. Ramanna
K. Kotecha
305
169
0
29 Jul 2021
Transcript to Video: Efficient Clip Sequencing from Texts
ACM Multimedia (ACM MM), 2021
Yu Xiong
Fabian Caba Heilbron
Dahua Lin
CLIP
185
13
0
25 Jul 2021
Boosting Video Captioning with Dynamic Loss Network
Nasib Ullah
Partha Pratim Mohanta
158
4
0
25 Jul 2021
Contrastive Attention for Automatic Chest X-ray Report Generation
Findings (Findings), 2021
Fenglin Liu
Changchang Yin
Xian Wu
Shen Ge
Yuexian Zou
Ping Zhang
Xu Sun
MedIm
182
186
0
13 Jun 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
1.3K
979
0
18 Apr 2021
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
Ioana Croitoru
Simion-Vlad Bogolin
Marius Leordeanu
Hailin Jin
Andrew Zisserman
Samuel Albanie
Yang Liu
VGen
136
140
0
16 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
657
1,414
0
01 Apr 2021
A Comprehensive Review of the Video-to-Text Problem
Artificial Intelligence Review (AIR), 2021
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
209
18
0
27 Mar 2021
PGT: A Progressive Method for Training Models on Long Videos
Computer Vision and Pattern Recognition (CVPR), 2021
Bo Pang
Gao Peng
Yizhuo Li
Cewu Lu
VLM
102
13
0
21 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
International Journal of Computer Vision (IJCV), 2021
Andrew Shin
Masato Ishii
T. Narihira
209
46
0
06 Mar 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Computer Vision and Pattern Recognition (CVPR), 2021
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
360
741
0
11 Feb 2021
The Role of the Input in Natural Language Video Description
IEEE transactions on multimedia (TMM), 2020
S. Cascianelli
G. Costante
Alessandro Devo
Thomas Alessandro Ciarfuglia
P. Valigi
M. L. Fravolini
112
5
0
09 Feb 2021
AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
IEEE International Conference on Computer Vision (ICCV), 2021
Ruilong Li
Sha Yang
David A. Ross
Angjoo Kanazawa
ViT
619
614
0
21 Jan 2021
Video Captioning in Compressed Video
International Conference on Image, Vision and Computing (ICIVC), 2021
Mingjian Zhu
Chenrui Duan
Changbin (Brad) Yu
91
5
0
02 Jan 2021
Searching a Raw Video Database using Natural Language Queries
Sriram Krishna
Siddarth Vinay
S. SrinivasK.
65
0
0
31 Dec 2020
BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling
Computer Speech and Language (CSL), 2020
Jing Su
Qingyun Dai
Frank Guerin
Mian Zhou
142
28
0
03 Dec 2020
A Comprehensive Review on Recent Methods and Challenges of Video Description
Ashutosh Kumar Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
3DV
VLM
165
5
0
30 Nov 2020
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
153
3
0
18 Nov 2020
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus
Bowen Zhang
Hexiang Hu
Joonseok Lee
Mingde Zhao
Sheide Chammas
Vihan Jain
Eugene Ie
Fei Sha
155
39
0
18 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Neural Information Processing Systems (NeurIPS), 2020
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
168
178
0
01 Nov 2020
Personalized Multimodal Feedback Generation in Education
International Conference on Computational Linguistics (COLING), 2020
Haochen Liu
Zitao Liu
Zhongqin Wu
Shucheng Zhou
108
13
0
31 Oct 2020
Improved Actor Relation Graph based Group Activity Recognition
International Conference on Smart Multimedia (ICSM), 2020
Zijian Kuang
Xinran Tie
71
5
0
24 Oct 2020
Video Captioning Using Weak Annotation
Jingyi Hou
Yunde Jia
Xinxiao Wu
Yayun Qi
115
2
0
02 Sep 2020
Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning
Knowledge Discovery and Data Mining (KDD), 2020
Yinghua Zhang
Yangqiu Song
Jian Liang
Kun Bai
Qiang Yang
AAML
109
30
0
25 Aug 2020