Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1412.4729
Cited By
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
15 December 2014
Subhashini Venugopalan
Huijuan Xu
Jeff Donahue
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Translating Videos to Natural Language Using Deep Recurrent Neural Networks"
50 / 333 papers shown
Title
TC-MGC: Text-Conditioned Multi-Grained Contrastive Learning for Text-Video Retrieval
Xiaolun Jing
Genke Yang
Jian Chu
26
0
0
07 Apr 2025
Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning
Caihua Liu
Xu Li
Wenjing Xue
Wei Tang
Xia Feng
51
0
0
20 Feb 2025
Multi-Modal interpretable automatic video captioning
Antoine Hanna-Asaad
Decky Aspandi
Titus Zaharia
31
0
0
11 Nov 2024
A Survey on Integrated Sensing, Communication, and Computation
Dingzhu Wen
Yong Zhou
Xiaoyang Li
Yuanming Shi
Kaibin Huang
Khaled B. Letaief
29
24
0
15 Aug 2024
VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It
Xiaoxuan Zhu
Zhouhong Gu
Sihang Jiang
Zhixu Li
Hongwei Feng
Yanghua Xiao
21
0
0
15 Jun 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping-Chia Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
47
36
0
26 May 2024
An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval
Xiaolun Jing
Genke Yang
Jian Chu
CLIP
29
1
0
25 May 2024
MICap: A Unified Model for Identity-aware Movie Descriptions
Haran Raajesh
Naveen Reddy Desanur
Zeeshan Khan
Makarand Tapaswi
30
4
0
19 May 2024
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViT
VGen
26
4
0
19 Apr 2024
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis
Maged Shoman
Dongdong Wang
Armstrong Aboah
Mohamed Abdel-Aty
32
13
0
12 Apr 2024
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Minkuk Kim
Hyeon Bae Kim
Jinyoung Moon
Jinwoo Choi
Seong Tae Kim
36
17
0
11 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
44
1
0
01 Apr 2024
Cross-Modal Reasoning with Event Correlation for Video Question Answering
Chengxiang Yin
Zhengping Che
Kun Wu
Zhiyuan Xu
Qinru Qiu
Jian Tang
35
0
0
20 Dec 2023
Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023)
Kabita Parajuli
S. R. Joshi
20
0
0
12 Dec 2023
Multi Sentence Description of Complex Manipulation Action Videos
Fatemeh Ziaeetabar
Reza Safabakhsh
S. Momtazi
M. Tamosiunaite
F. Worgotter
25
1
0
13 Nov 2023
CLearViD: Curriculum Learning for Video Description
Cheng-Yu Chuang
Pooyan Fazli
30
1
0
08 Nov 2023
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks
Yimu Wang
Xiangru Jian
Bo Xue
22
9
0
17 Oct 2023
CLEVRER-Humans: Describing Physical and Causal Events the Human Way
Jiayuan Mao
Xuelin Yang
Xikun Zhang
Noah D. Goodman
Jiajun Wu
NAI
22
22
0
05 Oct 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
31
5
0
23 Sep 2023
Collaborative Three-Stream Transformers for Video Captioning
Hao Wang
Libo Zhang
Hengrui Fan
Tiejian Luo
24
6
0
18 Sep 2023
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion
Yutao Jin
Bin Liu
Jing Wang
30
1
0
13 Aug 2023
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation
Xilun Chen
L. Yu
Wenhan Xiong
Barlas Ouguz
Yashar Mehdad
Wen-tau Yih
VGen
26
3
0
04 May 2023
Visual Transformation Telling
Wanqing Cui
Mustafa Nasir-Moin
Yanyan Lan
Viola J. Chen
J. Guo
Xueqi Cheng
LRM
53
1
0
03 May 2023
A Review of Deep Learning for Video Captioning
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Erik Cambria
Fatih Porikli
3DV
27
20
0
22 Apr 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Yaowei Li
Bang-ju Yang
Xuxin Cheng
Zhihong Zhu
Hongxiang Li
Yuexian Zou
19
31
0
28 Mar 2023
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Peng Jin
Jinfa Huang
Pengfei Xiong
Shangxuan Tian
Chang-rui Liu
Xiang Ji
Li-ming Yuan
Jie Chen
30
49
0
25 Mar 2023
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
Bang-ju Yang
Fenglin Liu
Yuexian Zou
Xian Wu
Yaowei Wang
David A. Clifton
25
8
0
11 Mar 2023
Collaboration with Conversational AI Assistants for UX Evaluation: Questions and How to Ask them (Voice vs. Text)
Emily Kuang
Ehsan Jahangirzadeh Soure
Mingming Fan
J. Zhao
Kristen Shinohara
ELM
33
22
0
07 Mar 2023
TAP: The Attention Patch for Cross-Modal Knowledge Transfer from Unlabeled Modality
Yinsong Wang
Shahin Shahrampour
13
0
0
04 Feb 2023
ADAPT: Action-aware Driving Caption Transformer
Bu Jin
Xinyi Liu
Yupeng Zheng
Pengfei Li
Hao Zhao
Tong Zhang
Yuhang Zheng
Guyue Zhou
Jingjing Liu
25
69
0
01 Feb 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
Yizhen Chen
Jie Wang
Lijian Lin
Zhongang Qi
Jin Ma
Ying Shan
VLM
18
18
0
30 Jan 2023
METEOR Guided Divergence for Video Captioning
D. Rothenpieler
Shahin Amiriparian
21
3
0
20 Dec 2022
MAViC: Multimodal Active Learning for Video Captioning
Gyanendra Das
Xavier Thomas
Anant Raj
Vikram Gupta
16
2
0
11 Dec 2022
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning
X. Zhong
Zipeng Li
Shuqin Chen
Kui Jiang
Chen Chen
Mang Ye
DiffM
VGen
19
40
0
28 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
19
23
0
22 Nov 2022
Visual Commonsense-aware Representation Network for Video Captioning
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
29
16
0
17 Nov 2022
Contrastive Video-Language Learning with Fine-grained Frame Sampling
Zixu Wang
Yujie Zhong
Yishu Miao
Lin Ma
Lucia Specia
46
11
0
10 Oct 2022
Thinking Hallucination for Video Captioning
Nasib Ullah
Partha Pratim Mohanta
VLM
36
4
0
28 Sep 2022
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval
Che-Hsien Lin
Ancong Wu
Junwei Liang
Jun Zhang
Wenhang Ge
Wei Zheng
Chunhua Shen
87
21
0
27 Sep 2022
LGDN: Language-Guided Denoising Network for Video-Language Modeling
Haoyu Lu
Mingyu Ding
Nanyi Fei
Yuqi Huo
Zhiwu Lu
VLM
74
16
0
23 Sep 2022
Large-Scale Traffic Congestion Prediction based on Multimodal Fusion and Representation Mapping
Bo Zhou
Jiahui Liu
Songyi Cui
Yaping Zhao
18
5
0
23 Aug 2022
Diverse Video Captioning by Adaptive Spatio-temporal Attention
Zohreh Ghaderi
Leonard Salewski
Hendrik P. A. Lensch
13
8
0
19 Aug 2022
Sports Video Analysis on Large-Scale Data
Dekun Wu
Henghui Zhao
Xingce Bao
Richard P. Wildes
21
13
0
09 Aug 2022
LocVTP: Video-Text Pre-training for Temporal Localization
Meng Cao
Tianyu Yang
Junwu Weng
Can Zhang
Jue Wang
Yuexian Zou
16
64
0
21 Jul 2022
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Ming Yan
Ji Zhang
Rongrong Ji
CLIP
VLM
10
268
0
15 Jul 2022
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Ce Liu
Lijuan Wang
MLLM
VLM
20
81
0
14 Jun 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
29
527
0
27 May 2022
Visual Abductive Reasoning
Chen Liang
Wenguan Wang
Tianfei Zhou
Yi Yang
LRM
26
38
0
26 Mar 2022
Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval
Guanyu Cai
Yixiao Ge
Binjie Zhang
Alex Jinpeng Wang
Rui Yan
...
Ying Shan
Lianghua He
Xiaohu Qie
Jianping Wu
Mike Zheng Shou
VLM
10
6
0
15 Mar 2022
Global2Local: A Joint-Hierarchical Attention for Video Captioning
Chengpeng Dai
Fuhai Chen
Xiaoshuai Sun
Rongrong Ji
QiXiang Ye
Yongjian Wu
17
1
0
13 Mar 2022
1
2
3
4
5
6
7
Next