1908.10072
Cited By
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
IEEE International Conference on Computer Vision (ICCV), 2019
27 August 2019
Bairui Wang
Lin Ma
Wei Zhang
Wenhao Jiang
Jingwen Wang
Wei Liu
Github (67★)
Papers citing
"Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network"
50 / 58 papers shown
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
Yiming Ren
Zhiqiang Lin
Yu Li
Gao Meng
Weiyun Wang
...
Zicheng Lin
Jifeng Dai
Yujiu Yang
Wenhai Wang
Ruihang Chu
238
3
0
17 Jul 2025
The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning
Mingkai Tian
Guorong Li
Yuankai Qi
Amin Beheshti
Javen Qinfeng Shi
Anton van den Hengel
Qingming Huang
VGen
317
0
0
31 Mar 2025
Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Caihua Liu
Xu Li
Wenjing Xue
Wei Tang
Xia Feng
285
1
0
20 Feb 2025
Multi-Modal interpretable automatic video captioning
Antoine Hanna-Asaad
Decky Aspandi
Titus Zaharia
277
1
0
11 Nov 2024
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
248
0
0
22 Oct 2024
HOTVCOM: Generating Buzzworthy Comments for Videos
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yuyan Chen
Yiwen Qian
Songzhou Yan
Jiyuan Jia
Zhixu Li
Yanghua Xiao
Xiaobo Li
Ming-Hsuan Yang
Qingpei Guo
283
9
0
23 Sep 2024
SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for Autonomous Driving
Yiming Cui
Cheng Han
Dongfang Liu
333
1
0
29 May 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
337
35
0
26 Mar 2024
Subject-Oriented Video Captioning
Yunchuan Ma
Chang Teng
Yuankai Qi
Guorong Li
Laiyun Qing
Qi Wu
Qingming Huang
223
0
0
20 Dec 2023
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion
Yutao Jin
Yinan Han
Jing Wang
191
2
0
13 Aug 2023
A Review of Deep Learning for Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Xiaoshi Zhong
Fatih Porikli
3DV
255
47
0
22 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
VLM
491
172
0
17 Apr 2023
SEM-POS: Grammatically and Semantically Correct Video Captioning
Asmar Nadeem
A. Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
243
10
0
26 Mar 2023
Text with Knowledge Graph Augmented Transformer for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Xin Gu
G. Chen
Yufei Wang
Libo Zhang
Tiejian Luo
Longyin Wen
260
79
0
22 Mar 2023
Neighborhood Contrastive Transformer for Change Captioning
IEEE Transactions on Multimedia (IEEE TMM), 2023
Yunbin Tu
Liang Li
Li Su
Kelvin Lu
Qin Huang
ViT
218
31
0
06 Mar 2023
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Zhuo Zhou
Zipeng Li
Shuqin Chen
Kui Jiang
Chen Chen
Mang Ye
DiffM
VGen
274
65
0
28 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
279
30
0
22 Nov 2022
Visual Commonsense-aware Representation Network for Video Captioning
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
194
25
0
17 Nov 2022
Thinking Hallucination for Video Captioning
Asian Conference on Computer Vision (ACCV), 2022
Nasib Ullah
Partha Pratim Mohanta
VLM
229
10
0
28 Sep 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
685
748
0
27 May 2022
GL-RG: Global-Local Representation Granularity for Video Captioning
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Liqi Yan
Qifan Wang
Yiming Cui
Fuli Feng
Xiaojun Quan
Xinming Zhang
Dongfang Liu
301
69
0
22 May 2022
Support-set based Multi-modal Representation Enhancement for Video Captioning
IEEE International Conference on Multimedia and Expo (ICME), 2022
Xiaoya Chen
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Hengtao Shen
154
5
0
19 May 2022
Video Captioning: a comparative review of where we are and which could be the route
Computer Vision and Image Understanding (CVIU), 2022
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
260
17
0
12 Apr 2022
Learning Audio-Video Modalities from Image Captions
European Conference on Computer Vision (ECCV), 2022
Arsha Nagrani
Paul Hongsuck Seo
Bryan Seybold
Anja Hauth
Santiago Manén
Chen Sun
Cordelia Schmid
CLIP
243
98
0
01 Apr 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2022
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
338
191
0
20 Jan 2022
Image Captioning via Compact Bidirectional Architecture
Zijie Song
Yuanen Zhou
Zhenzhen Hu
Daqing Liu
Huixia Ben
Richang Hong
Meng Wang
VLM
255
18
0
06 Jan 2022
Variational Stacked Local Attention Networks for Diverse Video Captioning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tonmoay Deb
Akib Sadmanee
Kishor Kumar
Ahsan Ali
M. Ashraful
Mahbubur Rahman
253
10
0
04 Jan 2022
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation
IEEE International Conference on Image Processing (ICIP), 2021
Philipp Harzig
Moritz Einfalt
Rainer Lienhart
ViT
177
3
0
28 Dec 2021
Controllable Video Captioning with an Exemplar Sentence
Yitian Yuan
Lin Ma
Jingwen Wang
Wenwu Zhu
219
23
0
02 Dec 2021
Syntax Customized Video Captioning by Imitating Exemplar Sentences
Yitian Yuan
Lin Ma
Wenwu Zhu
235
8
0
02 Dec 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2021
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
395
327
0
25 Nov 2021
Hierarchical Modular Network for Video Captioning
Hanhua Ye
Guorong Li
Yuankai Qi
Shuhui Wang
Qingming Huang
Ming-Hsuan Yang
259
95
0
24 Nov 2021
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
Xu Yan
Zhengcong Fei
Shuhui Wang
Qingming Huang
Qi Tian
VGen
297
4
0
19 Nov 2021
Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks
Computer Vision and Image Understanding (CVIU), 2021
Arulkumar Subramaniam
Jayesh Vaidya
Muhammed Ameen
Athira M. Nambiar
Anurag Mittal
430
7
0
14 Nov 2021
Visual-aware Attention Dual-stream Decoder for Video Captioning
Zhixin Sun
Zhuo Zhou
Shuqin Chen
Lin Li
Luo Zhong
223
4
0
16 Oct 2021
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP
VLM
317
183
0
13 Oct 2021
Cross-Modal Graph with Meta Concepts for Video Captioning
IEEE Transactions on Image Processing (TIP), 2021
Hao Wang
Guosheng Lin
Chunyan Miao
381
12
0
14 Aug 2021
O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning
Findings of the Association for Computational Linguistics (ACL Findings), 2021
Fenglin Liu
Xuancheng Ren
Xian Wu
Bang-ju Yang
Shen Ge
Yuexian Zou
Xu Sun
306
38
0
05 Aug 2021
Boosting Video Captioning with Dynamic Loss Network
Nasib Ullah
Partha Pratim Mohanta
266
4
0
25 Jul 2021
A Comprehensive Review of the Video-to-Text Problem
Artificial Intelligence Review (AIR), 2021
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
310
20
0
27 Mar 2021
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2021
Yongfei Liu
Bo Wan
Lin Ma
Xuming He
ObjD
282
65
0
24 Mar 2021
Open-book Video Captioning with Retrieve-Copy-Generate Network
Computer Vision and Pattern Recognition (CVPR), 2021
Ziqi Zhang
Chen Ma
Chun Yuan
Ying Shan
Bing Li
Ying Deng
Weiming Hu
166
115
0
09 Mar 2021
The MSR-Video to Text Dataset with Clean Annotations
Computer Vision and Image Understanding (CVIU), 2021
Haoran Chen
Jianmin Li
Simone Frintrop
Xiaolin Hu
284
18
0
12 Feb 2021
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
Humam Alwassel
Silvio Giancola
Guohao Li
308
146
0
23 Nov 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
342
6
0
19 Oct 2020
Identity-Aware Multi-Sentence Video Description
J. S. Park
Trevor Darrell
Anna Rohrbach
222
22
0
22 Aug 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
European Conference on Computer Vision (ECCV), 2020
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
323
112
0
28 Jul 2020
SBAT: Video Captioning with Sparse Boundary-Aware Transformer
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Tao Jin
Siyu Huang
Ming Chen
Yingming Li
Zhongfei Zhang
247
59
0
23 Jul 2020
Learning to Discretely Compose Reasoning Module Networks for Video Captioning
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Ganchao Tan
Daqing Liu
Meng Wang
Zhengjun Zha
LRM
266
78
0
17 Jul 2020
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions
European Conference on Computer Vision (ECCV), 2020
Noa Garcia
Yuta Nakashima
290
35
0
17 Jul 2020