ResearchTrend.AI

Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network

IEEE International Conference on Computer Vision (ICCV), 2019
27 August 2019
Bairui Wang
Lin Ma
Wei Zhang
Wenhao Jiang
Jingwen Wang
Wei Liu

Papers citing "Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network"

50 / 58 papers shown
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
Yiming Ren
Zhiqiang Lin
Yu Li
Gao Meng
Weiyun Wang
...
Zicheng Lin
Jifeng Dai
Yujiu Yang
Wenhai Wang
Ruihang Chu
238
3
0
17 Jul 2025
The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning
Mingkai Tian
Guorong Li
Yuankai Qi
Amin Beheshti
Javen Qinfeng Shi
Anton van den Hengel
Qingming Huang
VGen
317
0
0
31 Mar 2025
Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Caihua Liu
Xu Li
Wenjing Xue
Wei Tang
Xia Feng
285
1
0
20 Feb 2025
Multi-Modal interpretable automatic video captioning
Antoine Hanna-Asaad
Decky Aspandi
Titus Zaharia
277
1
0
11 Nov 2024
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
248
0
0
22 Oct 2024
HOTVCOM: Generating Buzzworthy Comments for Videos
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yuyan Chen
Yiwen Qian
Songzhou Yan
Jiyuan Jia
Zhixu Li
Yanghua Xiao
Xiaobo Li
Ming-Hsuan Yang
Qingpei Guo
283
9
0
23 Sep 2024
SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for Autonomous Driving
Yiming Cui
Cheng Han
Dongfang Liu
333
1
0
29 May 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM, VGen
337
35
0
26 Mar 2024
Subject-Oriented Video Captioning
Yunchuan Ma
Chang Teng
Yuankai Qi
Guorong Li
Laiyun Qing
Qi Wu
Qingming Huang
223
0
0
20 Dec 2023
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion
Yutao Jin
Yinan Han
Jing Wang
191
2
0
13 Aug 2023
A Review of Deep Learning for Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Xiaoshi Zhong
Fatih Porikli
3DV
255
47
0
22 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
VLM
491
172
0
17 Apr 2023
SEM-POS: Grammatically and Semantically Correct Video Captioning
Asmar Nadeem
A. Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
243
10
0
26 Mar 2023
Text with Knowledge Graph Augmented Transformer for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Xin Gu
G. Chen
Yufei Wang
Libo Zhang
Tiejian Luo
Longyin Wen
260
79
0
22 Mar 2023
Neighborhood Contrastive Transformer for Change Captioning
IEEE Transactions on Multimedia (TMM), 2023
Yunbin Tu
Liang Li
Li Su
Kelvin Lu
Qin Huang
ViT
218
31
0
06 Mar 2023
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Zhuo Zhou
Zipeng Li
Shuqin Chen
Kui Jiang
Chen Chen
Mang Ye
DiffM, VGen
274
65
0
28 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
279
30
0
22 Nov 2022
Visual Commonsense-aware Representation Network for Video Captioning
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
194
25
0
17 Nov 2022
Thinking Hallucination for Video Captioning
Asian Conference on Computer Vision (ACCV), 2022
Nasib Ullah
Partha Pratim Mohanta
VLM
229
10
0
28 Sep 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
685
748
0
27 May 2022
GL-RG: Global-Local Representation Granularity for Video Captioning
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Liqi Yan
Qifan Wang
Yiming Cui
Fuli Feng
Xiaojun Quan
Xinming Zhang
Dongfang Liu
301
69
0
22 May 2022
Support-set based Multi-modal Representation Enhancement for Video Captioning
IEEE International Conference on Multimedia and Expo (ICME), 2022
Xiaoya Chen
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Hengtao Shen
154
5
0
19 May 2022
Video Captioning: a comparative review of where we are and which could be the route
Computer Vision and Image Understanding (CVIU), 2022
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
260
17
0
12 Apr 2022
Learning Audio-Video Modalities from Image Captions
European Conference on Computer Vision (ECCV), 2022
Arsha Nagrani
Paul Hongsuck Seo
Bryan Seybold
Anja Hauth
Santiago Manén
Chen Sun
Cordelia Schmid
CLIP
243
98
0
01 Apr 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2022
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
338
191
0
20 Jan 2022
Image Captioning via Compact Bidirectional Architecture
Zijie Song
Yuanen Zhou
Zhenzhen Hu
Daqing Liu
Huixia Ben
Richang Hong
Meng Wang
VLM
255
18
0
06 Jan 2022
Variational Stacked Local Attention Networks for Diverse Video Captioning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tonmoay Deb
Akib Sadmanee
Kishor Kumar
Ahsan Ali
M. Ashraful
Mahbubur Rahman
253
10
0
04 Jan 2022
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation
IEEE International Conference on Image Processing (ICIP), 2021
Philipp Harzig
Moritz Einfalt
Rainer Lienhart
ViT
177
3
0
28 Dec 2021
Controllable Video Captioning with an Exemplar Sentence
Yitian Yuan
Lin Ma
Jingwen Wang
Wenwu Zhu
219
23
0
02 Dec 2021
Syntax Customized Video Captioning by Imitating Exemplar Sentences
Yitian Yuan
Lin Ma
Wenwu Zhu
235
8
0
02 Dec 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2021
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
395
327
0
25 Nov 2021
Hierarchical Modular Network for Video Captioning
Hanhua Ye
Guorong Li
Yuankai Qi
Shuhui Wang
Qingming Huang
Ming-Hsuan Yang
259
95
0
24 Nov 2021
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
Xu Yan
Zhengcong Fei
Shuhui Wang
Qingming Huang
Qi Tian
VGen
297
4
0
19 Nov 2021
Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks
Computer Vision and Image Understanding (CVIU), 2021
Arulkumar Subramaniam
Jayesh Vaidya
Muhammed Ameen
Athira M. Nambiar
Anurag Mittal
430
7
0
14 Nov 2021
Visual-aware Attention Dual-stream Decoder for Video Captioning
Zhixin Sun
Zhuo Zhou
Shuqin Chen
Lin Li
Luo Zhong
223
4
0
16 Oct 2021
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP, VLM
317
183
0
13 Oct 2021
Cross-Modal Graph with Meta Concepts for Video Captioning
IEEE Transactions on Image Processing (TIP), 2021
Hao Wang
Guosheng Lin
Chunyan Miao
381
12
0
14 Aug 2021
O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning
Findings of the Association for Computational Linguistics (ACL Findings), 2021
Fenglin Liu
Xuancheng Ren
Xian Wu
Bang-ju Yang
Shen Ge
Yuexian Zou
Xu Sun
306
38
0
05 Aug 2021
Boosting Video Captioning with Dynamic Loss Network
Nasib Ullah
Partha Pratim Mohanta
266
4
0
25 Jul 2021
A Comprehensive Review of the Video-to-Text Problem
Artificial Intelligence Review (AIR), 2021
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
310
20
0
27 Mar 2021
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2021
Yongfei Liu
Bo Wan
Lin Ma
Xuming He
ObjD
282
65
0
24 Mar 2021
Open-book Video Captioning with Retrieve-Copy-Generate Network
Computer Vision and Pattern Recognition (CVPR), 2021
Ziqi Zhang
Chen Ma
Chun Yuan
Ying Shan
Bing Li
Ying Deng
Weiming Hu
166
115
0
09 Mar 2021
The MSR-Video to Text Dataset with Clean Annotations
Computer Vision and Image Understanding (CVIU), 2021
Haoran Chen
Jianmin Li
Simone Frintrop
Xiaolin Hu
284
18
0
12 Feb 2021
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
Humam Alwassel
Silvio Giancola
Guohao Li
308
146
0
23 Nov 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
342
6
0
19 Oct 2020
Identity-Aware Multi-Sentence Video Description
J. S. Park
Trevor Darrell
Anna Rohrbach
222
22
0
22 Aug 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
European Conference on Computer Vision (ECCV), 2020
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
323
112
0
28 Jul 2020
SBAT: Video Captioning with Sparse Boundary-Aware Transformer
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Tao Jin
Siyu Huang
Ming Chen
Yingming Li
Zhongfei Zhang
247
59
0
23 Jul 2020
Learning to Discretely Compose Reasoning Module Networks for Video Captioning
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Ganchao Tan
Daqing Liu
Meng Wang
Zhengjun Zha
LRM
266
78
0
17 Jul 2020
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions
European Conference on Computer Vision (ECCV), 2020
Noa Garcia
Yuta Nakashima
290
35
0
17 Jul 2020