Video Captioning with Transferred Semantic Attributes

23 November 2016
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei

Papers citing "Video Captioning with Transferred Semantic Attributes"

50 / 115 papers shown
RORPCap: Retrieval-based Objects and Relations Prompt for Image Captioning
Jinjing Gu
Tianbao Qin
Yuanyuan Pu
Zhengpeng Zhao
VLM
76
0
0
10 Aug 2025
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Computer Vision and Pattern Recognition (CVPR), 2025
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
230
1
0
31 Mar 2025
EgoLife: Towards Egocentric Life Assistant
Computer Vision and Pattern Recognition (CVPR), 2025
Jingkang Yang
Shuai Liu
Hongming Guo
Yuhao Dong
Xinyu Zhang
...
Joerg Widmer
Francesco Gringoli
Lei Yang
Bo Li
Ziwei Liu
EgoV
214
28
0
05 Mar 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
343
4
0
31 Dec 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Md. Mohaiminul Islam
Ngan Ho
Xitong Yang
Tushar Nagarajan
Lorenzo Torresani
Gedas Bertasius
VGen, VLM
606
78
0
20 Feb 2024
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
Yifan Lu
Ziqi Zhang
Chunfen Yuan
Peng Li
Yan Wang
Bing Li
Weiming Hu
113
6
0
25 Dec 2023
A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Keito Kudo
Haruki Nagasawa
Jun Suzuki
Nobuyuki Shimizu
192
4
0
04 Dec 2023
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics
IEEE International Conference on Robotics and Automation (ICRA), 2023
P. Sermanet
Tianli Ding
Jeffrey Zhao
Fei Xia
Debidatta Dwibedi
...
Pannag R Sanketi
Karol Hausman
Izhak Shafran
Brian Ichter
Yuan Cao
LM&Ro
223
95
0
01 Nov 2023
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
281
9
0
16 Oct 2023
VidChapters-7M: Video Chapters at Scale
Neural Information Processing Systems (NeurIPS), 2023
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
186
35
0
25 Sep 2023
Collaborative Three-Stream Transformers for Video Captioning
Computer Vision and Image Understanding (CVIU), 2023
Hao Wang
Libo Zhang
Hengrui Fan
Tiejian Luo
127
8
0
18 Sep 2023
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment
Yongrae Jo
Seongyun Lee
Aiden Seung Joon Lee
Hyunji Lee
Hanseok Oh
Minjoon Seo
177
4
0
05 Jul 2023
Generation-Guided Multi-Level Unified Network for Video Grounding
Xingyi Cheng
Xiangyu Wu
Dong Shen
Hezheng Lin
Fan Yang
157
0
0
14 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS, VLM
441
318
0
27 Feb 2023
ADAPT: Action-aware Driving Caption Transformer
IEEE International Conference on Robotics and Automation (ICRA), 2023
Bu Jin
Xinyi Liu
Yupeng Zheng
Pengfei Li
Hao Zhao
Tong Zhang
Yuhang Zheng
Guyue Zhou
Jingjing Liu
327
91
0
01 Feb 2023
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
189
27
0
22 Nov 2022
DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention
ACM Transactions on Knowledge Discovery from Data (TKDD), 2021
Fenglin Liu
Xian Wu
Shen Ge
Xuancheng Ren
Wei Fan
Xu Sun
Yuexian Zou
VLM
175
13
0
28 Oct 2022
Weakly Supervised Video Salient Object Detection via Point Supervision
ACM Multimedia (ACM MM), 2022
Shuyong Gao
Hao Xing
Wei Zhang
Yan Wang
Qianyu Guo
Wenqiang Zhang
181
30
0
15 Jul 2022
Automatic Concept Extraction for Concept Bottleneck-based Video Classification
J. Jeyakumar
Luke Dickens
L. Garcia
Yu Cheng
Diego Ramirez Echavarria
Joseph Noor
Alessandra Russo
Lance M. Kaplan
Erik P. Blasch
Mani B. Srivastava
163
18
0
21 Jun 2022
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
Elad Ben-Avraham
Roei Herzig
K. Mangalam
Amir Bar
Anna Rohrbach
Leonid Karlinsky
Trevor Darrell
Amir Globerson
214
0
0
13 Jun 2022
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Computer Vision and Pattern Recognition (CVPR), 2022
Gi-Cheon Kang
Sungdong Kim
Jin-Hwa Kim
Donghyun Kwak
Byoung-Tak Zhang
226
14
0
25 May 2022
Video Captioning: a comparative review of where we are and which could be the route
Computer Vision and Image Understanding (CVIU), 2022
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
177
14
0
12 Apr 2022
Exploiting long-term temporal dynamics for video captioning
World wide web (Bussum) (WWW), 2018
Yuyu Guo
Jingqiu Zhang
Lianli Gao
118
18
0
22 Feb 2022
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation
International Conference on Information Photonics (ICIP), 2021
Philipp Harzig
Moritz Einfalt
Rainer Lienhart
ViT
145
2
0
28 Dec 2021
Dense Video Captioning Using Unsupervised Semantic Information
Valter Estevam
Rayson Laroca
Hélio Pedrini
David Menotti
180
10
0
15 Dec 2021
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Hongyang Chao
Tao Mei
VLM
128
45
0
14 Dec 2021
Controllable Video Captioning with an Exemplar Sentence
Yitian Yuan
Lin Ma
Jingwen Wang
Wenwu Zhu
139
21
0
02 Dec 2021
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP, VLM
197
171
0
13 Oct 2021
A Survey on Temporal Sentence Grounding in Videos
Xiaohan Lan
Yitian Yuan
Xin Eric Wang
Zhi Wang
Wenwu Zhu
275
57
0
16 Sep 2021
Embodied AI-Driven Operation of Smart Cities: A Concise Review
Farzan Shenavarmasouleh
F. Mohammadi
M. Amini
H. Arabnia
162
8
0
22 Aug 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
166
36
0
18 Aug 2021
Cross-Modal Graph with Meta Concepts for Video Captioning
IEEE Transactions on Image Processing (TIP), 2021
Hao Wang
Guosheng Lin
Steven C. H. Hoi
Chunyan Miao
267
9
0
14 Aug 2021
Full-Duplex Strategy for Video Object Segmentation
IEEE International Conference on Computer Vision (ICCV), 2021
Ge-Peng Ji
Deng-Ping Fan
Keren Fu
Zhe Wu
Jianbing Shen
Ling Shao
VOS
362
165
0
06 Aug 2021
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
IEEE International Conference on Computer Vision (ICCV), 2021
Rui Qian
Yuxi Li
Huabin Liu
John See
Shuangrui Ding
Xian Liu
Dian Li
Weiyao Lin
233
43
0
04 Aug 2021
Controlled Caption Generation for Images Through Adversarial Attacks
Nayyer Aafaq
Naveed Akhtar
Wei Liu
M. Shah
Lin Wang
AAML
111
12
0
07 Jul 2021
Saying the Unseen: Video Descriptions via Dialog Agents
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Ye Zhu
Yu Wu
Yi Yang
Yan Yan
162
8
0
26 Jun 2021
Confidence-guided Adaptive Gate and Dual Differential Enhancement for Video Salient Object Detection
IEEE International Conference on Multimedia and Expo (ICME), 2021
Peijia Chen
Jianhuang Lai
Guangcong Wang
Huajun Zhou
87
21
0
14 May 2021
A Survey on Natural Language Video Localization
Xinfang Liu
Xiushan Nie
Zhifang Tan
Jie Guo
Yilong Yin
205
9
0
01 Apr 2021
A Comprehensive Review of the Video-to-Text Problem
Artificial Intelligence Review (AIR), 2021
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
221
18
0
27 Mar 2021
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Computer Vision and Pattern Recognition (CVPR), 2021
Sijie Song
Xudong Lin
Jiaying Liu
Zongming Guo
Shih-Fu Chang
ObjD
98
18
0
23 Mar 2021
Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network
AAAI Conference on Artificial Intelligence (AAAI), 2021
Yehao Li
Yingwei Pan
Ting Yao
Jingwen Chen
Tao Mei
VLM
144
58
0
27 Jan 2021
End-to-End Video Question-Answer Generation with Generator-Pretester Network
Hung-Ting Su
Chen-Hsi Chang
Po-Wei Shen
Yu-Siang Wang
Ya-Liang Chang
Yu-Cheng Chang
Pu-Jen Cheng
Winston H. Hsu
119
36
0
05 Jan 2021
A Comprehensive Review on Recent Methods and Challenges of Video Description
Ashutosh Kumar Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
3DV, VLM
169
5
0
30 Nov 2020
Multimodal Topic Learning for Video Recommendation
Shi Pu
Yijiang He
Zheng Li
Mao Zheng
88
8
0
26 Oct 2020
Video captioning with stacked attention and semantic hard pull
PeerJ Computer Science (PeerJ Comput. Sci.), 2020
Md. Mushfiqur Rahman
Thasinul Abedin
Khondokar S. S. Prottoy
Ayana Moshruba
Fazlul Hasan Siddiqui
165
2
0
15 Sep 2020
Relative Attribute Classification with Deep Rank SVM
Sara Atito Ali Ahmed
Berrin Yanikoglu
83
5
0
09 Sep 2020
Video Captioning Using Weak Annotation
Jingyi Hou
Yunde Jia
Xinxiao Wu
Yayun Qi
115
2
0
02 Sep 2020
Identity-Aware Multi-Sentence Video Description
J. S. Park
Trevor Darrell
Anna Rohrbach
145
22
0
22 Aug 2020
SBAT: Video Captioning with Sparse Boundary-Aware Transformer
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Tao Jin
Siyu Huang
Ming Chen
Yingming Li
Zhongfei Zhang
201
58
0
23 Jul 2020
Knowledge Graph Extraction from Videos
Louis Mahon
Eleonora Giunchiglia
Bowen Li
Thomas Lukasiewicz
97
21
0
20 Jul 2020