1611.07675
Video Captioning with Transferred Semantic Attributes
23 November 2016
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei
Papers citing
"Video Captioning with Transferred Semantic Attributes"
50 / 115 papers shown
RORPCap: Retrieval-based Objects and Relations Prompt for Image Captioning
Jinjing Gu
Tianbao Qin
Yuanyuan Pu
Zhengpeng Zhao
VLM
76
0
0
10 Aug 2025
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Computer Vision and Pattern Recognition (CVPR), 2025
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
230
1
0
31 Mar 2025
EgoLife: Towards Egocentric Life Assistant
Computer Vision and Pattern Recognition (CVPR), 2025
Jingkang Yang
Shuai Liu
Hongming Guo
Yuhao Dong
Xinyu Zhang
...
Joerg Widmer
Francesco Gringoli
Lei Yang
Bo Li
Ziwei Liu
EgoV
214
28
0
05 Mar 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
343
4
0
31 Dec 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Md. Mohaiminul Islam
Ngan Ho
Xitong Yang
Tushar Nagarajan
Lorenzo Torresani
Gedas Bertasius
VGen
VLM
606
78
0
20 Feb 2024
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
Yifan Lu
Ziqi Zhang
Chunfeng Yuan
Peng Li
Yan Wang
Bing Li
Weiming Hu
113
6
0
25 Dec 2023
A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Keito Kudo
Haruki Nagasawa
Jun Suzuki
Nobuyuki Shimizu
192
4
0
04 Dec 2023
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics
IEEE International Conference on Robotics and Automation (ICRA), 2023
P. Sermanet
Tianli Ding
Jeffrey Zhao
Fei Xia
Debidatta Dwibedi
...
Pannag R Sanketi
Karol Hausman
Izhak Shafran
Brian Ichter
Yuan Cao
LM&Ro
223
95
0
01 Nov 2023
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
281
9
0
16 Oct 2023
VidChapters-7M: Video Chapters at Scale
Neural Information Processing Systems (NeurIPS), 2023
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
186
35
0
25 Sep 2023
Collaborative Three-Stream Transformers for Video Captioning
Computer Vision and Image Understanding (CVIU), 2023
Hao Wang
Libo Zhang
Hengrui Fan
Tiejian Luo
127
8
0
18 Sep 2023
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment
Yongrae Jo
Seongyun Lee
Aiden Seung Joon Lee
Hyunji Lee
Hanseok Oh
Minjoon Seo
177
4
0
05 Jul 2023
Generation-Guided Multi-Level Unified Network for Video Grounding
Xingyi Cheng
Xiangyu Wu
Dong Shen
Hezheng Lin
Fan Yang
157
0
0
14 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
441
318
0
27 Feb 2023
ADAPT: Action-aware Driving Caption Transformer
IEEE International Conference on Robotics and Automation (ICRA), 2023
Bu Jin
Xinyi Liu
Yupeng Zheng
Pengfei Li
Hao Zhao
Tong Zhang
Yuhang Zheng
Guyue Zhou
Jingjing Liu
327
91
0
01 Feb 2023
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
189
27
0
22 Nov 2022
DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention
ACM Transactions on Knowledge Discovery from Data (TKDD), 2021
Fenglin Liu
Xian Wu
Shen Ge
Xuancheng Ren
Wei Fan
Xu Sun
Yuexian Zou
VLM
175
13
0
28 Oct 2022
Weakly Supervised Video Salient Object Detection via Point Supervision
ACM Multimedia (ACM MM), 2022
Shuyong Gao
Hao Xing
Wei Zhang
Yan Wang
Qianyu Guo
Wenqiang Zhang
181
30
0
15 Jul 2022
Automatic Concept Extraction for Concept Bottleneck-based Video Classification
J. Jeyakumar
Luke Dickens
L. Garcia
Yu Cheng
Diego Ramirez Echavarria
Joseph Noor
Alessandra Russo
Lance M. Kaplan
Erik P. Blasch
Mani B. Srivastava
163
18
0
21 Jun 2022
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
Elad Ben-Avraham
Roei Herzig
K. Mangalam
Amir Bar
Anna Rohrbach
Leonid Karlinsky
Trevor Darrell
Amir Globerson
214
0
0
13 Jun 2022
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Computer Vision and Pattern Recognition (CVPR), 2022
Gi-Cheon Kang
Sungdong Kim
Jin-Hwa Kim
Donghyun Kwak
Byoung-Tak Zhang
226
14
0
25 May 2022
Video Captioning: a comparative review of where we are and which could be the route
Computer Vision and Image Understanding (CVIU), 2022
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
177
14
0
12 Apr 2022
Exploiting long-term temporal dynamics for video captioning
World Wide Web (Bussum) (WWW), 2018
Yuyu Guo
Jingqiu Zhang
Lianli Gao
118
18
0
22 Feb 2022
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation
IEEE International Conference on Image Processing (ICIP), 2021
Philipp Harzig
Moritz Einfalt
Rainer Lienhart
ViT
145
2
0
28 Dec 2021
Dense Video Captioning Using Unsupervised Semantic Information
Valter Estevam
Rayson Laroca
Hélio Pedrini
David Menotti
180
10
0
15 Dec 2021
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Hongyang Chao
Tao Mei
VLM
128
45
0
14 Dec 2021
Controllable Video Captioning with an Exemplar Sentence
Yitian Yuan
Lin Ma
Jingwen Wang
Wenwu Zhu
139
21
0
02 Dec 2021
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP
VLM
197
171
0
13 Oct 2021
A Survey on Temporal Sentence Grounding in Videos
Xiaohan Lan
Yitian Yuan
Xin Eric Wang
Zhi Wang
Wenwu Zhu
275
57
0
16 Sep 2021
Embodied AI-Driven Operation of Smart Cities: A Concise Review
Farzan Shenavarmasouleh
F. Mohammadi
M. Amini
H. Arabnia
162
8
0
22 Aug 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
166
36
0
18 Aug 2021
Cross-Modal Graph with Meta Concepts for Video Captioning
IEEE Transactions on Image Processing (TIP), 2021
Hao Wang
Guosheng Lin
Chunyan Miao
267
9
0
14 Aug 2021
Full-Duplex Strategy for Video Object Segmentation
IEEE International Conference on Computer Vision (ICCV), 2021
Ge-Peng Ji
Deng-Ping Fan
Keren Fu
Zhe Wu
Jianbing Shen
Ling Shao
VOS
362
165
0
06 Aug 2021
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
IEEE International Conference on Computer Vision (ICCV), 2021
Rui Qian
Yuxi Li
Huabin Liu
John See
Shuangrui Ding
Xian Liu
Dian Li
Weiyao Lin
233
43
0
04 Aug 2021
Controlled Caption Generation for Images Through Adversarial Attacks
Nayyer Aafaq
Naveed Akhtar
Wei Liu
M. Shah
Lin Wang
AAML
111
12
0
07 Jul 2021
Saying the Unseen: Video Descriptions via Dialog Agents
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Ye Zhu
Yu Wu
Yi Yang
Yan Yan
162
8
0
26 Jun 2021
Confidence-guided Adaptive Gate and Dual Differential Enhancement for Video Salient Object Detection
IEEE International Conference on Multimedia and Expo (ICME), 2021
Peijia Chen
Jianhuang Lai
Guangcong Wang
Huajun Zhou
87
21
0
14 May 2021
A Survey on Natural Language Video Localization
Xinfang Liu
Xiushan Nie
Zhifang Tan
Jie Guo
Yilong Yin
205
9
0
01 Apr 2021
A Comprehensive Review of the Video-to-Text Problem
Artificial Intelligence Review (AIR), 2021
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
221
18
0
27 Mar 2021
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Computer Vision and Pattern Recognition (CVPR), 2021
Sijie Song
Xudong Lin
Jiaying Liu
Zongming Guo
Shih-Fu Chang
ObjD
98
18
0
23 Mar 2021
Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network
AAAI Conference on Artificial Intelligence (AAAI), 2021
Yehao Li
Yingwei Pan
Ting Yao
Jingwen Chen
Tao Mei
VLM
144
58
0
27 Jan 2021
End-to-End Video Question-Answer Generation with Generator-Pretester Network
Hung-Ting Su
Chen-Hsi Chang
Po-Wei Shen
Yu-Siang Wang
Ya-Liang Chang
Yu-Cheng Chang
Pu-Jen Cheng
Winston H. Hsu
119
36
0
05 Jan 2021
A Comprehensive Review on Recent Methods and Challenges of Video Description
Ashutosh Kumar Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
3DV
VLM
169
5
0
30 Nov 2020
Multimodal Topic Learning for Video Recommendation
Shi Pu
Yijiang He
Zheng Li
Mao Zheng
88
8
0
26 Oct 2020
Video captioning with stacked attention and semantic hard pull
PeerJ Computer Science (PeerJ Comput. Sci.), 2020
Md. Mushfiqur Rahman
Thasinul Abedin
Khondokar S. S. Prottoy
Ayana Moshruba
Fazlul Hasan Siddiqui
165
2
0
15 Sep 2020
Relative Attribute Classification with Deep Rank SVM
Sara Atito Ali Ahmed
Berrin Yanikoglu
83
5
0
09 Sep 2020
Video Captioning Using Weak Annotation
Jingyi Hou
Yunde Jia
Xinxiao Wu
Yayun Qi
115
2
0
02 Sep 2020
Identity-Aware Multi-Sentence Video Description
J. S. Park
Trevor Darrell
Anna Rohrbach
145
22
0
22 Aug 2020
SBAT: Video Captioning with Sparse Boundary-Aware Transformer
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Tao Jin
Siyu Huang
Ming Chen
Yingming Li
Zhongfei Zhang
201
58
0
23 Jul 2020
Knowledge Graph Extraction from Videos
Louis Mahon
Eleonora Giunchiglia
Bowen Li
Thomas Lukasiewicz
97
21
0
20 Jul 2020