1803.11438
Cited By
Reconstruction Network for Video Captioning
30 March 2018
Bairui Wang
Lin Ma
Wei Zhang
W. Liu
Papers citing
"Reconstruction Network for Video Captioning"
50 / 135 papers shown
Title
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
Jian Hu
Dimitrios Korkinof
S. Gong
Mariano Beguerisse-Díaz
VLM
38
0
0
22 Apr 2025
F³Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
Zhaoyu Liu
Kan Jiang
Murong Ma
Zhé Hóu
Yun Lin
J. Dong
35
0
0
11 Apr 2025
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu
Kun Yuan
Yaling Shen
Feilong Tang
Xiaohao Xu
...
Jin Ye
N. Padoy
Nassir Navab
Junjun He
Zongyuan Ge
VLM
CLIP
87
11
0
23 Nov 2024
Multi-Modal interpretable automatic video captioning
Antoine Hanna-Asaad
Decky Aspandi
Titus Zaharia
31
0
0
11 Nov 2024
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities
Ehsan Faghihi
Mohammedreza Zarenejad
Ali-Asghar Beheshti Shirazi
37
0
0
04 Nov 2024
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
26
0
0
22 Oct 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
44
7
0
30 Sep 2024
Dual-path Collaborative Generation Network for Emotional Video Captioning
Cheng Ye
Weidong Chen
Jingyu Li
L. Zhang
Zhendong Mao
84
1
0
06 Aug 2024
PC²: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval
Yue Duan
Zhangxuan Gu
ZhenZhe Ying
Wei Li
Yu Zhang
Zibin Zheng
26
2
0
02 Aug 2024
VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It
Xiaoxuan Zhu
Zhouhong Gu
Sihang Jiang
Zhixu Li
Hongwei Feng
Yanghua Xiao
21
0
0
15 Jun 2024
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViT
VGen
26
4
0
19 Apr 2024
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Minkuk Kim
Hyeon Bae Kim
Jinyoung Moon
Jinwoo Choi
Seong Tae Kim
34
17
0
11 Apr 2024
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
Wenhao Wang
Yi Yang
VGen
DiffM
31
31
0
10 Mar 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Md. Mohaiminul Islam
Ngan Ho
Xitong Yang
Tushar Nagarajan
Lorenzo Torresani
Gedas Bertasius
VGen
VLM
27
44
0
20 Feb 2024
SnapCap: Efficient Snapshot Compressive Video Captioning
Jianqiao Sun
Yudi Su
Hao Zhang
Ziheng Cheng
Zequn Zeng
Zhengjue Wang
Bo Chen
Xin Yuan
24
1
0
10 Jan 2024
A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video
Keito Kudo
Haruki Nagasawa
Jun Suzuki
Nobuyuki Shimizu
32
2
0
04 Dec 2023
Multi Sentence Description of Complex Manipulation Action Videos
Fatemeh Ziaeetabar
Reza Safabakhsh
S. Momtazi
M. Tamosiunaite
F. Worgotter
23
1
0
13 Nov 2023
Student Classroom Behavior Detection based on Spatio-Temporal Network and Multi-Model Fusion
Fan Yang
Xiaofei Wang
22
1
0
25 Oct 2023
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
26
7
0
16 Oct 2023
SCB-Dataset3: A Benchmark for Detecting Student Classroom Behavior
Fan Yang
Tao Wang
18
17
0
04 Oct 2023
A Hierarchical Graph-based Approach for Recognition and Description Generation of Bimanual Actions in Videos
Fatemeh Ziaeetabar
Reza Safabakhsh
S. Momtazi
M. Tamosiunaite
F. Worgotter
17
1
0
01 Oct 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
13
26
0
25 Sep 2023
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
Tongtong Yuan
Xuange Zhang
Kun Liu
Bo Liu
Chen Chen
Jian Jin
Zhenzhen Jiao
AI4TS
19
13
0
25 Sep 2023
Explaining Vision and Language through Graphs of Events in Space and Time
Mihai Masala
Nicolae Cudlenco
Traian Rebedea
Marius Leordeanu
VLM
54
2
0
29 Aug 2023
MusicJam: Visualizing Music Insights via Generated Narrative Illustrations
Chuer Chen
Nan Cao
Jiani Hou
Yi Guo
Yulei Zhang
Yang Shi
DiffM
24
0
0
22 Aug 2023
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion
Yutao Jin
Bin Liu
Jing Wang
30
1
0
13 Aug 2023
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment
Yongrae Jo
Seongyun Lee
Aiden Seung Joon Lee
Hyunji Lee
Hanseok Oh
Minjoon Seo
16
2
0
05 Jul 2023
Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation
Qianji Di
Wenxing Ma
Zhongang Qi
Tianxiang Hou
Ying Shan
Hanzi Wang
14
0
0
23 Jun 2023
Exploring the Role of Audio in Video Captioning
Yuhan Shen
Linjie Yang
Longyin Wen
Haichao Yu
Ehsan Elhamifar
Heng Wang
18
2
0
21 Jun 2023
GEST: the Graph of Events in Space and Time as a Common Representation between Vision and Language
Mihai Masala
Nicolae Cudlenco
Traian Rebedea
Marius Leordeanu
14
0
0
22 May 2023
A Topic-aware Summarization Framework with Different Modal Side Information
Xiuying Chen
Mingzhe Li
Shen Gao
Xin Cheng
Qiang Yang
Qishen Zhang
Xin Gao
Xiangliang Zhang
20
13
0
19 May 2023
Boosting Weakly-Supervised Temporal Action Localization with Text Information
Guozhang Li
De-Chun Cheng
Xinpeng Ding
N. Wang
Xiaoyu Wang
Xinbo Gao
28
23
0
01 May 2023
A Review of Deep Learning for Video Captioning
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Erik Cambria
Fatih Porikli
3DV
27
20
0
22 Apr 2023
Fine-grained Audible Video Description
Xuyang Shen
Dong Li
Jinxing Zhou
Zhen Qin
Bowen He
...
Yuchao Dai
Lingpeng Kong
Meng Wang
Yu Qiao
Yiran Zhong
VGen
36
11
0
27 Mar 2023
Plug-and-Play Regulators for Image-Text Matching
Haiwen Diao
Y. Zhang
W. Liu
Xiang Ruan
Huchuan Lu
27
20
0
23 Mar 2023
Guided Slot Attention for Unsupervised Video Object Segmentation
Minhyeok Lee
Suhwan Cho
Dogyoon Lee
Chaewon Park
Jungho Lee
Sangyoun Lee
VOS
58
10
0
15 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
23
220
0
27 Feb 2023
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng
Xizi Wang
Jie Lei
David J. Crandall
Mohit Bansal
Gedas Bertasius
VLM
27
78
0
09 Dec 2022
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning
X. Zhong
Zipeng Li
Shuqin Chen
Kui Jiang
Chen Chen
Mang Ye
DiffM
VGen
19
40
0
28 Nov 2022
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
Yuanze Lin
Chen Wei
Huiyu Wang
Alan Yuille
Cihang Xie
3DGS
26
15
0
21 Nov 2022
Visual Commonsense-aware Representation Network for Video Captioning
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
23
16
0
17 Nov 2022
End-to-End Multimodal Representation Learning for Video Dialog
Huda AlAmri
Anthony Bilic
Michael Hu
Apoorva Beedu
Irfan Essa
25
5
0
26 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
33
16
0
05 Oct 2022
Thinking Hallucination for Video Captioning
Nasib Ullah
Partha Pratim Mohanta
VLM
34
4
0
28 Sep 2022
Multi-modal Video Chapter Generation
Xiao Cao
Zitan Chen
Canyu Le
Lei Meng
VGen
29
3
0
26 Sep 2022
Unsupervised Video Object Segmentation via Prototype Memory Network
Minhyeok Lee
Suhwan Cho
Seung-Hyun Lee
Chaewon Park
Sangyoun Lee
VOS
34
36
0
08 Sep 2022
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
Haoran Wang
Dongliang He
Wenhao Wu
Boyang Xia
Min Yang
Fu Li
Yunlong Yu
Zhong Ji
Errui Ding
Jingdong Wang
22
22
0
21 Aug 2022
Boosting Video-Text Retrieval with Explicit High-Level Semantics
Haoran Wang
Di Xu
Dongliang He
Fu Li
Zhong Ji
Jungong Han
Errui Ding
24
11
0
08 Aug 2022
MAFW: A Large-scale, Multi-modal, Compound Affective Database for Dynamic Facial Expression Recognition in the Wild
Y. Liu
Wei Dai
Chuanxu Feng
Wenbin Wang
Guanghao Yin
Jiabei Zeng
Shiguang Shan
CVBM
20
62
0
01 Aug 2022
Automatic Concept Extraction for Concept Bottleneck-based Video Classification
J. Jeyakumar
Luke Dickens
L. Garcia
Yu Cheng
Diego Ramirez Echavarria
Joseph Noor
Alessandra Russo
Lance M. Kaplan
Erik P. Blasch
Mani B. Srivastava
13
8
0
21 Jun 2022