2002.11566
Cited By
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2020
26 February 2020
Ziqi Zhang
Yaya Shi
Chunfeng Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
Papers citing
"Object Relational Graph with Teacher-Recommended Learning for Video Captioning"
50 / 116 papers shown
Language-guided Recursive Spatiotemporal Graph Modeling for Video Summarization
International Journal of Computer Vision (IJCV), 2025
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
28
0
0
06 Sep 2025
GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding
Zijun Lin
Shuting He
Cheston Tan
Bihan Wen
AI4TS
145
0
0
26 Jun 2025
Towards Efficient Partially Relevant Video Retrieval with Active Moment Discovering
IEEE Transactions on Multimedia (TMM), 2025
Peipei Song
Li Zhang
Long Lan
Weidong Chen
D. Guo
Xun Yang
Meng Wang
117
8
0
15 Apr 2025
Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Caihua Liu
Xu Li
Wenjing Xue
Wei Tang
Xia Feng
119
0
0
20 Feb 2025
Spider: Any-to-Many Multimodal LLM
Jinxiang Lai
Jie Zhang
Jun Liu
Jian Li
Xiaocheng Lu
Song Guo
MLLM
356
4
0
14 Nov 2024
Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning
Pattern Recognition (Pattern Recogn.), 2024
Ping Li
Tao Wang
Xinkui Zhao
Xianghua Xu
Mingli Song
131
8
0
06 Nov 2024
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities
Ehsan Faghihi
Mohammedreza Zarenejad
Ali-Asghar Beheshti Shirazi
151
1
0
04 Nov 2024
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
104
0
0
22 Oct 2024
MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
Neural Information Processing Systems (NeurIPS), 2024
Tieyuan Chen
Huabin Liu
Tianyao He
Yihang Chen
Chaofan Gan
...
Cheng Zhong
Yang Zhang
Yingxue Wang
Hui Lin
Weiyao Lin
VGen
CML
243
17
0
26 Sep 2024
HOTVCOM: Generating Buzzworthy Comments for Videos
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yuyan Chen
Yiwen Qian
Songzhou Yan
Jiyuan Jia
Zhixu Li
Yanghua Xiao
Xiaobo Li
Ming-Hsuan Yang
Qingpei Guo
137
8
0
23 Sep 2024
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset
Yuchen Yang
Yingxuan Duan
VGen
103
0
0
19 Jun 2024
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
International Conference on Learning Representations (ICLR), 2024
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
295
5
0
10 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
320
21
1
09 Jun 2024
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViT
VGen
117
5
0
19 Apr 2024
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
Computer Vision and Pattern Recognition (CVPR), 2024
Hao Wu
Huabin Liu
Yu Qiao
Xiao Sun
3DV
62
16
0
03 Apr 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
191
24
0
26 Mar 2024
ModaVerse: Efficiently Transforming Modalities with LLMs
Computer Vision and Pattern Recognition (CVPR), 2024
Xinyu Wang
Bohan Zhuang
Qi Wu
106
16
0
12 Jan 2024
Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition
Xuzheng Yu
Chen Jiang
Wei Zhang
Tian Gan
Linlin Chao
Jianan Zhao
Yuan Cheng
Qingpei Guo
Wei Chu
164
0
0
09 Jan 2024
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
Yifan Lu
Ziqi Zhang
Chunfeng Yuan
Peng Li
Yan Wang
Bing Li
Weiming Hu
100
5
0
25 Dec 2023
Subject-Oriented Video Captioning
Yunchuan Ma
Chang Teng
Yuankai Qi
Guorong Li
Laiyun Qing
Qi Wu
Qingming Huang
110
0
0
20 Dec 2023
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Rahul Pratap Singh
Bishmoy Paul
Ali Dabouei
Min Xu
199
1
0
10 Dec 2023
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Zineng Tang
Ziyi Yang
Mahmoud Khademi
Yang Liu
Chenguang Zhu
Mohit Bansal
LRM
MLLM
AuLLM
176
72
0
30 Nov 2023
VidChapters-7M: Video Chapters at Scale
Neural Information Processing Systems (NeurIPS), 2023
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
142
32
0
25 Sep 2023
Accurate and Fast Compressed Video Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Yaojie Shen
Xin Gu
Kai Xu
Hengrui Fan
Longyin Wen
Libo Zhang
ViT
116
40
0
22 Sep 2023
Collaborative Three-Stream Transformers for Video Captioning
Computer Vision and Image Understanding (CVIU), 2023
Hao Wang
Libo Zhang
Hengrui Fan
Tiejian Luo
99
7
0
18 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
International Conference on Machine Learning (ICML), 2023
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
234
659
0
11 Sep 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
214
29
0
27 Aug 2023
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion
Yutao Jin
Yinan Han
Jing Wang
105
1
0
13 Aug 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Neural Information Processing Systems (NeurIPS), 2023
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM
MLLM
212
41
0
13 Jul 2023
Any-to-Any Generation via Composable Diffusion
Neural Information Processing Systems (NeurIPS), 2023
Zineng Tang
Ziyi Yang
Chenguang Zhu
Michael Zeng
Joey Tianyi Zhou
VGen
DiffM
183
229
0
19 May 2023
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation
Xilun Chen
L. Yu
Wenhan Xiong
Barlas Oğuz
Yashar Mehdad
Anuj Kumar
VGen
107
4
0
04 May 2023
A Review of Deep Learning for Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Xiaoshi Zhong
Fatih Porikli
3DV
161
33
0
22 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
VLM
245
135
0
17 Apr 2023
Graph Attention for Automated Audio Captioning
IEEE Signal Processing Letters (IEEE SPL), 2023
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
121
10
0
07 Apr 2023
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data
Vladislav Lialin
Stephen Rawls
David M. Chan
Shalini Ghosh
Anna Rumshisky
Wael Hamza
VLM
AI4TS
166
8
0
04 Apr 2023
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo
A. Piergiovanni
Dahun Kim
Xiyang Luo
Benjamin Caine
...
Luowei Zhou
Andrew M. Dai
Zhifeng Chen
Claire Cui
A. Angelova
MLLM
VLM
257
30
0
29 Mar 2023
Fine-grained Audible Video Description
Computer Vision and Pattern Recognition (CVPR), 2023
Xuyang Shen
Dong Li
Jinxing Zhou
Zhen Qin
Bowen He
...
Yuchao Dai
Lingpeng Kong
Meng Wang
Yu Qiao
Yiran Zhong
VGen
112
17
0
27 Mar 2023
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation
International Conference on Information and Knowledge Management (CIKM), 2023
Ji Qi
Jifan Yu
Teng Tu
Kunyu Gao
Yifan Xu
...
Juanzi Li
Jie Tang
Weidong Guo
Hui Liu
Yu-Syuan Xu
124
26
0
26 Mar 2023
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
Computer Vision and Pattern Recognition (CVPR), 2023
Dohwan Ko
Joon-Young Choi
Hyeong Kyu Choi
Kyoung-Woon On
Byungseok Roh
Hyunwoo J. Kim
186
27
0
23 Mar 2023
Text with Knowledge Graph Augmented Transformer for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Xin Gu
G. Chen
Yufei Wang
Libo Zhang
Tiejian Luo
Longyin Wen
159
66
0
22 Mar 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
118
16
0
12 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
279
301
0
27 Feb 2023
STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training
AAAI Conference on Artificial Intelligence (AAAI), 2023
Weihong Zhong
Mao Zheng
Duyu Tang
Xuan Luo
Heng Gong
Xiaocheng Feng
Bing Qin
202
9
0
20 Feb 2023
ADAPT: Action-aware Driving Caption Transformer
IEEE International Conference on Robotics and Automation (ICRA), 2023
Bu Jin
Xinyi Liu
Yupeng Zheng
Pengfei Li
Hao Zhao
Tong Zhang
Yuhang Zheng
Guyue Zhou
Jingjing Liu
259
89
0
01 Feb 2023
Semi-Parametric Video-Grounded Text Generation
Sungdong Kim
Jin-Hwa Kim
Jiyoung Lee
Minjoon Seo
VGen
164
17
0
27 Jan 2023
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLM
VGen
195
67
0
09 Dec 2022
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Zhuo Zhou
Zipeng Li
Shuqin Chen
Kui Jiang
Chen Chen
Mang Ye
DiffM
VGen
176
54
0
28 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
157
25
0
22 Nov 2022
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Neural Information Processing Systems (NeurIPS), 2022
Peng Jin
Jinfa Huang
Fenglin Liu
Xian Wu
Shen Ge
Guoli Song
David Clifton
Jing Chen
VLM
188
82
0
21 Nov 2022
Visual Commonsense-aware Representation Network for Video Captioning
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
101
19
0
17 Nov 2022