Object Relational Graph with Teacher-Recommended Learning for Video Captioning

Computer Vision and Pattern Recognition (CVPR), 2020

Papers citing "Object Relational Graph with Teacher-Recommended Learning for Video Captioning"

50 / 116 papers shown
Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
119
0
0
20 Feb 2025
MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
Neural Information Processing Systems (NeurIPS), 2024
Tieyuan Chen
Huabin Liu
Tianyao He
Yihang Chen
Chaofan Gan
...
Cheng Zhong
Yang Zhang
Yingxue Wang
Hui Lin
Weiyao Lin
243
17
0
26 Sep 2024
HOTVCOM: Generating Buzzworthy Comments for Videos
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
137
8
0
23 Sep 2024
ModaVerse: Efficiently Transforming Modalities with LLMs
Computer Vision and Pattern Recognition (CVPR), 2024
Xinyu Wang
Bohan Zhuang
Qi Wu
106
16
0
12 Jan 2024
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Computer Vision and Pattern Recognition (CVPR), 2023
176
72
0
30 Nov 2023
VidChapters-7M: Video Chapters at Scale
Neural Information Processing Systems (NeurIPS), 2023
142
32
0
25 Sep 2023
Accurate and Fast Compressed Video Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
116
40
0
22 Sep 2023
Collaborative Three-Stream Transformers for Video Captioning
Computer Vision and Image Understanding (CVIU), 2023
99
7
0
18 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
International Conference on Machine Learning (ICML), 2023
234
658
0
11 Sep 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
245
135
0
17 Apr 2023
Graph Attention for Automated Audio Captioning
IEEE Signal Processing Letters (IEEE SPL), 2023
121
10
0
07 Apr 2023
Fine-grained Audible Video Description
Computer Vision and Pattern Recognition (CVPR), 2023
Xuyang Shen
Dong Li
Jinxing Zhou
Zhen Qin
Bowen He
...
Yuchao Dai
Lingpeng Kong
Meng Wang
Yu Qiao
Yiran Zhong
112
17
0
27 Mar 2023
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation
International Conference on Information and Knowledge Management (CIKM), 2023
124
26
0
26 Mar 2023
Text with Knowledge Graph Augmented Transformer for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
159
66
0
22 Mar 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
AAAI Conference on Artificial Intelligence (AAAI), 2023
118
16
0
12 Mar 2023
ADAPT: Action-aware Driving Caption Transformer
IEEE International Conference on Robotics and Automation (ICRA), 2023
259
89
0
01 Feb 2023
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
157
25
0
22 Nov 2022
Visual Commonsense-aware Representation Network for Video Captioning
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
101
19
0
17 Nov 2022
