ResearchTrend.AI
CLIP4Caption: CLIP for Video Caption

13 October 2021
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP, VLM

Papers citing "CLIP4Caption: CLIP for Video Caption"

25 / 25 papers shown
Fusing Cross-modal and Uni-modal Representations: A Kronecker Product Approach
Youqi Wu
Jingwei Zhang
Farzan Farnia
13
0
0
10 Jun 2025
Video-Level Language-Driven Video-Based Visible-Infrared Person Re-Identification
Shuang Li
Jiaxu Leng
Changjiang Kuang
Mingpi Tan
Xinbo Gao
54
0
0
03 Jun 2025
SPKLIP: Aligning Spike Video Streams with Natural Language
Yongchang Gao
Meiling Jin
Zhaofei Yu
Tiejun Huang
Guozhang Chen
CLIP, VLM
222
0
0
19 May 2025
Generative Modeling of Class Probability for Multi-Modal Representation Learning
Jungkyoo Shin
Bumsoo Kim
Eunwoo Kim
98
1
0
21 Mar 2025
MMRL: Multi-Modal Representation Learning for Vision-Language Models
Yuncheng Guo
Xiaodong Gu
VLM, OffRL
446
3
0
11 Mar 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
130
2
0
31 Dec 2024
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities
Ehsan Faghihi
Mohammedreza Zarenejad
Ali-Asghar Beheshti Shirazi
67
1
0
04 Nov 2024
CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification
Shuang Li
Jiaxu Leng
Guozhang Li
Ji Gan
Haosheng Chen
Xinbo Gao
101
2
0
13 Jun 2024
An Initial Exploration: Learning to Generate Realistic Audio for Silent Video
Matthew Martel
Jack Wagner
VGen
40
0
0
23 Aug 2023
ViCo: Engaging Video Comment Generation with Human Preference Rewards
Yuchong Sun
Bei Liu
Xu Chen
Ruihua Song
Jianlong Fu
VGen
52
2
0
22 Aug 2023
Open-Vocabulary Object Detection via Scene Graph Discovery
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
82
12
0
07 Jul 2023
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian
Willy Fitra Hendria
53
3
0
20 Jun 2023
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation
Xilun Chen
L. Yu
Wenhan Xiong
Barlas Oğuz
Yashar Mehdad
Wen-tau Yih
VGen
53
3
0
04 May 2023
AutoAD: Movie Description in Context
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
77
35
0
29 Mar 2023
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
84
14
0
04 Mar 2023
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
Qinghao Ye
Guohai Xu
Ming Yan
Haiyang Xu
Qi Qian
Ji Zhang
Fei Huang
VLM, AI4TS
220
75
0
30 Dec 2022
CLIP-Driven Fine-grained Text-Image Person Re-identification
Shuanglin Yan
Neng Dong
Liyan Zhang
Jinhui Tang
93
94
0
19 Oct 2022
REST: REtrieve & Self-Train for generative action recognition
Adrian Bulat
Enrique Sanchez
Brais Martínez
Georgios Tzimiropoulos
VLM
54
4
0
29 Sep 2022
Visual Subtitle Feature Enhanced Video Outline Generation
Qi Lv
Ziqiang Cao
Wenrui Xie
Derui Wang
Jingwen Wang
...
Yuan-Fang Li
Min Cao
Wenjie Li
Sujian Li
Guohong Fu
VGen
92
0
0
24 Aug 2022
Zero-Shot Video Captioning with Evolving Pseudo-Tokens
Yoad Tewel
Yoav Shalev
Roy Nadler
Idan Schwartz
Lior Wolf
58
27
0
22 Jul 2022
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs
Tal Shaharabany
Yoad Tewel
Lior Wolf
ObjD
91
16
0
19 Jun 2022
CLIP4IDC: CLIP for Image Difference Captioning
Zixin Guo
Tong Wang
Jorma T. Laaksonen
VLM
65
29
0
01 Jun 2022
Unsupervised Prompt Learning for Vision-Language Models
Hao Huang
Jack Chu
Fangyun Wei
VPVLM, MLLM, VLM
107
133
0
07 Apr 2022
Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision
Yufeng Cui
Lichen Zhao
Feng Liang
Yangguang Li
Jing Shao
UQCV, VLM, CLIP
112
43
0
11 Mar 2022
CRIS: CLIP-Driven Referring Image Segmentation
Zhaoqing Wang
Yu Lu
Qiang Li
Xunqiang Tao
Yan Guo
Ming Gong
Tongliang Liu
VLM
121
372
0
30 Nov 2021