ResearchTrend.AI
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP

21 June 2021
Han Fang
Pengfei Xiong
Luhui Xu
Yu Chen
    CLIP
    VLM

Papers citing "CLIP2Video: Mastering Video-Text Retrieval via Image CLIP"

50 / 189 papers shown
Uniformity First: Uniformity-aware Test-time Adaptation of Vision-language Models against Image Corruption
Kazuki Adachi
Shin'ya Yamaguchi
Tomoki Hamagami
VLM
0
0
0
19 May 2025
Symbolic Representation for Any-to-Any Generative Tasks
Jianfei Chen
Xiaoye Zhu
Yalin Wang
Tianyang Liu
Xinhui Chen
...
Yifei Ke
Jiaheng Liu
Yiwen Yuan
Julian McAuley
Li Li
DiffM
38
0
0
24 Apr 2025
A Lightweight Moment Retrieval System with Global Re-Ranking and Robust Adaptive Bidirectional Temporal Search
Tinh-Anh Nguyen-Nhu
H. Tran
Nguyen-Khang Le
Minh-Nhat Nguyen
T. Nguyen
...
Huu-Phong Phan-Nguyen
Huy-Thach Pham
Quan Nguyen
Hoang M. Le
Quang-Vinh Dinh
49
0
0
12 Apr 2025
SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding
Mingfei Chen
I. D. Gebru
Ishwarya Ananthabhotla
Christian Richardt
Dejan Marković
Jake Sandakly
Steven Krenn
Todd Keebler
Eli Shlizerman
Alexander Richard
24
0
0
08 Apr 2025
REVEAL: Relation-based Video Representation Learning for Video-Question-Answering
Sofian Chaybouti
Walid Bousselham
Moritz Wolter
Hilde Kuehne
116
0
0
07 Apr 2025
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Boseung Jeong
Jicheol Park
Sungyeon Kim
Suha Kwak
38
0
0
03 Apr 2025
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval
A. Fragomeni
Dima Damen
Michael Wray
33
0
0
02 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kaipeng Zhang
MGen
VGen
70
1
0
01 Apr 2025
Vision-to-Music Generation: A Survey
Zhaokai Wang
Chenxi Bao
Le Zhuo
Jingrui Han
Yang Yue
Yihong Tang
Victor Shea-Jay Huang
Yue Liao
EGVM
VGen
74
1
0
27 Mar 2025
StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion
Ziyu Guo
Young Yoon Lee
Joseph Liu
Yizhak Ben-Shabat
Victor Zordan
Mubbasir Kapadia
DiffM
VGen
73
0
0
27 Mar 2025
MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion
Saron Samuel
Dan DeGenaro
Jimena Guallar-Blasco
Kate Sanders
Oluwaseun Eisape
...
David Etter
Efsun Kayi
Matthew Wiesner
Kenton W. Murray
Reno Kriz
85
0
0
26 Mar 2025
Can Text-to-Video Generation help Video-Language Alignment?
Luca Zanella
Massimiliano Mancini
Willi Menapace
Sergey Tulyakov
Yiming Wang
Elisa Ricci
DiffM
VGen
57
0
0
24 Mar 2025
CausalCLIPSeg: Unlocking CLIP's Potential in Referring Medical Image Segmentation with Causal Intervention
Yaxiong Chen
Minghong Wei
Zixuan Zheng
Jingliang Hu
Yilei Shi
Shengwu Xiong
Xiao Xiang Zhu
Lichao Mou
MedIm
46
0
0
20 Mar 2025
Stitch-a-Recipe: Video Demonstration from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
63
0
0
18 Mar 2025
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen
Zhengrong Yue
Siran Chen
Zehua Wang
Yang Liu
Peng Li
Yue Wang
VLM
162
0
0
13 Mar 2025
Continual Text-to-Video Retrieval with Frame Fusion and Task-Aware Routing
Zecheng Zhao
Zhi Chen
Zi-Rui Huang
S. Sadiq
Tong Chen
36
0
0
13 Mar 2025
AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP
Wenxin Ma
Xu Zhang
Qingsong Yao
Fenghe Tang
Chenxu Wu
Heng Chang
Rui Yan
Zihang Jiang
S. Kevin Zhou
VLM
65
0
0
09 Mar 2025
Detecting Content Rating Violations in Android Applications: A Vision-Language Approach
Dishanika Denipitiyage
B. Silva
Suranga Seneviratne
A. Seneviratne
Sanjay Chawla
48
0
0
07 Feb 2025
Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction
Po-han Li
Yunhao Yang
Mohammad Omama
Sandeep P. Chinchali
Ufuk Topcu
41
1
0
15 Nov 2024
Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models
Nils Blank
Moritz Reuss
Marcel Rühle
Ömer Erdinç Yagmurlu
Fabian Wenzel
Oier Mees
Rudolf Lioutikov
LM&Ro
OffRL
29
4
0
23 Oct 2024
Beyond Coarse-Grained Matching in Video-Text Retrieval
Aozhu Chen
Hazel Doughty
Xirong Li
Cees G. M. Snoek
36
0
0
16 Oct 2024
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
Reno Kriz
Kate Sanders
David Etter
Kenton W. Murray
Cameron Carpenter
...
Alexander Martin
Ronald Colaianni
Nolan King
Eugene Yang
Benjamin Van Durme
VGen
45
2
0
15 Oct 2024
Bridging Text and Image for Artist Style Transfer via Contrastive Learning
Zhi-Song Liu
Li-Wen Wang
Jun Xiao
Vicky Kalogeiton
CLIP
VLM
33
0
0
12 Oct 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
Zi-Yuan Hu
Yiwu Zhong
Shijia Huang
M. Lyu
Liwei Wang
VLM
30
0
0
08 Oct 2024
Contrastive Abstraction for Reinforcement Learning
Vihang Patil
M. Hofmarcher
Elisabeth Rumetshofer
Sepp Hochreiter
OffRL
SSL
24
2
0
01 Oct 2024
Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats
Kuanrong Liu
Siyuan Liang
Jiawei Liang
Pengwen Dai
Xiaochun Cao
MU
AAML
36
1
0
29 Sep 2024
Geospatial foundation models for image analysis: evaluating and enhancing NASA-IBM Prithvi's domain adaptability
Chia-Yu Hsu
Wenwen Li
Sizhe Wang
42
12
0
31 Aug 2024
Spatio-Temporal Context Prompting for Zero-Shot Action Detection
Wei-Jhe Huang
Min-Hung Chen
Shang-Hong Lai
37
0
0
28 Aug 2024
T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval
Yili Li
Jing Yu
Keke Gai
Bang Liu
Gang Xiong
Qi Wu
DiffM
VGen
31
2
0
21 Aug 2024
Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval
Xiaowan Hu
Yiyi Chen
Yan Li
Minquan Wang
Haoqian Wang
Quan Chen
Han Li
Peng Jiang
AI4TS
31
0
0
23 Jul 2024
Open Vocabulary Multi-Label Video Classification
Rohit Gupta
Mamshad Nayeem Rizve
Jayakrishnan Unnikrishnan
Ashish Tawari
Son Tran
Mubarak Shah
Benjamin Z. Yao
Trishul Chilimbi
VLM
67
1
0
12 Jul 2024
Do Generalised Classifiers really work on Human Drawn Sketches?
Hmrishav Bandyopadhyay
Pinaki Nath Chowdhury
Aneeshan Sain
Subhadeep Koley
Tao Xiang
A. Bhunia
Yi-Zhe Song
VLM
33
2
0
04 Jul 2024
Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval
Nicola Messina
J. Sedmidubský
Fabrizio Falchi
Tomáš Rebok
51
0
0
02 Jul 2024
Multi-Scale Temporal Difference Transformer for Video-Text Retrieval
Ni Wang
Dongliang Liao
Xing Xu
38
0
0
23 Jun 2024
Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval
Wenjun Li
Shudong Wang
Dong Zhao
Shenghui Xu
Zhaoming Pan
Zhimin Zhang
34
0
0
21 Jun 2024
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset
Yuchen Yang
Yingxuan Duan
VGen
32
0
0
19 Jun 2024
From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition
Shiwei Wu
Chao Zhang
Joya Chen
Tong Xu
Likang Wu
Yao Hu
Enhong Chen
27
0
0
12 Jun 2024
Diving Deep into the Motion Representation of Video-Text Models
Chinmaya Devaraj
Cornelia Fermuller
Yiannis Aloimonos
DiffM
VGen
41
0
0
07 Jun 2024
OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All
Yuanhuiyi Lyu
Xueye Zheng
Dahun Kim
Lin Wang
51
13
0
25 May 2024
An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval
Xiaolun Jing
Genke Yang
Jian Chu
CLIP
42
1
0
25 May 2024
Text-Video Retrieval with Global-Local Semantic Consistent Learning
Haonan Zhang
Pengpeng Zeng
Lianli Gao
Jingkuan Song
Yihang Duan
Xinyu Lyu
Hengtao Shen
VLM
CLIP
40
2
0
21 May 2024
Learning text-to-video retrieval from image captioning
Lucas Ventura
Cordelia Schmid
Gül Varol
3DV
44
3
0
26 Apr 2024
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
Xuzheng Yu
Chen Jiang
Xingning Dong
Tian Gan
Ming Yang
Qingpei Guo
45
1
0
22 Apr 2024
PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
Zining Chen
Weiqiu Wang
Zhicheng Zhao
Fei Su
Aidong Men
Hongying Meng
VLM
37
7
0
13 Apr 2024
Improving Continuous Sign Language Recognition with Adapted Image Models
Lianyu Hu
Tongkai Shi
Liqing Gao
Zekang Liu
Wei Feng
VLM
26
5
0
12 Apr 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng
Mingfei Han
Haoyu He
Xiaojun Chang
Bohan Zhuang
VLM
68
56
0
04 Apr 2024
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang
Guohao Sun
Pichao Wang
Dongfang Liu
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Zhiqiang Tao
VGen
59
20
0
26 Mar 2024
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
Omkar Thawakar
Muzammal Naseer
Rao Muhammad Anwer
Salman Khan
M. Felsberg
Mubarak Shah
Fahad Shahbaz Khan
34
7
0
25 Mar 2024
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul Chilimbi
VLM
AI4TS
58
4
0
21 Mar 2024
UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
Yuanhuiyi Lyu
Xueye Zheng
Jiazhou Zhou
Lin Wang
32
16
0
19 Mar 2024