2102.12443
Cited By
A Straightforward Framework For Video Retrieval Using CLIP
Mexican Conference on Pattern Recognition (MPR), 2021
24 February 2021
Jesús Andrés Portillo-Quintero
J. C. Ortíz-Bayliss
Hugo Terashima-Marín
CLIP
Github (70★)
Papers citing "A Straightforward Framework For Video Retrieval Using CLIP"
50 / 64 papers shown
MSAM: Multi-Semantic Adaptive Mining for Cross-Modal Drone Video-Text Retrieval
J. Huang
Yaxiong Chen
Ganchao Liu
154
0
0
17 Oct 2025
VC-Agent: An Interactive Agent for Customized Video Dataset Collection
Yidan Zhang
Mutian Xu
Yiming Hao
Kun Zhou
Jiahao Chang
Xiaoqiang Liu
Pengfei Wan
Hongbo Fu
Xiaoguang Han
VGen
206
1
0
25 Sep 2025
BiListing: Modality Alignment for Listings
Guillaume Guy
Mihajlo Grbovic
Chun How Tan
Han Zhao
217
0
0
28 Aug 2025
Adversarial Video Promotion Against Text-to-Video Retrieval
Qiwei Tian
Chenhao Lin
Zhengyu Zhao
Qian Li
Shuai Liu
Chao Shen
AAML
MDE
227
1
0
09 Aug 2025
Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Xueguang Ma
Luyu Gao
Shengyao Zhuang
Jiaqi Samantha Zhan
Jamie Callan
Jimmy Lin
1.0K
18
0
05 May 2025
Detecting Content Rating Violations in Android Applications: A Vision-Language Approach
Dishanika Denipitiyage
B. Silva
Suranga Seneviratne
A. Seneviratne
Sanjay Chawla
256
0
0
07 Feb 2025
Optimized two-stage AI-based Neural Decoding for Enhanced Visual Stimulus Reconstruction from fMRI Data
Journal of Neural Engineering (J. Neural Eng.), 2024
Lorenzo Veronese
Andrea Moglia
Luca Mainardi
Pietro Cerveri
DiffM
355
1
0
17 Dec 2024
TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Bingqing Zhang
Zhuo Cao
Heming Du
Xin Yu
Xue Li
Jiajun Liu
Sen Wang
VGen
283
7
0
30 Sep 2024
From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition
Shiwei Wu
Chao Zhang
Joya Chen
Tong Xu
Likang Wu
Yao Hu
Enhong Chen
209
2
0
12 Jun 2024
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval
Han Fang
Xianghao Zang
Chao Ban
Zerun Feng
Lanxiang Zhou
Zhongjiang He
Yongxiang Li
Hao Sun
403
3
0
18 Apr 2024
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang
Guohao Sun
Pichao Wang
Dongfang Liu
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Zhiqiang Tao
VGen
490
78
0
26 Mar 2024
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
312
4
0
25 Nov 2023
Videoprompter: an ensemble of foundational models for zero-shot video understanding
Adeel Yousaf
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Mubarak Shah
VLM
279
3
0
23 Oct 2023
Encoding and Decoding Narratives: Datafication and Alternative Access Models for Audiovisual Archives
ACM Multimedia (ACM MM), 2023
Yuchen Yang
215
1
0
10 Oct 2023
Write What You Want: Applying Text-to-video Retrieval to Audiovisual Archives
ACM Journal on Computing and Cultural Heritage (JOCCH), 2023
Yuchen Yang
VGen
229
9
0
09 Oct 2023
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Zuxuan Wu
Zejia Weng
Wujian Peng
Xitong Yang
Ang Li
Larry S. Davis
Yu-Gang Jiang
CLIP
VLM
298
31
0
08 Oct 2023
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
European Conference on Computer Vision (ECCV), 2023
Nina Shvetsova
Anna Kukleva
Xudong Hong
Christian Rupprecht
Bernt Schiele
Hilde Kuehne
377
34
0
07 Oct 2023
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
International Conference on Learning Representations (ICLR), 2023
Bin Zhu
Bin Lin
Munan Ning
Yang Yan
Jiaxi Cui
...
Zongwei Li
Wancai Zhang
Zhifeng Li
Wei Liu
Liejie Yuan
VLM
MLLM
947
398
0
03 Oct 2023
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval
Neural Information Processing Systems (NeurIPS), 2023
Hao Li
Marie-Jeanne Lesot
Lianli Gao
Xiaosu Zhu
Christophe Marsala
EDL
339
38
0
29 Sep 2023
Unified Coarse-to-Fine Alignment for Video-Text Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyang Wang
Yi-Lin Sung
Feng Cheng
Gedas Bertasius
Joey Tianyi Zhou
470
89
0
18 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
269
7
0
16 Sep 2023
Representation Learning for Sequential Volumetric Design Tasks
Md Ferdous Alam
Yi Wang
Linh Tran
Chin-Yi Cheng
Jieliang Luo
3DV
322
3
0
05 Sep 2023
Multi-event Video-Text Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Gengyuan Zhang
Jisen Ren
Jindong Gu
Volker Tresp
255
18
0
22 Aug 2023
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian
Willy Fitra Hendria
290
4
0
20 Jun 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng
Yixiao Ge
Zhan Tong
Xihui Liu
Shutao Xia
Ying Shan
327
14
0
23 May 2023
i-Code Studio: A Configurable and Composable Framework for Integrative AI
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yuwei Fang
Mahmoud Khademi
Chenguang Zhu
Ziyi Yang
Reid Pryzant
...
Yao Qian
Takuya Yoshioka
Lu Yuan
Michael Zeng
Xuedong Huang
246
2
0
23 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
ACM Multimedia (ACM MM), 2023
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
308
8
0
13 May 2023
Visual Reasoning: from State to Transformation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Xin Hong
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
244
4
0
02 May 2023
Verbs in Action: Improving verb understanding in video-language models
IEEE International Conference on Computer Vision (ICCV), 2023
Liliane Momeni
Mathilde Caron
Arsha Nagrani
Andrew Zisserman
Cordelia Schmid
549
91
0
13 Apr 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
248
18
0
12 Mar 2023
VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval
ACM Transactions on Knowledge Discovery from Data (TKDD), 2023
Yansong Gong
Georgina Cosma
Axel Finke
ViT
363
4
0
13 Feb 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yizhen Chen
Jie Wang
Lijian Lin
Chen Ma
Jin Ma
Ying Shan
VLM
300
37
0
30 Jan 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Bo Fang
Wenhao Wu
Chang-rui Liu
Can Ma
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
380
97
0
16 Jan 2023
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLM
VGen
401
78
0
09 Dec 2022
Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval
Damianos Galanopoulos
Vasileios Mezaris
275
7
0
21 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
IEEE International Conference on Computer Vision (ICCV), 2022
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
431
39
0
17 Nov 2022
Boosting Video-Text Retrieval with Explicit High-Level Semantics
ACM Multimedia (ACM MM), 2022
Haoran Wang
Di Xu
Dongliang He
Fu Li
Zhong Ji
Jungong Han
Errui Ding
259
16
0
08 Aug 2022
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
ACM Multimedia (ACM MM), 2022
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Ming Yan
Ji Zhang
Rongrong Ji
CLIP
VLM
336
436
0
15 Jul 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
937
1,699
0
04 May 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
European Conference on Computer Vision (ECCV), 2022
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
191
49
0
26 Apr 2022
Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations
IEEE Access, 2022
Jie Jiang
Shaobo Min
Weijie Kong
Dihong Gong
Hongfa Wang
Zhifeng Li
Wei Liu
VLM
447
32
0
07 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
European Conference on Computer Vision (ECCV), 2022
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
456
57
0
06 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
International Conference on Learning Representations (ICLR), 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
...
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
790
715
0
01 Apr 2022
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
European Conference on Computer Vision (ECCV), 2022
Yuxuan Wang
Difei Gao
Licheng Yu
Stan Weixian Lei
Matt Feiszli
Mike Zheng Shou
684
29
0
01 Apr 2022
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
Computer Vision and Pattern Recognition (CVPR), 2022
S. Gorti
Noël Vouitsis
Junwei Ma
Keyvan Golestan
Anthony L. Caterini
Animesh Garg
Guangwei Yu
395
243
0
28 Mar 2022
Disentangled Representation Learning for Text-Video Retrieval
Qiang Wang
Yanhao Zhang
Yun Zheng
Pan Pan
Xiansheng Hua
263
105
0
14 Mar 2022
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Alexander Kunitsyn
M. Kalashnikov
Maksim Dzabraev
Andrei Ivaniuta
210
18
0
14 Mar 2022
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Neural Information Processing Systems (NeurIPS), 2022
Changdae Oh
Junhyuk So
Hoyoon Byun
Yongtaek Lim
Minchul Shin
Jong-June Jeon
Kyungwoo Song
523
43
0
08 Mar 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Computer Vision and Pattern Recognition (CVPR), 2022
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
396
126
0
13 Jan 2022
Multi-Query Video Retrieval
European Conference on Computer Vision (ECCV), 2022
Zeyu Wang
Yu Wu
Karthik Narasimhan
Olga Russakovsky
336
25
0
10 Jan 2022