Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.07783
Cited By
FILIP: Fine-grained Interactive Language-Image Pre-Training
9 November 2021
Lewei Yao
Runhu Huang
Lu Hou
Guansong Lu
Minzhe Niu
Hang Xu
Xiaodan Liang
Zhenguo Li
Xin Jiang
Chunjing Xu
VLM
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"FILIP: Fine-grained Interactive Language-Image Pre-Training"
50 / 86 papers shown
Title
HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights
Ozan Gokdemir
Carlo Siebenschuh
Alexander Brace
Azton Wells
Brian Hsu
...
A. Anandkumar
Ian Foster
R. Stevens
V. Vishwanath
A. Ramanathan
VLM
30
0
0
07 May 2025
Decoupled Global-Local Alignment for Improving Compositional Understanding
Xiaoxing Hu
Kaicheng Yang
J. Z. Wang
Haoran Xu
Ziyong Feng
Y. Wang
VLM
62
0
0
23 Apr 2025
Impact of Language Guidance: A Reproducibility Study
Cherish Puniani
Advika Sinha
Shree Singhi
Aayan Yadav
VLM
39
0
0
10 Apr 2025
Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval
Zehong Ma
Hao Chen
Wei Zeng
Limin Su
Shiliang Zhang
AI4TS
32
0
0
10 Apr 2025
TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification
Ans Munir
Faisal Z. Qureshi
M. H. Khan
Mohsen Ali
VLM
70
0
0
15 Mar 2025
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
Xiangyan Qu
Gaopeng Gou
Jiamin Zhuang
Jing Yu
Kun Song
Qihao Wang
Yili Li
Gang Xiong
VLM
75
0
0
13 Mar 2025
MMRL: Multi-Modal Representation Learning for Vision-Language Models
Yuncheng Guo
Xiaodong Gu
VLM
OffRL
60
0
0
11 Mar 2025
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
Huang Huang
Fangchen Liu
Letian Fu
Tingfan Wu
Mustafa Mukadam
Jitendra Malik
Ken Goldberg
Pieter Abbeel
LM&Ro
VLM
74
5
0
05 Mar 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li
Boyang Li
CoGe
67
0
0
03 Mar 2025
ABC: Achieving Better Control of Multimodal Embeddings using VLMs
Benjamin Schneider
Florian Kerschbaum
Wenhu Chen
58
0
0
01 Mar 2025
RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings
A. Dhakal
S. Sastry
Subash Khanal
Adeel Ahmad
Eric Xing
Nathan Jacobs
48
0
0
27 Feb 2025
InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models
Shuchang Zhou
Jiwei Wei
Shiyuan He
Yuyang Zhou
Chaoning Zhang
Jie Zou
Ning Xie
Yang Yang
VLM
VPVLM
81
0
0
27 Feb 2025
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
70
0
0
02 Dec 2024
CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections
Mohamed Fazli Mohamed Imam
Rufael Fedaku Marew
Jameel Hassan
M. Fiaz
Alham Fikri Aji
Hisham Cholakkal
VLM
88
0
0
28 Nov 2024
GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot Anomaly Detection
Jiyul Ham
Yonggon Jung
Jun-Geol Baek
VLM
26
1
0
09 Nov 2024
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
Junjie Li
Jianghong Ma
Xiaofeng Zhang
Yuhang Li
Jianyang Shi
23
0
0
26 Oct 2024
LatentBKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty
Joey Wilson
Ruihan Xu
Yile Sun
Parker Ewen
Minghan Zhu
Kira Barton
Maani Ghaffari
36
0
0
15 Oct 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
28
0
0
01 Oct 2024
ComAlign: Compositional Alignment in Vision-Language Models
Ali Abdollah
Amirmohammad Izadi
Armin Saghafian
Reza Vahidimajd
Mohammad Mozafari
Amirreza Mirzaei
Mohammadmahdi Samiei
M. Baghshah
CoGe
VLM
25
0
0
12 Sep 2024
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
40
9
0
29 Aug 2024
Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
Hunmin Yang
Jongoh Jeong
Kuk-Jin Yoon
AAML
VLM
52
4
0
30 Jul 2024
Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification
Yunyi Xuan
Weijie Chen
Shicai Yang
Di Xie
Luojun Lin
Yueting Zhuang
VLM
20
4
0
21 Jul 2024
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
C´eline Hudelot
Pierre Colombo
VLM
57
21
0
27 Jun 2024
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Jia Syuen Lim
Zhuoxiao Chen
Mahsa Baktashmotlagh
Zhi Chen
Xin Yu
Zi Huang
Yadan Luo
VLM
ObjD
70
1
0
21 Jun 2024
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning
Shuvendu Roy
Yasaman Parhizkar
Franklin Ogidi
Vahid Reza Khazaie
Michael Colacci
Ali Etemad
Elham Dolatabadi
Arash Afkanpour
VLM
35
1
0
11 Jun 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
64
41
0
23 May 2024
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?
Maxime Zanella
Ismail Ben Ayed
VLM
MLLM
35
22
0
03 May 2024
RankCLIP: Ranking-Consistent Language-Image Pretraining
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zhili Feng
Zenghui Ding
Yining Sun
SSL
VLM
40
7
0
15 Apr 2024
Training-Free Semantic Segmentation via LLM-Supervision
Wenfang Sun
Yingjun Du
Gaowen Liu
Ramana Rao Kompella
Cees G. M. Snoek
VLM
35
2
0
31 Mar 2024
Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
Hao Li
Ying Chen
Yifei Chen
Wenxian Yang
Bowen Ding
Yuchen Han
Liansheng Wang
Rongshan Yu
31
15
0
29 Feb 2024
ProtChatGPT: Towards Understanding Proteins with Large Language Models
Chao Wang
Hehe Fan
Ruijie Quan
Yi Yang
26
12
0
15 Feb 2024
3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek
Leonid Karlinsky
Raja Giryes
CoGe
VLM
49
4
0
28 Dec 2023
Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation
Zixian Guo
Yuxiang Wei
Ming-Yu Liu
Zhilong Ji
Jinfeng Bai
Yiwen Guo
Wangmeng Zuo
VLM
24
8
0
26 Dec 2023
Leveraging Habitat Information for Fine-grained Bird Identification
Tin Nguyen
Anh Nguyen
Anh Nguyen
VLM
28
0
0
22 Dec 2023
TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training
Chaoya Jiang
Wei Ye
Haiyang Xu
Qinghao Ye
Mingshi Yan
Ji Zhang
Shikun Zhang
CLIP
VLM
11
4
0
14 Dec 2023
TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes
Xuying Zhang
Bo-Wen Yin
Yuming Chen
Zheng Lin
Yunheng Li
Qibin Hou
Ming-Ming Cheng
CLIP
DiffM
34
7
0
07 Dec 2023
UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
Jia-li Zuo
Hanyu Zhou
Ying Nie
Feng Zhang
Tianyu Guo
Nong Sang
Yunhe Wang
Changxin Gao
25
17
0
06 Dec 2023
Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search
Hefeng Wu
Weifeng Chen
Zhibin Liu
Tianshui Chen
Zhiguang Chen
Liang Lin
17
11
0
15 Nov 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
17
32
0
20 Oct 2023
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
Chengyang Zhao
Yikang Shen
Zhenfang Chen
Mingyu Ding
Chuang Gan
40
15
0
10 Oct 2023
FLIP: Cross-domain Face Anti-spoofing with Language Guidance
K. Srivatsan
Muzammal Naseer
Karthik Nandakumar
CVBM
42
43
0
28 Sep 2023
Self-supervised Cross-view Representation Reconstruction for Change Captioning
Yunbin Tu
Liang Li
Filippos Christianos
Zheng-Jun Zha
Zhibin Li
Qingming Huang
SSL
22
24
0
28 Sep 2023
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features
Hila Levi
Guy Heller
Dan Levi
Ethan Fetaya
OCL
VLM
14
3
0
26 Sep 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
43
9
0
23 Aug 2023
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Xi Deng
Han Shi
Runhu Huang
Changlin Li
Hang Xu
Jianhua Han
James T. Kwok
Shen Zhao
Wei Zhang
Xiaodan Liang
CLIP
VLM
19
3
0
22 Aug 2023
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
Baoshuo Kan
Teng Wang
Wenpeng Lu
Xiantong Zhen
Weili Guan
Feng Zheng
VPVLM
VLM
16
25
0
22 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
18
116
0
25 Jul 2023
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports
Pujin Cheng
Li Lin
Junyan Lyu
Yijin Huang
Wenhan Luo
Xiaoying Tang
MedIm
8
43
0
24 Jul 2023
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
S. Hall
F. G. Abrantes
Hanwen Zhu
Grace A. Sodunke
Aleksandar Shtedritski
Hannah Rose Kirk
CoGe
11
38
0
21 Jun 2023
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
14
0
0
31 May 2023
1
2
Next