Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.03689
Cited By
COLA: A Benchmark for Compositional Text-to-image Retrieval
5 May 2023
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"COLA: A Benchmark for Compositional Text-to-image Retrieval"
30 / 30 papers shown
Title
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Shinýa Yamaguchi
Dewei Feng
Sekitoshi Kanai
Kazuki Adachi
Daiki Chijiwa
VLM
34
0
0
17 Apr 2025
TMCIR: Token Merge Benefits Composed Image Retrieval
Chaoyang Wang
Zeyu Zhang
Long Teng
Zijun Li
Shichao Kan
16
0
0
15 Apr 2025
Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data
Samarth Mishra
Kate Saenko
Venkatesh Saligrama
CoGe
LRM
32
0
0
07 Apr 2025
VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework
Chunbai Zhang
Chao Wang
Yang Zhou
Yan Peng
LRM
ReLM
48
0
0
02 Feb 2025
GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers
Éloi Zablocki
Valentin Gerard
Amaia Cardiel
Eric Gaussier
Matthieu Cord
Eduardo Valle
66
0
0
23 Nov 2024
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
Maitreya Patel
Abhiram Kusumba
Sheng Cheng
Changhoon Kim
Tejas Gokhale
Chitta Baral
Yezhou Yang
CoGe
CLIP
42
7
0
04 Nov 2024
INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
Edward Vendrow
Omiros Pantazis
Alexander Shepard
Gabriel J. Brostow
Kate E. Jones
Oisin Mac Aodha
Sara Beery
Grant Van Horn
VLM
31
0
0
04 Nov 2024
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
Youngtaek Oh
Jae-Won Cho
Dong-Jin Kim
In So Kweon
Junmo Kim
VLM
CoGe
CLIP
11
4
0
07 Oct 2024
EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections
Francesc Net
Lluís Gómez
16
0
0
02 Oct 2024
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu
X. Yang
Weiwei Li
Peng Wang
ObjD
36
3
0
23 Sep 2024
GlyphPattern: An Abstract Pattern Recognition for Vision-Language Models
Zixuan Wu
Yoolim Kim
Carolyn Jane Anderson
VLM
16
0
0
12 Aug 2024
The Curious Case of Representational Alignment: Unravelling Visio-Linguistic Tasks in Emergent Communication
Tom Kouwenhoven
Max Peeperkorn
Bram van Dijk
Tessa Verhoef
19
3
0
25 Jul 2024
Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models
Reza Abbasi
M. Rohban
M. Baghshah
CoGe
32
5
0
08 Jul 2024
SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding
Zixu Cheng
Yujiang Pu
Shaogang Gong
Parisa Kordjamshidi
Yu Kong
AI4TS
17
0
0
06 Jul 2024
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
Shenghai Yuan
Jinfa Huang
Yongqi Xu
Yaoyang Liu
Shaofeng Zhang
Yujun Shi
Ruijie Zhu
Xinhua Cheng
Jiebo Luo
Li Yuan
EGVM
VGen
66
1
0
26 Jun 2024
MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning
Shuo Xu
Sai Wang
Xinyue Hu
Yutian Lin
Bo Du
Yu Wu
CoGe
41
0
0
18 Jun 2024
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
Imanol Miranda
Ander Salaberria
Eneko Agirre
Gorka Azkune
CoGe
28
0
0
14 Jun 2024
Comparison Visual Instruction Tuning
Wei Lin
M. Jehanzeb Mirza
Sivan Doveh
Rogerio Feris
Raja Giryes
Sepp Hochreiter
Leonid Karlinsky
38
4
0
13 Jun 2024
Position: Do Not Explain Vision Models Without Context
Paulina Tomaszewska
Przemysław Biecek
16
1
0
28 Apr 2024
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Ankit Vani
Bac Nguyen
Samuel Lavoie
Ranjay Krishna
Aaron Courville
21
1
0
24 Apr 2024
FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua
Jing Shi
Kushal Kafle
Simon Jenni
Daoan Zhang
John Collomosse
Scott D. Cohen
Jiebo Luo
CoGe
VLM
39
9
0
23 Apr 2024
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Tianyu Zhu
M. Jung
Jesse Clark
83
1
0
12 Apr 2024
Do Vision-Language Models Understand Compound Nouns?
Sonal Kumar
Sreyan Ghosh
S. Sakshi
Utkarsh Tyagi
Dinesh Manocha
CLIP
CoGe
VLM
56
0
0
30 Mar 2024
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
David Wan
Jaemin Cho
Elias Stengel-Eskin
Mohit Bansal
VLM
ObjD
46
29
0
04 Mar 2024
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models
Santiago Castro
Amir Ziai
Avneesh Saluja
Zhuoning Yuan
Rada Mihalcea
MLLM
CoGe
VLM
20
5
0
22 Feb 2024
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
S. DarshanSingh
Zeeshan Khan
Makarand Tapaswi
VLM
CLIP
21
3
0
15 Jan 2024
3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek
Leonid Karlinsky
Raja Giryes
CoGe
VLM
44
4
0
28 Dec 2023
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
319
2,108
0
02 Sep 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
275
3,784
0
18 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
2,875
0
11 Feb 2021
1