Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding

15 June 2023

Papers citing "Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding"

8 / 8 papers shown

Title
Decoupled Global-Local Alignment for Improving Compositional Understanding Xiaoxing Hu Kaicheng Yang J. Z. Wang Haoran Xu Ziyong Feng Y. Wang VLM 86 0 0 23 Apr 2025
Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality Anuj Diwan Layne Berry Eunsol Choi David F. Harwath Kyle Mahowald CoGe 101 41 0 01 Nov 2022
Fine-grained Image Captioning with CLIP Reward Jaemin Cho Seunghyun Yoon Ajinkya Kale Franck Dernoncourt Trung Bui Mohit Bansal CLIP 121 76 0 26 May 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Junnan Li Dongxu Li Caiming Xiong S. Hoi MLLM BDL VLM CLIP 388 4,110 0 28 Jan 2022
Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation Yao Qin Chiyuan Zhang Ting Chen Balaji Lakshminarayanan Alex Beutel Xuezhi Wang ViT 39 42 0 15 Oct 2021
Learning to Prompt for Vision-Language Models Kaiyang Zhou Jingkang Yang Chen Change Loy Ziwei Liu VPVLM CLIP VLM 322 2,249 0 02 Sep 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision Chao Jia Yinfei Yang Ye Xia Yi-Ting Chen Zarana Parekh Hieu H. Pham Quoc V. Le Yun-hsuan Sung Zhen Li Tom Duerig VLM CLIP 293 3,683 0 11 Feb 2021
A weakly supervised adaptive triplet loss for deep metric learning Xiaonan Zhao Huan Qi R. Luo Larry S. Davis DML 27 24 0 27 Sep 2019