ResearchTrend.AI
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding

15 June 2023
Le Zhang
Rabiul Awal
Aishwarya Agrawal
    CoGe
    VLM

Papers citing "Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding"

8 / 8 papers shown
Decoupled Global-Local Alignment for Improving Compositional Understanding
Xiaoxing Hu
Kaicheng Yang
J. Z. Wang
Haoran Xu
Ziyong Feng
Y. Wang
VLM
86
0
0
23 Apr 2025
Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality
Anuj Diwan
Layne Berry
Eunsol Choi
David F. Harwath
Kyle Mahowald
CoGe
101
41
0
01 Nov 2022
Fine-grained Image Captioning with CLIP Reward
Jaemin Cho
Seunghyun Yoon
Ajinkya Kale
Franck Dernoncourt
Trung Bui
Mohit Bansal
CLIP
121
76
0
26 May 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,110
0
28 Jan 2022
Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation
Yao Qin
Chiyuan Zhang
Ting Chen
Balaji Lakshminarayanan
Alex Beutel
Xuezhi Wang
ViT
39
42
0
15 Oct 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
322
2,249
0
02 Sep 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
A weakly supervised adaptive triplet loss for deep metric learning
Xiaonan Zhao
Huan Qi
R. Luo
Larry S. Davis
DML
27
24
0
27 Sep 2019