Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem

31 October 2024
Declan Campbell
Sunayana Rane
Tyler Giallanza
Nicolò De Sabbata
Kia Ghods
Amogh Joshi
Alexander Ku
Steven M. Frankland
Thomas L. Griffiths
Jonathan D. Cohen
Taylor W. Webb

Papers citing "Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem"

11 / 11 papers shown
VSC: Visual Search Compositional Text-to-Image Diffusion Model
Do Huu Dat
Nam Hyeonu
Po Yuan Mao
Tae-Hyun Oh
DiffM
CoGe
64
0
0
02 May 2025
COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning
Xindi Wu
Hee Seung Hwang
Polina Kirichenko
Olga Russakovsky
VLM
CoGe
68
0
0
30 Apr 2025
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
J. A. Zhang
Chuanqi Cheng
Yong-Jin Liu
Wei Liu
Jian Luan
Rui Yan
27
1
0
28 Apr 2025
LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception
Yuan-Hong Liao
Sven Elflein
Liu He
Laura Leal-Taixe
Yejin Choi
Sanja Fidler
David Acuna
ReLM
LRM
VLM
129
0
0
21 Apr 2025
Evaluating Compositional Scene Understanding in Multimodal Generative Models
Shuhao Fu
Andrew Jun Lee
Anna Wang
Ida Momennejad
Trevor Bihl
Hongjing Lu
Taylor Webb
CoGe
OCL
109
1
0
29 Mar 2025
MAVERIX: Multimodal Audio-Visual Evaluation Reasoning IndeX
Liuyue Xie
George Z. Wei
Avik Kuthiala
Ce Zheng
Ananya Bal
...
Rohan Choudhury
Morteza Ziyadi
Xu Zhang
Hao Yang
László A. Jeni
64
0
0
27 Mar 2025
LogicQA: Logical Anomaly Detection with Vision Language Model Generated Questions
Yejin Kwon
Daeun Moon
Youngje Oh
Hyunsoo Yoon
71
0
0
26 Mar 2025
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Alexander Ku
Declan Campbell
Xuechunzi Bai
Jiayi Geng
Ryan Liu
...
Ilia Sucholutsky
Veniamin Veselovsky
Liyi Zhang
Jian-Qiao Zhu
Thomas L. Griffiths
ELM
90
3
0
17 Mar 2025
Is CLIP ideal? No. Can we fix it? Yes!
Raphi Kang
Yue Song
Georgia Gkioxari
Pietro Perona
VLM
61
0
0
10 Mar 2025
Vision-Language Models Struggle to Align Entities across Modalities
Iñigo Alonso
Ander Salaberria
Gorka Azkune
Jeremy Barnes
Oier López de Lacalle
VLM
61
0
0
05 Mar 2025
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu
Yuheng Ding
Bingxuan Li
Pan Lu
Da Yin
Kai-Wei Chang
Nanyun Peng
LRM
105
3
0
03 Dec 2024