Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem

31 October 2024

Declan Campbell

Tyler Giallanza

Nicolò De Sabbata

Steven M. Frankland

Thomas L. Griffiths

Jonathan D. Cohen

Taylor W. Webb

Papers citing "Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem"

VSC: Visual Search Compositional Text-to-Image Diffusion Model. Do Huu Dat, Nam Hyeonu, Po Yuan Mao, Tae-Hyun Oh. 02 May 2025.
COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning. Xindi Wu, Hee Seung Hwang, Polina Kirichenko, Olga Russakovsky. 30 Apr 2025.
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains. J. A. Zhang, Chuanqi Cheng, Yong-Jin Liu, Wei Liu, Jian Luan, Rui Yan. 28 Apr 2025.
LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception. Yuan-Hong Liao, Sven Elflein, Liu He, Laura Leal-Taixe, Yejin Choi, Sanja Fidler, David Acuna. 21 Apr 2025.
Evaluating Compositional Scene Understanding in Multimodal Generative Models. Shuhao Fu, Andrew Jun Lee, Anna Wang, Ida Momennejad, Trevor Bihl, Hongjing Lu, Taylor Webb. 29 Mar 2025.
MAVERIX: Multimodal Audio-Visual Evaluation Reasoning IndeX. Liuyue Xie, George Z. Wei, Avik Kuthiala, Ce Zheng, Ananya Bal, ..., Rohan Choudhury, Morteza Ziyadi, Xu Zhang, Hao Yang, László A. Jeni. 27 Mar 2025.
LogicQA: Logical Anomaly Detection with Vision Language Model Generated Questions. Yejin Kwon, Daeun Moon, Youngje Oh, Hyunsoo Yoon. 26 Mar 2025.
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis. Alexander Ku, Declan Campbell, Xuechunzi Bai, Jiayi Geng, Ryan Liu, ..., Ilia Sucholutsky, Veniamin Veselovsky, Liyi Zhang, Jian-Qiao Zhu, Thomas L. Griffiths. 17 Mar 2025.
Is CLIP ideal? No. Can we fix it? Yes! Raphi Kang, Yue Song, Georgia Gkioxari, Pietro Perona. 10 Mar 2025.
Vision-Language Models Struggle to Align Entities across Modalities. Iñigo Alonso, Ander Salaberria, Gorka Azkune, Jeremy Barnes, Oier López de Lacalle. 05 Mar 2025.
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning. Xueqing Wu, Yuheng Ding, Bingxuan Li, Pan Lu, Da Yin, Kai-Wei Chang, Nanyun Peng. 03 Dec 2024.