1612.06890
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
20 December 2016
Justin Johnson
B. Hariharan
L. van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
Papers citing
"CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"
50 / 1,475 papers shown
Title
On the Role of Visual Grounding in VQA
Daniel Reich
Tanja Schultz
21
1
0
26 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
48
282
0
24 Jun 2024
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky
William Rudman
Vedant Palit
Ritambhara Singh
Carsten Eickhoff
33
1
0
24 Jun 2024
UQE: A Query Engine for Unstructured Databases
Hanjun Dai
B. Wang
Xingchen Wan
Bo Dai
Sherry Yang
Azade Nova
Pengcheng Yin
P. Phothilimthana
Charles Sutton
Dale Schuurmans
60
3
0
23 Jun 2024
IWISDM: Assessing instruction following in multimodal models at scale
Xiaoxuan Lei
Lucas Gomez
Hao Yuan Bai
P. Bashivan
VLM
33
1
0
20 Jun 2024
Beyond Visual Appearances: Privacy-sensitive Objects Identification via Hybrid Graph Reasoning
Zhuohang Jiang
Bingkui Tong
Xia Du
Ahmed Alhammadi
Jizhe Zhou
58
1
0
18 Jun 2024
Neural Concept Binder
Wolfgang Stammer
Antonia Wüst
David Steinmann
Kristian Kersting
OCL
39
4
0
14 Jun 2024
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models
Shimin Chen
Yitian Yuan
Shaoxiang Chen
Zequn Jie
Lin Ma
VLM
35
3
0
12 Jun 2024
A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation
Riccardo Fogliato
Pratik Patil
Mathew Monfort
Pietro Perona
24
1
0
11 Jun 2024
Unsupervised Object Detection with Theoretical Guarantees
Marian Longa
Joao F. Henriques
45
0
0
11 Jun 2024
Let Go of Your Labels with Unsupervised Transfer
Artyom Gadetsky
Yulun Jiang
Maria Brbić
VLM
45
6
0
11 Jun 2024
Identifiable Object-Centric Representation Learning via Probabilistic Slot Attention
Avinash Kori
Francesco Locatello
Ainkaran Santhirasekaram
Francesca Toni
Ben Glocker
Fabio De Sousa Ribeiro
OCL
47
1
0
11 Jun 2024
RWKV-CLIP: A Robust Vision-Language Representation Learner
Tiancheng Gu
Kaicheng Yang
Xiang An
Ziyong Feng
Dongnan Liu
Weidong Cai
Jiankang Deng
VLM
CLIP
40
14
0
11 Jun 2024
Adapters Strike Back
Jan-Martin O. Steitz
Stefan Roth
35
5
0
10 Jun 2024
On the Minimal Degree Bias in Generalization on the Unseen for non-Boolean Functions
Denys Pushkin
Raphael Berthier
Emmanuel Abbe
32
0
0
10 Jun 2024
Integrating Text and Image Pre-training for Multi-modal Algorithmic Reasoning
Zijian Zhang
Wei Liu
37
0
0
08 Jun 2024
LogiCode: an LLM-Driven Framework for Logical Anomaly Detection
Yiheng Zhang
Yunkang Cao
Xiaohao Xu
Weiming Shen
42
14
0
07 Jun 2024
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model
An-Chieh Cheng
Hongxu Yin
Yang Fu
Qiushan Guo
Ruihan Yang
Jan Kautz
Xiaolong Wang
Sifei Liu
LRM
61
48
0
03 Jun 2024
A Synergistic Approach In Network Intrusion Detection By Neurosymbolic AI
Alice Bizzarri
Chung-En Yu
B. Jalaeian
Fabrizio Riguzzi
Nathaniel D. Bastian
AAML
29
2
0
03 Jun 2024
WebSuite: Systematically Evaluating Why Web Agents Fail
Eric Li
Jim Waldo
LLMAG
31
5
0
01 Jun 2024
Going beyond Compositions, DDPMs Can Produce Zero-Shot Interpolations
Justin Deschenaux
Igor Krawczuk
Grigorios G. Chrysos
V. Cevher
DiffM
57
3
0
29 May 2024
Understanding Inter-Concept Relationships in Concept-Based Models
Naveen Raman
M. Zarlenga
M. Jamnik
38
4
0
28 May 2024
DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture
Shentong Mo
Sukmin Yun
45
3
0
28 May 2024
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
LRM
MLLM
67
9
0
27 May 2024
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Edison Marrese-Taylor
Hamed Damirchi
Anton Van Den Hengel
VLM
45
1
0
27 May 2024
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
Awni Altabaa
John Lafferty
37
3
0
26 May 2024
When does compositional structure yield compositional generalization? A kernel theory
Samuel Lippl
Kim Stachenfeld
NAI
CoGe
73
6
0
26 May 2024
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Jacob Russin
Sam Whitman McGrath
Danielle J. Williams
Lotem Elber-Dorozko
AI4CE
81
3
0
24 May 2024
Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations
Mohammed Baharoon
Jonathan Klein
D. L. Michels
SSL
VLM
44
0
0
23 May 2024
Learning Object-Centric Representation via Reverse Hierarchy Guidance
Junhong Zou
Xiangyu Zhu
Zhaoxiang Zhang
Zhen Lei
BDL
ObjD
OCL
26
0
0
17 May 2024
Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling
Guangmin Zheng
Jin Wang
Xiaobing Zhou
Xuejie Zhang
LRM
38
2
0
16 May 2024
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Andong Wang
Bo Wu
Sunli Chen
Zhenfang Chen
Haotian Guan
Wei-Ning Lee
Li Erran Li
Chuang Gan
LRM
RALM
37
16
0
15 May 2024
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu
Shoubin Yu
Zhenfang Chen
Joshua B Tenenbaum
Chuang Gan
43
178
0
15 May 2024
Controllable Image Generation With Composed Parallel Token Prediction
Jamie Stirling
Noura Al-Moubayed
33
0
0
10 May 2024
What matters when building vision-language models?
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
43
157
0
03 May 2024
IID Relaxation by Logical Expressivity: A Research Agenda for Fitting Logics to Neurosymbolic Requirements
M. Stol
Alessandra Mileo
34
1
0
30 Apr 2024
ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images
Huy Quang Pham
Thang Kien-Bao Nguyen
Quan Van Nguyen
Dan Quang Tran
Nghia Hieu Nguyen
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
41
3
0
29 Apr 2024
Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images
Hongyu Yan
Yadong Mu
3DV
36
0
0
25 Apr 2024
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Ankit Vani
Bac Nguyen
Samuel Lavoie
Ranjay Krishna
Aaron Courville
39
1
0
24 Apr 2024
Re-Thinking Inverse Graphics With Large Language Models
Peter Kulits
Haiwen Feng
Weiyang Liu
Victoria Fernandez-Abrevaya
Michael J. Black
AI4CE
35
7
0
23 Apr 2024
Closed Loop Interactive Embodied Reasoning for Robot Manipulation
Michal Nazarczuk
Jan Kristof Behrens
Karla Stepanova
Matej Hoffmann
K. Mikolajczyk
LM&Ro
LRM
55
1
0
23 Apr 2024
Neural Semantic Parsing with Extremely Rich Symbolic Meaning Representations
Xiao Zhang
Gosse Bouma
Johan Bos
NAI
33
0
0
19 Apr 2024
Sequential Compositional Generalization in Multimodal Models
Semih Yagcioglu
Osman Batur İnce
Aykut Erdem
Erkut Erdem
Desmond Elliott
Deniz Yuret
46
1
0
18 Apr 2024
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
Yichi Zhang
Yinpeng Dong
Siyuan Zhang
Tianzan Min
Hang Su
Jun Zhu
LRM
VLM
52
5
0
17 Apr 2024
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Övgü Özdemir
Erdem Akagündüz
44
10
0
12 Apr 2024
A Survey on the Integration of Generative AI for Critical Thinking in Mobile Networks
Athanasios Karapantelakis
Alexandros Nikou
Ajay Kattepur
Jean Martins
Leonid Mokrushin
S. Mohalik
Marin Orlic
Aneta Vulgarakis Feljan
29
1
0
10 Apr 2024
GUIDE: Graphical User Interface Data for Execution
Rajat Chawla
Adarsh Jha
Muskaan Kumar
NS Mukunda
Ishaan Bhola
LLMAG
29
3
0
09 Apr 2024
iVPT: Improving Task-relevant Information Sharing in Visual Prompt Tuning by Cross-layer Dynamic Connection
Nan Zhou
Jiaxin Chen
Di Huang
35
1
0
08 Apr 2024
What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases
A. M. H. Tiong
Junqi Zhao
Boyang Albert Li
Junnan Li
Guosheng Lin
Caiming Xiong
45
8
0
03 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen
Qihang Yu
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
3DV
VLM
47
25
0
02 Apr 2024