arXiv:1612.06890
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
20 December 2016
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
Papers citing "CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"
50 / 1,475 papers shown
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
Haibi Wang
Weifeng Ge
LRM
32
4
0
19 Jan 2024
Explicitly Disentangled Representations in Object-Centric Learning
Riccardo Majellaro
Jonathan Collu
Aske Plaat
Thomas M. Moerland
CoGe
OOD
OCL
76
1
0
18 Jan 2024
Decentralised Emergence of Robust and Adaptive Linguistic Conventions in Populations of Autonomous Agents Grounded in Continuous Worlds
Jérôme Botoko Ekila
Jens Nevens
Lara Verheyen
Katrien Beuls
Paul Van Eecke
12
3
0
16 Jan 2024
Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model
Taehee Kim
Yeongjae Cho
Heejun Shin
Yohan Jo
Dongmyung Shin
37
4
0
12 Jan 2024
Towards Goal-Oriented Agents for Evolving Problems Observed via Conversation
Michael Free
Andrew Langworthy
Mary Dimitropoulaki
Simon Thompson
LLMAG
11
1
0
11 Jan 2024
Effective pruning of web-scale datasets based on complexity of concept clusters
Amro Abbas
E. Rusak
Kushal Tirumala
Wieland Brendel
Kamalika Chaudhuri
Ari S. Morcos
VLM
CLIP
34
22
0
09 Jan 2024
StarCraftImage: A Dataset For Prototyping Spatial Reasoning Methods For Multi-Agent Environments
Sean Kulinski
Nicholas R. Waytowich
James Z. Hare
David I. Inouye
28
3
0
09 Jan 2024
Deep Learning in Physical Layer: Review on Data Driven End-to-End Communication Systems and their Enabling Semantic Applications
Nazmul Islam
Seokjoo Shin
AI4CE
31
3
0
08 Jan 2024
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering
Yueqian Wang
Yuxuan Wang
Kai Chen
Dongyan Zhao
33
2
0
08 Jan 2024
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
Fanqing Meng
Wenqi Shao
Quanfeng Lu
Peng Gao
Kaipeng Zhang
Yu Qiao
Ping Luo
34
46
0
04 Jan 2024
Slot-guided Volumetric Object Radiance Fields
Di Qi
Tong Yang
Xiangyu Zhang
OCL
40
2
0
04 Jan 2024
Unsupervised Object-Centric Learning from Multiple Unspecified Viewpoints
Jinyang Yuan
Tonglin Chen
Zhimeng Shen
Bin Li
Xiangyang Xue
OCL
35
2
0
03 Jan 2024
LaViP:Language-Grounded Visual Prompts
Nilakshan Kunananthaseelan
Jing Zhang
Mehrtash Harandi
VLM
25
0
0
18 Dec 2023
Benchmarks for Physical Reasoning AI
Andrew Melnik
Robin Schiewer
Moritz Lange
Andrei Muresanu
Mozhgan Saeidi
Animesh Garg
Helge J. Ritter
31
8
0
17 Dec 2023
Advancing Surgical VQA with Scene Graph Knowledge
Kun Yuan
Manasi Kattel
Joël L. Lavanchy
Nassir Navab
V. Srivastav
N. Padoy
39
16
0
15 Dec 2023
Gradient-based Parameter Selection for Efficient Fine-Tuning
Zhi Zhang
Qizhe Zhang
Zijun Gao
Renrui Zhang
Ekaterina Shutova
Shiji Zhou
Shanghang Zhang
33
15
0
15 Dec 2023
See, Say, and Segment: Teaching LMMs to Overcome False Premises
Tsung-Han Wu
Giscard Biamby
David M. Chan
Lisa Dunlap
Ritwik Gupta
Xudong Wang
Joseph E. Gonzalez
Trevor Darrell
VLM
MLLM
44
18
0
13 Dec 2023
GIST: Improving Parameter Efficient Fine Tuning via Knowledge Interaction
Jiacheng Ruan
Jingsheng Gao
Mingye Xie
Suncheng Xiang
Zefang Yu
Ting Liu
Yuzhuo Fu
MoE
56
4
0
12 Dec 2023
Benchmarking and Analysis of Unsupervised Object Segmentation from Real-world Single Images
Yafei Yang
Bo Yang
OCL
32
2
0
08 Dec 2023
PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation
Zhaoxi Chen
Fangzhou Hong
Haiyi Mei
Guangcong Wang
Lei Yang
Ziwei Liu
43
24
0
07 Dec 2023
MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning
Qizhe Zhang
Bocheng Zou
Ruichuan An
Jiaming Liu
Shanghang Zhang
MoE
29
2
0
05 Dec 2023
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai
Zirui Song
Dayan Guan
Zhenhao Chen
Xing Luo
Chenyu Yi
Alex C. Kot
MLLM
VLM
39
32
0
05 Dec 2023
BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation
Qihang Zhang
Yinghao Xu
Yujun Shen
Bo Dai
Bolei Zhou
Ceyuan Yang
29
4
0
04 Dec 2023
CLAMP: Contrastive LAnguage Model Prompt-tuning
Piotr Teterwak
Ximeng Sun
Bryan A. Plummer
Kate Saenko
Ser-Nam Lim
MLLM
VLM
40
1
0
04 Dec 2023
Learning Part Segmentation from Synthetic Animals
Jiawei Peng
Ju He
Prakhar Kaushik
Zihao Xiao
Jiteng Mu
Alan Yuille
26
1
0
30 Nov 2023
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
Chi-Hsi Kung
Shu-Wei Lu
Yi-Hsuan Tsai
Yi-Ting Chen
37
6
0
29 Nov 2023
No Representation Rules Them All in Category Discovery
S. Vaze
Andrea Vedaldi
Andrew Zisserman
OOD
39
31
0
28 Nov 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
87
413
0
28 Nov 2023
The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation
Christel Chappuis
Eliot Walt
Vincent Mendez
Sylvain Lobry
B. L. Saux
D. Tuia
33
4
0
28 Nov 2023
Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
Zhihe Lu
Jiawang Bai
Xin Li
Zeyu Xiao
Xinchao Wang
VLM
52
11
0
28 Nov 2023
Fully Authentic Visual Question Answering Dataset from Online Communities
Chongyan Chen
Mengchen Liu
Noel Codella
Yunsheng Li
Lu Yuan
Danna Gurari
49
5
0
27 Nov 2023
LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
Zhong-Zhi Li
Ming-Liang Zhang
Fei Yin
Cheng-Lin Liu
21
14
0
25 Nov 2023
Benchmarking Robustness of Text-Image Composed Retrieval
Shitong Sun
Jindong Gu
Shaogang Gong
CoGe
47
1
0
24 Nov 2023
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
Yufei Zhan
Yousong Zhu
Zhiyang Chen
Fan Yang
E. Goles
Jinqiao Wang
ObjD
52
15
0
24 Nov 2023
Robot Learning in the Era of Foundation Models: A Survey
Xuan Xiao
Jiahang Liu
Zhipeng Wang
Yanmin Zhou
Yong Qi
Qian Cheng
Bin He
Shuo Jiang
AI4CE
LM&Ro
37
28
0
24 Nov 2023
VALUED -- Vision and Logical Understanding Evaluation Dataset
Soumadeep Saha
Saptarshi Saha
Utpal Garain
25
0
0
21 Nov 2023
What's left can't be right -- The remaining positional incompetence of contrastive vision-language models
Nils Hoehing
Ellen Rushe
Anthony Ventresque
VLM
28
3
0
20 Nov 2023
3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing
Haoran Li
Long Ma
Yong Liao
Lechao Cheng
Yanbin Hao
Pengyuan Zhou
32
4
0
18 Nov 2023
SelfEval: Leveraging the discriminative nature of generative models for evaluation
Sai Saketh Rambhatla
Ishan Misra
EGVM
38
4
0
17 Nov 2023
Neural-Logic Human-Object Interaction Detection
Liulei Li
Jianan Wei
Wenguan Wang
Yi Yang
48
16
0
16 Nov 2023
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
Fangzhi Xu
Zhiyong Wu
Qiushi Sun
Siyu Ren
Fei Yuan
Shuai Yuan
Qika Lin
Yu Qiao
Jun Liu
LLMAG
35
33
0
15 Nov 2023
Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models
Yeongbin Kim
Gautam Singh
Junyeong Park
Çağlar Gülçehre
Sungjin Ahn
OCL
VLM
50
1
0
15 Nov 2023
GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models
Serwan Jassim
Mario S. Holubar
Annika Richter
Cornelius Wolff
Xenia Ohmer
Elia Bruni
ELM
27
9
0
15 Nov 2023
Attribute Diversity Determines the Systematicity Gap in VQA
Ian Berlot-Attwell
Kumar Krishna Agrawal
A. M. Carrell
Yash Sharma
Naomi Saphra
33
1
0
15 Nov 2023
Towards A Unified Neural Architecture for Visual Recognition and Reasoning
Calvin Luo
Boqing Gong
Ting Chen
Chen Sun
OCL
ObjD
32
1
0
10 Nov 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Georgios Tziafas
Yucheng Xu
Arushi Goel
Mohammadreza Kasaei
Zhibin Li
Hamidreza Kasaei
38
24
0
09 Nov 2023
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
47
2
0
08 Nov 2023
Training CLIP models on Data from Scientific Papers
Calvin Metzger
VLM
CLIP
27
1
0
08 Nov 2023
Object-Centric Learning with Slot Mixture Module
Daniil E. Kirilenko
Vitaliy Vorobyov
A. Kovalev
Aleksandr I. Panov
OCL
31
3
0
08 Nov 2023
Emergent Communication for Rules Reasoning
Yuxuan Guo
Yifan Hao
Rui Zhang
Enshuai Zhou
Zidong Du
...
Shaohui Peng
Di Huang
Rui Chen
Qi Guo
Yunji Chen
LLMAG
LRM
AI4CE
26
0
0
08 Nov 2023