1612.06890
Cited By
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
20 December 2016
Justin Johnson
B. Hariharan
L. van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
Papers citing
"CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"
50 / 1,475 papers shown
Title
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Bu Jin
Yupeng Zheng
Pengfei Li
Weize Li
Yuhang Zheng
...
Kun Zhan
Peng Jia
Xiaoxiao Long
Yilun Chen
Hao Zhao
3DV
79
16
0
28 Mar 2024
Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering
Pascal Tilli
Ngoc Thang Vu
33
1
0
26 Mar 2024
On permutation-invariant neural networks
Masanari Kimura
Ryotaro Shimizu
Yuki Hirakawa
Ryosuke Goto
Yuki Saito
OOD
AAML
41
12
0
26 Mar 2024
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
Yingshan Chang
Yasi Zhang
Zhiyuan Fang
Yingnian Wu
Yonatan Bisk
Feng Gao
EGVM
47
6
0
25 Mar 2024
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
Zhuowan Li
Bhavan A. Jasani
Peng Tang
Shabnam Ghadar
LRM
39
8
0
25 Mar 2024
iDAT: inverse Distillation Adapter-Tuning
Jiacheng Ruan
Jingsheng Gao
Mingye Xie
Daize Dong
Suncheng Xiang
Ting Liu
Yuzhuo Fu
54
1
0
23 Mar 2024
Grounding Spatial Relations in Text-Only Language Models
Gorka Azkune
Ander Salaberria
Eneko Agirre
42
0
0
20 Mar 2024
PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns
Yew Ken Chia
Vernon Toh Yan Han
Deepanway Ghosal
Lidong Bing
Soujanya Poria
LRM
ReLM
49
13
0
20 Mar 2024
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
Yongshuo Zong
Ondrej Bohdal
Timothy M. Hospedales
30
5
0
19 Mar 2024
SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors
Chenyang Ma
Kai Lu
Ta-Ying Cheng
Niki Trigoni
Andrew Markham
LRM
40
7
0
18 Mar 2024
Language Evolution with Deep Learning
Mathieu Rita
Paul Michel
Rahma Chaabouni
Olivier Pietquin
Emmanuel Dupoux
Florian Strub
34
2
0
18 Mar 2024
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan
Yousong Zhu
Hongyin Zhao
Fan Yang
Ming Tang
Jinqiao Wang
ObjD
44
12
0
14 Mar 2024
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
Yizhe Xiong
Hui Chen
Tianxiang Hao
Zijia Lin
Jungong Han
Yuesong Zhang
Guoxin Wang
Yongjun Bao
Guiguang Ding
51
17
0
14 Mar 2024
From Pixel to Cancer: Cellular Automata in Computed Tomography
Yuxiang Lai
Xiaoxi Chen
Angtian Wang
Alan Yuille
Zongwei Zhou
MedIm
56
12
0
11 Mar 2024
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
Joseph Cho
Fachrina Dewi Puspitasari
Sheng Zheng
Jingyao Zheng
Lik-Hang Lee
Tae-Ho Kim
Choong Seon Hong
Chaoning Zhang
EGVM
VGen
44
41
0
08 Mar 2024
Efficient Data Collection for Robotic Manipulation via Compositional Generalization
Jensen Gao
Annie Xie
Ted Xiao
Chelsea Finn
Dorsa Sadigh
29
19
0
08 Mar 2024
How Far Are We from Intelligent Visual Deductive Reasoning?
Yizhe Zhang
Richard He Bai
Ruixiang Zhang
Jiatao Gu
Shuangfei Zhai
J. Susskind
Navdeep Jaitly
ReLM
LRM
52
13
0
07 Mar 2024
ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes
H. Malik
Muhammad Huzaifa
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
DiffM
45
2
0
07 Mar 2024
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Yusheng Dai
Hang Chen
Jun Du
Ruoyu Wang
Shihao Chen
Jie Ma
Haotian Wang
Chin-Hui Lee
45
4
0
07 Mar 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Guoxing Yang
Zhiwu Lu
55
3
0
07 Mar 2024
Slot Abstractors: Toward Scalable Abstract Visual Reasoning
S. S. Mondal
Jonathan D. Cohen
Taylor W. Webb
OCL
40
9
0
06 Mar 2024
CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments
Savitha Sam Abraham
Marjan Alirezaie
Luc de Raedt
25
1
0
05 Mar 2024
Triple-CFN: Restructuring Concept and Feature Spaces for Enhancing Abstract Reasoning Process
Ruizhuo Song
Beiming Yuan
LRM
43
0
0
05 Mar 2024
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
Jianjian Cao
Peng Ye
Shengze Li
Chong Yu
Yansong Tang
Jiwen Lu
Tao Chen
38
16
0
05 Mar 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang
Yiming Ren
Hao Luo
Tiantong Li
Chenxiang Yan
...
Qingyun Li
Lewei Lu
Xizhou Zhu
Yu Qiao
Jifeng Dai
MLLM
52
47
0
29 Feb 2024
VIXEN: Visual Text Comparison Network for Image Difference Captioning
Alexander Black
Jing Shi
Yifei Fan
Tu Bui
John Collomosse
52
5
0
29 Feb 2024
The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns
Luca Salvatore Lorello
Marco Lippi
S. Melacci
37
3
0
27 Feb 2024
Measuring Vision-Language STEM Skills of Neural Models
Jianhao Shen
Ye Yuan
Srbuhi Mirzoyan
Ming Zhang
Chenguang Wang
VLM
33
8
0
27 Feb 2024
DreamUp3D: Object-Centric Generative Models for Single-View 3D Scene Understanding and Real-to-Sim Transfer
Yizhe Wu
Haitz Sáez de Ocáriz Borde
Jack Collins
Oiwi Parker Jones
Ingmar Posner
3DPC
OCL
38
2
0
26 Feb 2024
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models
Santiago Castro
Amir Ziai
Avneesh Saluja
Zhuoning Yuan
Rada Mihalcea
MLLM
CoGe
VLM
39
5
0
22 Feb 2024
Vision-Language Navigation with Embodied Intelligence: A Survey
Peng Gao
Peng Wang
Feng Gao
Fei Wang
Ruyue Yuan
LM&Ro
43
2
0
22 Feb 2024
Subobject-level Image Tokenization
Delong Chen
Samuel Cahyawijaya
Jianfeng Liu
Baoyuan Wang
Pascale Fung
VLM
OCL
56
7
0
22 Feb 2024
OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog
Adnen Abdessaied
Manuel von Hochmeister
Andreas Bulling
40
2
0
20 Feb 2024
Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models
Xuanyu Lei
Zonghan Yang
Xinrui Chen
Peng Li
Yang Liu
MLLM
LRM
40
31
0
19 Feb 2024
Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review
Thang-Anh-Quan Nguyen
Amine Bourki
Mátyás Macudzinski
Anthony Brunel
M. Bennamoun
43
11
0
17 Feb 2024
Pix2Code: Learning to Compose Neural Visual Concepts as Programs
Antonia Wüst
Wolfgang Stammer
Quentin Delfosse
Devendra Singh Dhami
Kristian Kersting
49
13
0
13 Feb 2024
Unsupervised Discovery of Object-Centric Neural Fields
Rundong Luo
Hong-Xing Yu
Jiajun Wu
3DPC
OCL
90
3
0
12 Feb 2024
Where is the Truth? The Risk of Getting Confounded in a Continual World
Florian Peter Busch
Roshni Kamath
Rupert Mitchell
Wolfgang Stammer
Kristian Kersting
Martin Mundt
CML
CLL
34
4
0
09 Feb 2024
AI, Meet Human: Learning Paradigms for Hybrid Decision Making Systems
Clara Punzi
Roberto Pellungrini
Mattia Setzu
F. Giannotti
D. Pedreschi
25
5
0
09 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
130
109
0
08 Feb 2024
Revisiting the Power of Prompt for Visual Tuning
Yuzhu Wang
Lechao Cheng
Chaowei Fang
Dingwen Zhang
Manni Duan
Meng Wang
VLM
56
14
0
04 Feb 2024
GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning
Yanbin Wei
Shuai Fu
Weisen Jiang
Zejian Zhang
Zhixiong Zeng
Qi Wu
James T. Kwok
Yu Zhang
35
12
0
03 Feb 2024
Neural Language of Thought Models
Yi-Fu Wu
Minseung Lee
Sungjin Ahn
MLLM
VLM
80
6
0
02 Feb 2024
Neural Slot Interpreters: Grounding Object Semantics in Emergent Slot Representations
Bhishma Dedhia
N. Jha
OCL
54
1
0
02 Feb 2024
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis
Jianing Li
Xi Nan
Ming Lu
Li Du
Shanghang Zhang
50
1
0
31 Jan 2024
VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models
Yi Zhao
Yilin Zhang
Rong Xiang
Jing Li
Hillming Li
43
16
0
29 Jan 2024
On the generalization capacity of neural networks during generic multimodal reasoning
Takuya Ito
Soham Dan
Mattia Rigotti
James Kozloski
Murray Campbell
LRM
40
2
0
26 Jan 2024
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen
Zhuo Xu
Sean Kirmani
Brian Ichter
Danny Driess
Pete Florence
Dorsa Sadigh
Leonidas J. Guibas
Fei Xia
LRM
ReLM
52
211
0
22 Jan 2024
Text-to-Image Cross-Modal Generation: A Systematic Review
Maciej Żelaszczyk
Jacek Mańdziuk
35
3
0
21 Jan 2024
Learning to Visually Connect Actions and their Effects
Eric Peh
Paritosh Parmar
Basura Fernando
24
2
0
19 Jan 2024