CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

20 December 2016

Justin Johnson

B. Hariharan

Laurens van der Maaten

Li Fei-Fei

Papers citing "CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"

50 / 1,475 papers shown

EgoTaskQA: Understanding Human Tasks in Egocentric Videos Baoxiong Jia Ting Lei Song-Chun Zhu Siyuan Huang EgoV 37 61 0 08 Oct 2022
Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images Yafei Yang Bo Yang OCL 111 17 0 05 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data Ye Zhu Yuehua Wu N. Sebe Yan Yan 40 16 0 05 Oct 2022
RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank Q. Garrido Randall Balestriero Laurent Najman Yann LeCun SSL 68 74 0 05 Oct 2022
Differentiable Mathematical Programming for Object-Centric Representation Learning Adeel Pervez Phillip Lippe E. Gavves OCL 49 5 0 05 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning Xu Yang Hanwang Zhang Chongyang Gao Jianfei Cai MLLM 45 10 0 04 Oct 2022
Extending Compositional Attention Networks for Social Reasoning in Videos Christina Sartzetaki Georgios Paraskevopoulos Alexandros Potamianos LRM 31 3 0 03 Oct 2022
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach Georgios Tziafas Hamidreza Kasaei LM&Ro 20 3 0 03 Oct 2022
Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation Xinhang Liu Jiaben Chen Huai Yu Yu-Wing Tai Chi-Keung Tang 95 28 0 02 Oct 2022
Multimodal Analogical Reasoning over Knowledge Graphs Ningyu Zhang Lei Li Xiang Chen Xiaozhuan Liang Shumin Deng Huajun Chen 62 26 0 01 Oct 2022
Compositional Semantic Parsing with Large Language Models Andrew Drozdov Nathanael Schärli Ekin Akyürek Nathan Scales Xinying Song Xinyun Chen Olivier Bousquet Denny Zhou ReLM LRM 208 92 0 29 Sep 2022
A Multiagent Framework for the Asynchronous and Collaborative Extension of Multitask ML Systems Andrea Gesmundo 29 2 0 29 Sep 2022
On the visual analytic intelligence of neural networks Stanisław Woźniak Hlynur Jónsson G. Cherubini A. Pantazi E. Eleftheriou 25 0 0 28 Sep 2022
Towards Faithful Model Explanation in NLP: A Survey Qing Lyu Marianna Apidianaki Chris Callison-Burch XAI 120 110 0 22 Sep 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering Hao Li Jinfa Huang Peng Jin Guoli Song Qi Wu Jie Chen 44 21 0 21 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering Pan Lu Swaroop Mishra Tony Xia Liang Qiu Kai-Wei Chang Song-Chun Zhu Oyvind Tafjord Peter Clark Ashwin Kalyan ELM ReLM LRM 211 1,134 0 20 Sep 2022
A Continual Development Methodology for Large-scale Multitask Dynamic ML Systems Andrea Gesmundo 21 18 0 15 Sep 2022
The Embeddings World and Artificial General Intelligence M. H. Chehreghani 19 1 0 14 Sep 2022
StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation A. Maharana Darryl Hannan Joey Tianyi Zhou DiffM 39 78 0 13 Sep 2022
MaXM: Towards Multilingual Visual Question Answering Soravit Changpinyo Linting Xue Michal Yarom Ashish V. Thapliyal Idan Szpektor J. Amelot Xi Chen Radu Soricut 33 8 0 12 Sep 2022
Ask Before You Act: Generalising to Novel Environments by Asking Questions Ross Murphy S. Mosesov Javier Leguina Peral Thymo ter Doest LRM 32 0 0 10 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions Paul Pu Liang Amir Zadeh Louis-Philippe Morency 18 63 0 07 Sep 2022
Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit G. Sejnova M. Vavrecka Karla Stepanova VGen 28 0 0 07 Sep 2022
Trust in Language Grounding: a new AI challenge for human-robot teams David M. Bossens C. Evers 42 1 0 05 Sep 2022
Injecting Image Details into CLIP's Feature Space Zilun Zhang Cuifeng Shen Yuan-Chung Shen Huixin Xiong Xinyu Zhou VLM CLIP 32 0 0 31 Aug 2022
Shaken, and Stirred: Long-Range Dependencies Enable Robust Outlier Detection with PixelCNN++ Barath Mohan Umapathi Kushal Chauhan Pradeep Shenoy D. Sridharan 37 0 0 29 Aug 2022
LogicRank: Logic Induced Reranking for Generative Text-to-Image Systems Björn Deiseroth P. Schramowski Hikaru Shindo Devendra Singh Dhami Kristian Kersting EGVM DiffM 24 1 0 29 Aug 2022
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task Stan Weixian Lei Difei Gao Jay Zhangjie Wu Yuxuan Wang Wei Liu Meng Zhang Mike Zheng Shou 25 35 0 24 Aug 2022
Neuro-Symbolic Visual Dialog Adnen Abdessaied Mihai Bâce Andreas Bulling NAI 21 3 0 22 Aug 2022
ILLUME: Rationalizing Vision-Language Models through Human Interactions Manuel Brack P. Schramowski Björn Deiseroth Kristian Kersting VLM MLLM 27 3 0 17 Aug 2022
Patching open-vocabulary models by interpolating weights Gabriel Ilharco Mitchell Wortsman S. Gadre Shuran Song Hannaneh Hajishirzi Simon Kornblith Ali Farhadi Ludwig Schmidt VLM KELM 37 169 0 10 Aug 2022
CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning Adam Dahlgren Lindström Savitha Sam Abraham 19 50 0 10 Aug 2022
ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding Bingning Wang Feiya Lv Ting Yao Yiming Yuan Jin Ma Yu Luo Haijin Liang 31 3 0 05 Aug 2022
Generative Bias for Robust Visual Question Answering Jae-Won Cho Dong-Jin Kim H. Ryu In So Kweon OOD CML 41 19 0 01 Aug 2022
Testing Relational Understanding in Text-Guided Image Generation C. Conwell T. Ullman EGVM 160 65 0 29 Jul 2022
DoRO: Disambiguation of referred object for embodied agents Pradip Pramanick Chayan Sarkar S. Paul R. Roychoudhury Brojeshwar Bhowmick LM&Ro 20 14 0 28 Jul 2022
Unit Testing for Concepts in Neural Networks Charles Lovering Ellie Pavlick 25 28 0 28 Jul 2022
Break and Make: Interactive Structural Understanding Using LEGO Bricks Aaron Walsman Muru Zhang Klemen Kotar Karthik Desingh Ali Farhadi Dieter Fox 40 10 0 27 Jul 2022
Neural Groundplans: Persistent Neural Scene Representations from a Single Image Prafull Sharma A. Tewari Yilun Du Sergey Zakharov Rares Andrei Ambrus Adrien Gaidon William T. Freeman F. Durand J. Tenenbaum Vincent Sitzmann SSL OCL 29 16 0 22 Jul 2022
The Neural Race Reduction: Dynamics of Abstraction in Gated Networks Andrew M. Saxe Shagun Sodhani Sam Lewallen AI4CE 32 34 0 21 Jul 2022
Semantic-aware Modular Capsule Routing for Visual Question Answering Yudong Han Jianhua Yin Jianlong Wu Yin-wei Wei Liqiang Nie 35 7 0 21 Jul 2022
Semantic uncertainty intervals for disentangled latent spaces S. Sankaranarayanan Anastasios Nikolas Angelopoulos Stephen Bates Yaniv Romano Phillip Isola UQCV 45 21 0 20 Jul 2022
Rethinking Data Augmentation for Robust Visual Question Answering Long Chen Yuhang Zheng Jun Xiao OOD 37 42 0 18 Jul 2022
Semantic Novelty Detection via Relational Reasoning Francesco Cappio Borlino S. Bucci Tatiana Tommasi 17 4 0 18 Jul 2022
Sparse Relational Reasoning with Object-Centric Representations Alex F Spies Alessandra Russo Murray Shanahan OCL NAI 25 3 0 15 Jul 2022
Convolutional Bypasses Are Better Vision Transformer Adapters Shibo Jie Zhi-Hong Deng VPVLM 21 132 0 14 Jul 2022
3D Concept Grounding on Neural Fields Yining Hong Yilun Du Chun-Tse Lin J. Tenenbaum Chuang Gan 29 19 0 13 Jul 2022
Fine-grained Activities of People Worldwide J. Byrne Greg Castañón Zhongheng Li G. Ettinger 24 3 0 11 Jul 2022
CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination Hyounghun Kim Abhaysinh Zala Joey Tianyi Zhou 22 6 0 08 Jul 2022
Knowing Earlier what Right Means to You: A Comprehensive VQA Dataset for Grounding Relative Directions via Multi-Task Learning Kyra Ahrens Matthias Kerzel Jae Hee Lee C. Weber S. Wermter 21 0 0 06 Jul 2022