Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1612.06890
Cited By
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
20 December 2016
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"
50 / 1,475 papers shown
Title
CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension
Zhi Zhang
H. Yannakoudakis
Xiantong Zhen
Ekaterina Shutova
29
2
0
17 Feb 2023
On graph-based reentrancy-free semantic parsing
Alban Petit
Caio Corro
GNN
40
3
0
15 Feb 2023
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
Binyang Song
Ruilin Zhou
Faez Ahmed
AI4CE
42
40
0
14 Feb 2023
VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval
Yansong Gong
Georgina Cosma
Axel Finke
ViT
30
2
0
13 Feb 2023
Scaling Vision Transformers to 22 Billion Parameters
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
...
Mario Luvcić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
90
574
0
10 Feb 2023
Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames
Ondrej Biza
Sjoerd van Steenkiste
Mehdi S. M. Sajjadi
Gamaleldin F. Elsayed
Aravindh Mahendran
Thomas Kipf
OCL
50
32
0
09 Feb 2023
DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule
Maor Ivgi
Oliver Hinder
Y. Carmon
ODL
40
57
0
08 Feb 2023
Spatiotemporal Deformation Perception for Fisheye Video Rectification
Shangrong Yang
Chunyu Lin
K. Liao
Yao Zhao
164
2
0
08 Feb 2023
Structured Generative Models for Scene Understanding
Christopher K. I. Williams
OCL
3DV
19
3
0
07 Feb 2023
Object-Centric Scene Representations using Active Inference
Toon Van de Maele
Tim Verbelen
Pietro Mazzaglia
Stefano Ferraro
Bart Dhoedt
OCL
BDL
43
5
0
07 Feb 2023
Neuro-Symbolic Continual Learning: Knowledge, Reasoning Shortcuts and Concept Rehearsal
Emanuele Marconato
G. Bontempo
E. Ficarra
Simone Calderara
Andrea Passerini
Stefano Teso
NAI
LRM
CLL
43
21
0
02 Feb 2023
On the Efficacy of Differentially Private Few-shot Image Classification
Marlon Tobaben
Aliaksandra Shysheya
J. Bronskill
Andrew Paverd
Shruti Tople
Santiago Zanella Béguelin
Richard Turner
Antti Honkela
43
11
0
02 Feb 2023
Unlocking Slot Attention by Changing Optimal Transport Costs
Yan Zhang
David W. Zhang
Simon Lacoste-Julien
Gertjan J. Burghouts
Cees G. M. Snoek
OCL
41
11
0
30 Jan 2023
Generalization on the Unseen, Logic Reasoning and Degree Curriculum
Emmanuel Abbe
Samy Bengio
Aryo Lotfi
Kevin Rizk
LRM
51
49
0
30 Jan 2023
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens
Chen Chen
Bowen Zhang
Liangliang Cao
Jiguang Shen
Tom Gunter
Albin Madappally Jose
Alexander Toshev
Jonathon Shlens
Ruoming Pang
Yinfei Yang
VLM
3DV
25
14
0
30 Jan 2023
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models
Ali Borji
CoGe
15
1
0
28 Jan 2023
A Simple Recipe for Competitive Low-compute Self supervised Vision Models
Quentin Duval
Ishan Misra
Nicolas Ballas
37
9
0
23 Jan 2023
Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding
Juncheng Li
Siliang Tang
Linchao Zhu
Wenqiao Zhang
Yi Yang
Tat-Seng Chua
Fei Wu
Yueting Zhuang
BDL
24
14
0
22 Jan 2023
Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction
Chen Gao
Bin Li
OCL
31
5
0
21 Jan 2023
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Mahmoud Assran
Quentin Duval
Ishan Misra
Piotr Bojanowski
Pascal Vincent
Michael G. Rabbat
Yann LeCun
Nicolas Ballas
SSL
AI4TS
MDE
35
326
0
19 Jan 2023
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Floris Weers
Vaishaal Shankar
Angelos Katharopoulos
Yinfei Yang
Tom Gunter
CLIP
25
4
0
19 Jan 2023
Building Scalable Video Understanding Benchmarks through Sports
Aniket Agarwal
Alex Zhang
Karthik Narasimhan
Igor Gilitschenski
Vishvak Murahari
Yash Kant
24
1
0
17 Jan 2023
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Zhenfang Chen
Qinhong Zhou
Songlin Yang
Yining Hong
Hao Zhang
Chuang Gan
LRM
VLM
42
36
0
12 Jan 2023
An Impartial Transformer for Story Visualization
N. Tsakas
Maria Lymperaiou
Giorgos Filandrianos
Giorgos Stamou
ViT
37
3
0
09 Jan 2023
MAQA: A Multimodal QA Benchmark for Negation
Judith Yue Li
Aren Jansen
Qingqing Huang
Joonseok Lee
Ravi Ganti
Dima Kuzmin
33
5
0
09 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
35
16
0
26 Dec 2022
DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis
Yinghao Xu
Menglei Chai
Zifan Shi
Sida Peng
Ivan Skorokhodov
...
Ceyuan Yang
Yujun Shen
Hsin-Ying Lee
Bolei Zhou
Sergey Tulyakov
VGen
37
61
0
22 Dec 2022
Deep set conditioned latent representations for action recognition
Akash Singh
Tom De Schepper
Kevin Mets
P. Hellinckx
José Oramas
Steven Latré
BDL
27
2
0
21 Dec 2022
Unleashing the Power of Visual Prompting At the Pixel Level
Junyang Wu
Xianhang Li
Chen Wei
Huiyu Wang
Alan Yuille
Yuyin Zhou
Cihang Xie
VPVLM
VLM
34
31
0
20 Dec 2022
Does CLIP Bind Concepts? Probing Compositionality in Large Image Models
Martha Lewis
Nihal V. Nayak
Peilin Yu
Qinan Yu
Jack Merullo
Stephen H. Bach
Ellie Pavlick
VLM
OCL
CoGe
34
59
0
20 Dec 2022
Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason?
Monika Wysoczañska
Tom Monnier
Tomasz Trzciñski
David Picard
ReLM
OCL
45
1
0
20 Dec 2022
Benchmarking Spatial Relationships in Text-to-Image Generation
Tejas Gokhale
Hamid Palangi
Besmira Nushi
Vibhav Vineet
Eric Horvitz
Ece Kamar
Chitta Baral
Yezhou Yang
EGVM
51
66
0
20 Dec 2022
Are Deep Neural Networks SMARTer than Second Graders?
A. Cherian
Kuan-Chuan Peng
Suhas Lohit
Kevin A. Smith
J. Tenenbaum
AAML
LRM
ReLM
37
29
0
20 Dec 2022
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
Fangyu Liu
Francesco Piccinno
Syrine Krichene
Chenxi Pang
Kenton Lee
Mandar Joshi
Yasemin Altun
Nigel Collier
Julian Martin Eisenschlos
VLM
LRM
19
89
0
19 Dec 2022
Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
22
367
0
19 Dec 2022
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
28
4
0
16 Dec 2022
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
37
125
0
13 Dec 2022
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Yifan Zhou
Shubham D. Sonawani
Mariano Phielipp
Simon Stepputtis
H. B. Amor
LM&Ro
33
27
0
08 Dec 2022
Group Generalized Mean Pooling for Vision Transformer
ByungSoo Ko
Han-Gyu Kim
Byeongho Heo
Sangdoo Yun
Sanghyuk Chun
Geonmo Gu
Wonjae Kim
ViT
32
1
0
08 Dec 2022
Going Beyond XAI: A Systematic Survey for Explanation-Guided Learning
Yuyang Gao
Siyi Gu
Junji Jiang
S. Hong
Dazhou Yu
Liang Zhao
31
39
0
07 Dec 2022
Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task
Shailaja Keyur Sampat
Pratyay Banerjee
Yezhou Yang
Chitta Baral
31
2
0
07 Dec 2022
Learning Action-Effect Dynamics from Pairs of Scene-graphs
Shailaja Keyur Sampat
Pratyay Banerjee
Yezhou Yang
Chitta Baral
GNN
23
0
0
07 Dec 2022
FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer
Shibo Jie
Zhi-Hong Deng
31
127
0
06 Dec 2022
UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression
Jiaqi Chen
Tong Li
Jinghui Qin
Pan Lu
Liang Lin
Chongyu Chen
Xiaodan Liang
AIMat
LRM
47
90
0
06 Dec 2022
Visual Question Answering From Another Perspective: CLEVR Mental Rotation Tests
Christopher Beckham
Martin Weiss
Florian Golemo
S. Honari
Derek Nowrouzezahrai
C. Pal
28
7
0
03 Dec 2022
Task Discovery: Finding the Tasks that Neural Networks Generalize on
Andrei Atanov
Andrei Filatov
Teresa Yeo
Ajay Sohmshetty
Amir Zamir
OOD
55
10
0
01 Dec 2022
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
Zhuowan Li
Xingrui Wang
Elias Stengel-Eskin
Adam Kortylewski
Wufei Ma
Benjamin Van Durme
Max Planck Institute for Informatics
OOD
LRM
36
58
0
01 Dec 2022
Optimizing Explanations by Network Canonization and Hyperparameter Search
Frederik Pahde
Galip Umit Yolcu
Alexander Binder
Wojciech Samek
Sebastian Lapuschkin
60
11
0
30 Nov 2022
Improving Cross-Modal Retrieval with Set of Diverse Embeddings
Dongwon Kim
Nam-Won Kim
Suha Kwak
31
37
0
30 Nov 2022
Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing
Nataniel Ruiz
Sarah Adel Bargal
Cihang Xie
Kate Saenko
Stan Sclaroff
ViT
44
5
0
29 Nov 2022
Previous
1
2
3
...
11
12
13
...
28
29
30
Next