CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

20 December 2016
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
    CoGe

Papers citing "CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"

50 / 1,475 papers shown
OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for Object-Centric Learning
Yin-Tao Huang
Tonglin Chen
Zhimeng Shen
Jinghao Huang
Bin Li
Xiangyang Xue
OCL
40
1
0
16 Jun 2023
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Rabiul Awal
Le Zhang
Aishwarya Agrawal
LRM
46
12
0
16 Jun 2023
Modularity Trumps Invariance for Compositional Robustness
I. Mason
Anirban Sarkar
Tomotake Sasaki
Xavier Boix
OOD
26
1
0
15 Jun 2023
LOVM: Language-Only Vision Model Selection
O. Zohar
Shih-Cheng Huang
Kuan-Chieh Jackson Wang
Serena Yeung
MLLM
47
13
0
15 Jun 2023
Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment
Royi Rassin
Eran Hirsch
Daniel Glickman
Shauli Ravfogel
Yoav Goldberg
Gal Chechik
DiffM
45
100
0
15 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
43
7
0
14 Jun 2023
Scalable Neural-Probabilistic Answer Set Programming
Arseny Skryagin
Daniel Ochs
Devendra Singh Dhami
Kristian Kersting
42
5
0
14 Jun 2023
Urania: Visualizing Data Analysis Pipelines for Natural Language-Based Data Exploration
Yi Guo
Nana Cao
Xiaoyu Qi
Haoyang Li
Danqing Shi
Jing Zhang
Qing Chen
Daniel Weiskopf
37
4
0
13 Jun 2023
V-LoL: A Diagnostic Dataset for Visual Logical Learning
Lukas Helff
Wolfgang Stammer
Hikaru Shindo
Devendra Singh Dhami
Kristian Kersting
NAI
27
3
0
13 Jun 2023
Generating Language Corrections for Teaching Physical Control Tasks
Megha Srivastava
Noah D. Goodman
Dorsa Sadigh
36
5
0
12 Jun 2023
DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles
Tal Daniel
Aviv Tamar
DiffM
35
8
0
09 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions
N. Rodis
Christos Sardianos
Panagiotis I. Radoglou-Grammatikis
Panagiotis G. Sarigiannidis
Iraklis Varlamis
Georgios Th. Papadopoulos
33
22
0
09 Jun 2023
Dealing with Semantic Underspecification in Multimodal NLP
Sandro Pezzelle
23
9
0
08 Jun 2023
M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
Lei Li
Yuwei Yin
Shicheng Li
Liang Chen
Peiyi Wang
...
Yazheng Yang
Jingjing Xu
Xu Sun
Lingpeng Kong
Qi Liu
MLLM
VLM
27
115
0
07 Jun 2023
Multimodal Fusion Interactions: A Study of Human and Automatic Quantification
Paul Pu Liang
Yun Cheng
Ruslan Salakhutdinov
Louis-Philippe Morency
25
6
0
07 Jun 2023
Infusing Lattice Symmetry Priors in Attention Mechanisms for Sample-Efficient Abstract Geometric Reasoning
Mattia Atzeni
Mrinmaya Sachan
Andreas Loukas
LRM
30
3
0
05 Jun 2023
Systematic Visual Reasoning through Object-Centric Relational Abstraction
Taylor Webb
S. S. Mondal
Jonathan D. Cohen
OCL
30
24
0
04 Jun 2023
TimelineQA: A Benchmark for Question Answering over Timelines
W. Tan
Jane Dwivedi-Yu
Yuliang Li
Lambert Mathias
Marzieh Saeidi
J. Yan
A. Halevy
LMTD
32
10
0
01 Jun 2023
MEWL: Few-shot multimodal word learning with referential uncertainty
Guangyuan Jiang
Manjie Xu
Shiji Xin
Weihan Liang
Yujia Peng
Chi Zhang
Yixin Zhu
OffRL
39
16
0
01 Jun 2023
Sensitivity of Slot-Based Object-Centric Models to their Number of Slots
Roland S. Zimmermann
Sjoerd van Steenkiste
Mehdi S. M. Sajjadi
Thomas Kipf
Klaus Greff
OCL
35
5
0
30 May 2023
Autoencoding Conditional Neural Processes for Representation Learning
Victor Prokhorov
Ivan Titov
N. Siddharth
BDL
20
0
0
29 May 2023
Multi-Scale Attention for Audio Question Answering
Guangyao Li
Yixin Xu
Di Hu
30
16
0
29 May 2023
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion
Haobo Yang
Wenyu Wang
Zexin Cao
Zhekai Duan
Xuchen Liu
VLM
26
0
0
28 May 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida
Idris Abdulmumin
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
I. Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
28
5
0
28 May 2023
Im-Promptu: In-Context Composition from Image Prompts
Bhishma Dedhia
Michael Chang
Jake C. Snell
Thomas Griffiths
N. Jha
LRM
MLLM
32
1
0
26 May 2023
Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets
Brandon Smith
Miguel Farinha
S. Hall
Hannah Rose Kirk
Aleksandar Shtedritski
Max Bain
44
19
0
24 May 2023
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples
P. Sadler
David Schlangen
29
2
0
24 May 2023
Text encoders bottleneck compositionality in contrastive vision-language models
Amita Kamath
Jack Hessel
Kai-Wei Chang
CoGe
CLIP
VLM
30
19
0
24 May 2023
NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario
Tianwen Qian
Jingjing Chen
Linhai Zhuo
Yang Jiao
Yueping Jiang
29
138
0
24 May 2023
Image Manipulation via Multi-Hop Instructions -- A New Dataset and Weakly-Supervised Neuro-Symbolic Approach
Harman Singh
Poorva Garg
M. Gupta
Kevin Shah
Ashish Goswami
A. Mondal
Arnab Kumar Mondal
Dinesh Khandelwal
Dinesh Garg
Parag Singla
LM&Ro
21
1
0
23 May 2023
SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models
Ziyi Wu
Jingyu Hu
Wuyue Lu
Igor Gilitschenski
Animesh Garg
DiffM
OCL
41
45
0
18 May 2023
Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature
Ana Claudia Akemi Matsuki de Faria
Felype de Castro Bastos
Jose Victor Nogueira Alves da Silva
Vitor Lopes Fabris
Valeska Uchôa
Décio Gonçalves de Aguiar Neto
C. F. G. Santos
30
23
0
18 May 2023
Probing the Role of Positional Information in Vision-Language Models
Philipp J. Rösch
Jindrich Libovický
24
8
0
17 May 2023
HICO-DET-SG and V-COCO-SG: New Data Splits for Evaluating the Systematic Generalization Performance of Human-Object Interaction Detection Models
Kenta Takemoto
Moyuru Yamada
Tomotake Sasaki
H. Akima
39
0
0
17 May 2023
Motion Question Answering via Modular Motion Programs
Mark Endo
Joy Hsu
Jiaman Li
Jiajun Wu
LRM
30
14
0
15 May 2023
Neurosymbolic AI and its Taxonomy: a survey
Wandemberg Gibaut
Leonardo Pereira
Fabio Grassiotto
Alexandre Osorio
Eder Gadioli
Amparo Munoz
Sildolfo Gomes
Claudio dos Santos
NAI
AI4CE
35
5
0
12 May 2023
A Memory Model for Question Answering from Streaming Data Supported by Rehearsal and Anticipation of Coreference Information
Vladimir Araujo
Alvaro Soto
Marie-Francine Moens
KELM
22
2
0
12 May 2023
Combo of Thinking and Observing for Outside-Knowledge VQA
Q. Si
Yuchen Mo
Zheng Lin
Huishan Ji
Weiping Wang
51
13
0
10 May 2023
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
T. Gong
Chengqi Lyu
Shilong Zhang
Yudong Wang
Miao Zheng
Qianmengke Zhao
Kuikun Liu
Wenwei Zhang
Ping Luo
Kai-xiang Chen
MLLM
34
254
0
08 May 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
41
35
0
05 May 2023
Continual Reasoning: Non-Monotonic Reasoning in Neurosymbolic AI using Continual Learning
Sofoklis Kyriakopoulos
Artur Garcez
NAI
LRM
26
0
0
03 May 2023
Visual Transformation Telling
Wanqing Cui
Mustafa Nasir-Moin
Yanyan Lan
Viola J. Chen
J. Guo
Xueqi Cheng
LRM
67
1
0
03 May 2023
Visual Reasoning: from State to Transformation
Xin Hong
Yanyan Lan
Liang Pang
J. Guo
Xueqi Cheng
LRM
27
4
0
02 May 2023
Multimodal Graph Transformer for Multimodal Question Answering
Xuehai He
Xin Eric Wang
38
7
0
30 Apr 2023
Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
N. Gkanatsios
Ayush Jain
Zhou Xian
Yunchu Zhang
C. Atkeson
Katerina Fragkiadaki
LM&Ro
98
31
0
27 Apr 2023
DataComp: In search of the next generation of multimodal datasets
S. Gadre
Gabriel Ilharco
Alex Fang
J. Hayase
Georgios Smyrnis
...
A. Dimakis
J. Jitsev
Y. Carmon
Vaishaal Shankar
Ludwig Schmidt
VLM
33
415
0
27 Apr 2023
PVP: Pre-trained Visual Parameter-Efficient Tuning
Zhao Song
Ke Yang
Naiyang Guan
Junjie Zhu
Peng Qiao
Qingyong Hu
VPVLM
VLM
40
3
0
26 Apr 2023
Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models
Jason J. Yu
Fereshteh Forghani
Konstantinos G. Derpanis
Marcus A. Brubaker
DiffM
37
45
0
21 Apr 2023
Hyperbolic Image-Text Representations
Karan Desai
Maximilian Nickel
Tanmay Rajpurohit
Justin Johnson
Ramakrishna Vedantam
VLM
47
57
0
18 Apr 2023
Learning Situation Hyper-Graphs for Video Question Answering
Aisha Urooj Khan
Hilde Kuehne
Bo Wu
Kim Chheu
Walid Bousselham
Chuang Gan
N. Lobo
M. Shah
41
15
0
18 Apr 2023