ResearchTrend.AI
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

20 December 2016
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
    CoGe

Papers citing "CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"

50 / 1,475 papers shown
A Data Source for Reasoning Embodied Agents
Jack Lanchantin
Sainbayar Sukhbaatar
Gabriel Synnaeve
Yuxuan Sun
Kavya Srinet
Arthur Szlam
LM&Ro
LRM
30
5
0
14 Sep 2023
Dynamic MOdularized Reasoning for Compositional Structured Explanation Generation
Xiyan Fu
Anette Frank
LRM
39
1
0
14 Sep 2023
Hydra: Multi-head Low-rank Adaptation for Parameter Efficient Fine-tuning
Sanghyeon Kim
Hyunmo Yang
Younghyun Kim
Youngjoon Hong
Eunbyung Park
AI4CE
32
16
0
13 Sep 2023
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
Palaash Agrawal
Haidi Azaman
Cheston Tan
56
3
0
13 Sep 2023
Compositional Learning of Visually-Grounded Concepts Using Reinforcement
Zijun Lin
Haidi Azaman
M Ganesh Kumar
Cheston Tan
CoGe
OffRL
25
3
0
08 Sep 2023
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners
Clarence Lee
M Ganesh Kumar
Cheston Tan
28
3
0
07 Sep 2023
Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding
Cheng Shi
Sibei Yang
LRM
29
6
0
03 Sep 2023
Iterative Multi-granular Image Editing using Diffusion Models
K. J. Joseph
Prateksha Udhayanan
Tripti Shukla
Aishwarya Agarwal
Srikrishna Karanam
Koustava Goswami
Balaji Vasan Srinivasan
DiffM
33
16
0
01 Sep 2023
RobustCLEVR: A Benchmark and Framework for Evaluating Robustness in Object-centric Learning
Nathan G. Drenkow
Mathias Unberath
36
5
0
28 Aug 2023
StoryBench: A Multifaceted Benchmark for Continuous Story Visualization
Emanuele Bugliarello
Hernan Moraldo
Ruben Villegas
Mohammad Babaeizadeh
M. Saffar
Han Zhang
D. Erhan
V. Ferrari
Pieter-Jan Kindermans
P. Voigtlaender
VGen
41
10
0
22 Aug 2023
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models
Navid Rajabi
Jana Kosecka
VLM
34
11
0
18 Aug 2023
Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning
Pengbo Hu
Jingxian Qi
Xingyu Li
Hong Li
Xinqi Wang
Bing Quan
Ruiyu Wang
Yi Zhou
LRM
LLMAG
38
15
0
18 Aug 2023
Learning the meanings of function words from grounded language using a visual question answering model
Eva Portelance
Michael C. Frank
Dan Jurafsky
NAI
38
7
0
16 Aug 2023
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
Kaicheng Yang
Jiankang Deng
Xiang An
Jiawei Li
Ziyong Feng
Jia Guo
Jing Yang
Tongliang Liu
VLM
CLIP
48
46
0
16 Aug 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schmidt
VLM
36
77
0
12 Aug 2023
Foundation Model is Efficient Multimodal Multitask Model Selector
Fanqing Meng
Wenqi Shao
Zhanglin Peng
Chong Jiang
Kaipeng Zhang
Yu Qiao
Ping Luo
30
13
0
11 Aug 2023
FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods
Robin Hesse
Simone Schaub-Meyer
Stefan Roth
AAML
37
33
0
11 Aug 2023
When and How Does Known Class Help Discover Unknown Ones? Provable Understanding Through Spectral Analysis
Yiyou Sun
Zhenmei Shi
Yingyu Liang
Yixuan Li
45
19
0
09 Aug 2023
PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning
Florian Bordes
Shashank Shekhar
Mark Ibrahim
Diane Bouchacourt
Pascal Vincent
Ari S. Morcos
36
26
0
08 Aug 2023
ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
Xuefeng Hu
Ke Zhang
Lu Xia
Albert Y. C. Chen
Jiajia Luo
...
Nan Qiao
Xiao Zeng
Min Sun
Cheng-Hao Kuo
Ram Nevatia
VLM
27
25
0
04 Aug 2023
Stochastic positional embeddings improve masked image modeling
Amir Bar
Florian Bordes
Assaf Shocher
Mahmoud Assran
Pascal Vincent
Nicolas Ballas
Trevor Darrell
Amir Globerson
Yann LeCun
36
3
0
31 Jul 2023
Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy
Shibo Jie
Haoqing Wang
Zhiwei Deng
32
31
0
31 Jul 2023
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks
Kousik Rajesh
Mrigank Raman
M. A. Karim
Pranit Chawla
VLM
25
2
0
31 Jul 2023
Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering
N. Naik
Christopher Potts
Elisa Kreiss
35
3
0
28 Jul 2023
Testing the Depth of ChatGPT's Comprehension via Cross-Modal Tasks Based on ASCII-Art: GPT3.5's Abilities in Regard to Recognizing and Generating ASCII-Art Are Not Totally Lacking
David Bayani
MLLM
38
5
0
28 Jul 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
32
18
0
21 Jul 2023
CLR: Channel-wise Lightweight Reprogramming for Continual Learning
Yunhao Ge
Yuecheng Li
Shuo Ni
Jiaping Zhao
Ming Yang
Laurent Itti
CLL
50
11
0
21 Jul 2023
OBJECT 3DIT: Language-guided 3D-aware Image Editing
Oscar Michel
Anand Bhattad
Eli VanderBilt
Ranjay Krishna
Aniruddha Kembhavi
Tanmay Gupta
DiffM
37
38
0
20 Jul 2023
Improving Multimodal Datasets with Image Captioning
Thao Nguyen
S. Gadre
Gabriel Ilharco
Sewoong Oh
Ludwig Schmidt
VLM
19
71
0
19 Jul 2023
Grounded Object Centric Learning
Avinash Kori
Francesco Locatello
Fabio De Sousa Ribeiro
Francesca Toni
Ben Glocker
OCL
22
7
0
18 Jul 2023
Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP
S. Basu
S. Hu
Maziar Sanjabi
Daniela Massiceti
S. Feizi
VLM
24
4
0
18 Jul 2023
COLLIE: Systematic Construction of Constrained Text Generation Tasks
Shunyu Yao
Howard Chen
Austin W. Hanjie
Runzhe Yang
Karthik Narasimhan
47
32
0
17 Jul 2023
Does Visual Pretraining Help End-to-End Reasoning?
Chen Sun
Calvin Luo
Xingyi Zhou
Anurag Arnab
Cordelia Schmid
OCL
LRM
ViT
38
3
0
17 Jul 2023
Multi-Object Discovery by Low-Dimensional Object Motion
Sadra Safadoust
Fatma Guney
OCL
29
9
0
16 Jul 2023
IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation
Thiviyan Thanapalasingam
Emile van Krieken
Peter Bloem
Paul T. Groth
34
1
0
13 Jul 2023
MMBench: Is Your Multi-modal Model an All-around Player?
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Bo-wen Li
Songyang Zhang
...
Jiaqi Wang
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
29
934
0
12 Jul 2023
Diffusion idea exploration for art generation
N. Verma
DiffM
37
1
0
11 Jul 2023
Compositional Generalization from First Principles
Thaddäus Wiedemer
Prasanna Mayilvahanan
Matthias Bethge
Wieland Brendel
OCL
34
37
0
10 Jul 2023
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery
Yun-Qiu Lv
Jing Zhang
Nick Barnes
Yuchao Dai
36
11
0
07 Jul 2023
Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition
Yuwei Bao
B. Lattimer
J. Chai
CLL
46
1
0
05 Jul 2023
Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation
Sébastien Lachapelle
Divyat Mahajan
Ioannis Mitliagkas
Simon Lacoste-Julien
42
25
0
05 Jul 2023
SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space
Lasha Abzianidze
J. Zwarts
Yoad Winter
27
2
0
05 Jul 2023
Learning Differentiable Logic Programs for Abstract Visual Reasoning
Hikaru Shindo
Viktor Pfanschilling
Devendra Singh Dhami
Kristian Kersting
NAI
34
6
0
03 Jul 2023
The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes
D. Recasens
Martin R. Oswald
Marc Pollefeys
Javier Civera
40
3
0
29 Jun 2023
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
A. S. Penamakuri
Manish Gupta
Mithun Das Gupta
Anand Mishra
45
7
0
29 Jun 2023
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
Avinash Madasu
Vasudev Lal
CoGe
44
3
0
28 Jun 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
44
603
0
27 Jun 2023
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
S. Hall
F. G. Abrantes
Hanwen Zhu
Grace A. Sodunke
Aleksandar Shtedritski
Hannah Rose Kirk
CoGe
39
39
0
21 Jun 2023
Neuro-Symbolic Bi-Directional Translation -- Deep Learning Explainability for Climate Tipping Point Research
C. Ashcraft
Jennifer Sleeman
Caroline Tang
Jay Brett
A. Gnanadesikan
26
1
0
19 Jun 2023
The Psychophysics of Human Three-Dimensional Active Visuospatial Problem-Solving
M. Solbach
John K. Tsotsos
28
0
0
19 Jun 2023