ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1612.06890
  4. Cited By
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary
  Visual Reasoning

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

20 December 2016
Justin Johnson
B. Hariharan
L. V. D. van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
    CoGe
ArXivPDFHTML

Papers citing "CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"

50 / 1,475 papers shown
Title
GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers
GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers
Éloi Zablocki
Valentin Gerard
Amaia Cardiel
Eric Gaussier
Matthieu Cord
Eduardo Valle
84
0
0
23 Nov 2024
Benchmarking Multimodal Models for Ukrainian Language Understanding
  Across Academic and Cultural Domains
Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains
Yurii Paniv
Artur Kiulian
Dmytro Chaplynskyi
M. Khandoga
Anton Polishko
Tetiana Bas
Guillermo Gabrielli
73
1
0
22 Nov 2024
Learning to Reason Iteratively and Parallelly for Complex Visual
  Reasoning Scenarios
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM
LRM
79
2
0
20 Nov 2024
AtomThink: A Slow Thinking Framework for Multimodal Mathematical
  Reasoning
AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning
Kun Xiang
Zhili Liu
Zihao Jiang
Yunshuang Nie
Runhui Huang
...
Yihan Zeng
J. Han
Lanqing Hong
Hang Xu
Xiaodan Liang
LRM
109
10
0
18 Nov 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
29
1
0
17 Nov 2024
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
Yunlong Tang
Junjia Guo
Hang Hua
Susan Liang
Mingqian Feng
...
Chao Huang
Jing Bi
Zeliang Zhang
Pooyan Fazli
Chenliang Xu
CoGe
79
8
0
17 Nov 2024
VCBench: A Controllable Benchmark for Symbolic and Abstract Challenges
  in Video Cognition
VCBench: A Controllable Benchmark for Symbolic and Abstract Challenges in Video Cognition
Chenglin Li
Qianglong Chen
Zhi Li
Feng Tao
Yin Zhang
39
0
0
14 Nov 2024
ClevrSkills: Compositional Language and Visual Reasoning in Robotics
ClevrSkills: Compositional Language and Visual Reasoning in Robotics
Sanjay Haresh
Daniel Dijkman
Apratim Bhattacharyya
Roland Memisevic
CoGe
LRM
42
1
0
13 Nov 2024
Quantifying artificial intelligence through algebraic generalization
Quantifying artificial intelligence through algebraic generalization
Takuya Ito
Murray Campbell
L. Horesh
Tim Klinger
Parikshit Ram
ELM
53
0
0
08 Nov 2024
Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent
  Reinforcement Learning
Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning
Zhiyu Shao
Qiong Wu
Pingyi Fan
Kezhi Wang
Q. Fan
Wen Chen
Khaled B. Letaief
34
2
0
07 Nov 2024
VLA-3D: A Dataset for 3D Semantic Scene Understanding and Navigation
VLA-3D: A Dataset for 3D Semantic Scene Understanding and Navigation
Haochen Zhang
Nader Zantout
Pujith Kachana
Zongyuan Wu
Ji Zhang
Wenshan Wang
3DV
LM&Ro
49
5
0
05 Nov 2024
Bootstrapping Top-down Information for Self-modulating Slot Attention
Bootstrapping Top-down Information for Self-modulating Slot Attention
Dongwon Kim
Seoyeon Kim
Suha Kwak
OCL
ObjD
40
0
0
04 Nov 2024
Goal-Oriented Semantic Communication for Wireless Visual Question
  Answering
Goal-Oriented Semantic Communication for Wireless Visual Question Answering
Sige Liu
Nan Li
Yansha Deng
Tony Q. S. Quek
39
0
0
03 Nov 2024
LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation
LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation
Bowen Li
Zhaoyu Li
Qiwei Du
Jinqi Luo
Wenshan Wang
...
Katia P. Sycara
Pradeep Kumar Ravikumar
Alexander G. Gray
X. Si
Sebastian A. Scherer
AI4CE
LRM
81
3
0
01 Nov 2024
TurtleBench: A Visual Programming Benchmark in Turtle Geometry
TurtleBench: A Visual Programming Benchmark in Turtle Geometry
Sina Rismanchian
Yasaman Razeghi
Sameer Singh
Shayan Doroudi
51
1
0
31 Oct 2024
Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem
Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem
Declan Campbell
Sunayana Rane
Tyler Giallanza
Nicolò De Sabbata
Kia Ghods
...
Alexander Ku
Steven M. Frankland
Thomas Griffiths
Jonathan D. Cohen
Taylor W. Webb
42
13
0
31 Oct 2024
Unsupervised Object Discovery: A Comprehensive Survey and Unified
  Taxonomy
Unsupervised Object Discovery: A Comprehensive Survey and Unified Taxonomy
José-Fabian Villa-Vásquez
M. Pedersoli
50
1
0
30 Oct 2024
Situational Scene Graph for Structured Human-centric Situation Understanding
Situational Scene Graph for Structured Human-centric Situation Understanding
Chinthani Sugandhika
Chen Li
Deepu Rajan
Basura Fernando
218
1
0
30 Oct 2024
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Sunil Aryal
Imran Razzak
Hakim Hacid
31
0
0
30 Oct 2024
Improving Generalization in Visual Reasoning via Self-Ensemble
Improving Generalization in Visual Reasoning via Self-Ensemble
Tien-Huy Nguyen
Quang-Khai Tran
Anh-Tuan Quang-Hoang
VLM
LRM
58
5
0
28 Oct 2024
FACTS: A Factored State-Space Framework For World Modelling
FACTS: A Factored State-Space Framework For World Modelling
Li Nanbo
Firas Laakom
Yucheng Xu
Wenyi Wang
Jürgen Schmidhuber
AI4TS
208
0
0
28 Oct 2024
Connecting Joint-Embedding Predictive Architecture with Contrastive
  Self-supervised Learning
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning
Shentong Mo
Shengbang Tong
40
1
0
25 Oct 2024
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst
Tim Nelson Tobiasch
Lukas Helff
Inga Ibs
Wolfgang Stammer
Devendra Singh Dhami
Constantin Rothkopf
Kristian Kersting
CoGe
ReLM
VLM
LRM
77
1
0
25 Oct 2024
Learning Global Object-Centric Representations via Disentangled Slot
  Attention
Learning Global Object-Centric Representations via Disentangled Slot Attention
Tonglin Chen
Yinxuan Huang
Zhimeng Shen
Jinghao Huang
Bin Li
Xiangyang Xue
OCL
41
1
0
24 Oct 2024
Probabilistic Language-Image Pre-Training
Probabilistic Language-Image Pre-Training
Sanghyuk Chun
Wonjae Kim
Song Park
Sangdoo Yun
MLLM
VLM
CLIP
194
4
2
24 Oct 2024
Object-Centric Temporal Consistency via Conditional Autoregressive
  Inductive Biases
Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases
Cristian Meo
Akihiro Nakano
Mircea Lica
Aniket Didolkar
Masahiro Suzuki
Anirudh Goyal
Mengmi Zhang
Justin Dauwels
Y. Matsuo
Yoshua Bengio
OCL
41
2
0
21 Oct 2024
ChitroJera: A Regionally Relevant Visual Question Answering Dataset for
  Bangla
ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla
Deeparghya Dutta Barua
Md Sakib Ul Rahman Sourove
Md Farhan Ishmam
Fabiha Haider
Fariha Tanjim Shifat
Md Fahim
Md Farhad Alam
29
0
0
19 Oct 2024
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
Yin Xie
Kaicheng Yang
Ninghua Yang
Weimo Deng
Xiangzi Dai
...
Yumeng Wang
Xiang An
Yongle Zhao
Ziyong Feng
Jiankang Deng
MLLM
VLM
45
1
0
18 Oct 2024
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic
  Reasoning Tasks
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM
CoGe
ReLM
VLM
LRM
37
0
0
17 Oct 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic
  Modeling
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
28
7
0
14 Oct 2024
Adaptive Diffusion Terrain Generator for Autonomous Uneven Terrain
  Navigation
Adaptive Diffusion Terrain Generator for Autonomous Uneven Terrain Navigation
Youwei Yu
Junhong Xu
Lantao Liu
39
3
0
14 Oct 2024
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained
  Vision-Language Models
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
Hang Hua
Yunlong Tang
Ziyun Zeng
Liangliang Cao
Zhengyuan Yang
Hangfeng He
Chenliang Xu
Jiebo Luo
VLM
CoGe
44
9
0
13 Oct 2024
Declarative Knowledge Distillation from Large Language Models for Visual
  Question Answering Datasets
Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets
Thomas Eiter
Jan Hadl
N. Higuera
J. Oetsch
31
0
0
12 Oct 2024
Visual Scratchpads: Enabling Global Reasoning in Vision
Visual Scratchpads: Enabling Global Reasoning in Vision
Aryo Lotfi
Enrico Fini
Samy Bengio
Moin Nabi
Emmanuel Abbe
LRM
42
0
0
10 Oct 2024
Insight Over Sight? Exploring the Vision-Knowledge Conflicts in
  Multimodal LLMs
Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs
Xiaoyuan Liu
Wenxuan Wang
Youliang Yuan
Jen-tse Huang
Qiuzhi Liu
Pinjia He
Zhaopeng Tu
185
1
0
10 Oct 2024
Structured Spatial Reasoning with Open Vocabulary Object Detectors
Structured Spatial Reasoning with Open Vocabulary Object Detectors
Negar Nejatishahidin
Madhukar Reddy Vongala
Jana Kosecka
37
3
0
09 Oct 2024
Parameter Efficient Fine-tuning via Explained Variance Adaptation
Parameter Efficient Fine-tuning via Explained Variance Adaptation
Fabian Paischer
Lukas Hauzenberger
Thomas Schmied
Benedikt Alkin
Marc Peter Deisenroth
Sepp Hochreiter
42
4
0
09 Oct 2024
ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language
  Models in Hospital Environments
ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments
Sourjyadip Ray
Kushal Gupta
Soumi Kundu
Payal Arvind Kasat
Somak Aditya
Pawan Goyal
23
2
0
08 Oct 2024
Tackling the Abstraction and Reasoning Corpus with Vision Transformers:
  the Importance of 2D Representation, Positions, and Objects
Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects
Wenhao Li
Yudong Xu
Scott Sanner
Elias Boutros Khalil
ViT
39
3
0
08 Oct 2024
EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical
  Alignment
EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment
Yifei Xing
Xiangyuan Lan
Ruiping Wang
D. Jiang
Wenjun Huang
Qingfang Zheng
Yaowei Wang
Mamba
38
0
0
08 Oct 2024
Core Tokensets for Data-efficient Sequential Training of Transformers
Core Tokensets for Data-efficient Sequential Training of Transformers
Subarnaduti Paul
Manuel Brack
P. Schramowski
Kristian Kersting
Martin Mundt
37
0
0
08 Oct 2024
Precise Model Benchmarking with Only a Few Observations
Precise Model Benchmarking with Only a Few Observations
Riccardo Fogliato
Pratik Patil
Nil-Jana Akpinar
Mathew Monfort
29
0
0
07 Oct 2024
Recent Advances of Multimodal Continual Learning: A Comprehensive Survey
Recent Advances of Multimodal Continual Learning: A Comprehensive Survey
Dianzhi Yu
Xinni Zhang
Yankai Chen
Aiwei Liu
Yifei Zhang
Philip S. Yu
Irwin King
VLM
CLL
49
10
0
07 Oct 2024
Intriguing Properties of Large Language and Vision Models
Intriguing Properties of Large Language and Vision Models
Young-Jun Lee
ByungSoo Ko
Han-Gyu Kim
Yechan Hwang
Ho-Jin Choi
LRM
VLM
51
0
0
07 Oct 2024
SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse
  Problems
SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems
Ismail R. Alkhouri
Shijun Liang
Cheng-Han Huang
Jimmy Dai
Qing Qu
S. Ravishankar
Rongrong Wang
DiffM
MedIm
14
7
0
06 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
89
26
0
04 Oct 2024
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Minh Le
Chau Nguyen
Huy Nguyen
Quyen Tran
Trung Le
Nhat Ho
44
5
0
03 Oct 2024
Why context matters in VQA and Reasoning: Semantic interventions for VLM
  input modalities
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities
Kenza Amara
Lukas Klein
Carsten T. Lüth
Paul Jäger
Hendrik Strobelt
Mennatallah El-Assady
30
1
0
02 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLM
MLLM
40
32
1
30 Sep 2024
From Seconds to Hours: Reviewing MultiModal Large Language Models on
  Comprehensive Long Video Understanding
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
Heqing Zou
Tianze Luo
Guiyang Xie
Victor
Zhang
...
Guangcong Wang
Juanyang Chen
Zhuochen Wang
Hansheng Zhang
Huaijian Zhang
VLM
43
6
0
27 Sep 2024
Previous
123456...282930
Next