ResearchTrend.AI
arXiv: 1612.06890
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

20 December 2016
Justin Johnson
B. Hariharan
L. V. D. van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
    CoGe

Papers citing "CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"

50 / 1,475 papers shown
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
55
7
0
23 Sep 2024
On The Specialization of Neural Modules
Devon Jarvis
Richard Klein
Benjamin Rosman
Andrew M. Saxe
61
12
0
23 Sep 2024
Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models
Yuan-Hong Liao
Rafid Mahmood
Sanja Fidler
David Acuna
ReLM
LRM
41
10
0
15 Sep 2024
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems
Zhixian He
Pengcheng Zhao
Fuwei Zhang
Shujin Lin
41
0
0
14 Sep 2024
What happens to diffusion model likelihood when your model is conditional?
Mattias Cross
Anton Ragni
DiffM
42
0
0
10 Sep 2024
Breaking Neural Network Scaling Laws with Modularity
Akhilan Boopathy
Sunshine Jiang
William Yue
Jaedong Hwang
Abhiram Iyer
Ila Fiete
OOD
55
2
0
09 Sep 2024
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes
Koen Kraaijveld
Yifan Jiang
Kaixin Ma
Filip Ilievski
LRM
29
1
0
06 Sep 2024
Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models
Bin Fu
Qiyang Wan
Jialin Li
Ruiping Wang
Xilin Chen
50
0
0
03 Sep 2024
A Survey on Evaluation of Multimodal Large Language Models
Jiaxing Huang
Jingyi Zhang
LM&MA
ELM
LRM
52
20
0
28 Aug 2024
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis
Aishik Nagar
Shantanu Jaiswal
Cheston Tan
ReLM
LRM
28
7
0
27 Aug 2024
CVPT: Cross-Attention help Visual Prompt Tuning adapt visual task
Lingyun Huang
Jianxu Mao
Yaonan Wang
Junfei Yi
Ziming Tao
VLM
VPVLM
50
1
0
27 Aug 2024
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
44
62
0
22 Aug 2024
CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering
Yuliang Cai
Mohammad Rostami
CLL
VLM
MLLM
43
2
0
21 Aug 2024
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
Feipeng Ma
Yizhou Zhou
Hebei Li
Zilong He
Siying Wu
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
41
3
0
21 Aug 2024
Zero-Shot Object-Centric Representation Learning
Aniket Didolkar
Andrii Zadaianchuk
Anirudh Goyal
Mike Mozer
Yoshua Bengio
Georg Martius
Maximilian Seitzer
VLM
OCL
37
4
0
17 Aug 2024
Linking Robustness and Generalization: A k* Distribution Analysis of Concept Clustering in Latent Space for Vision Models
Shashank Kotyan
Pin-Yu Chen
Danilo Vasconcellos Vargas
OOD
45
0
0
17 Aug 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Le Xue
Manli Shu
Anas Awadalla
Jun Wang
An Yan
...
Zeyuan Chen
Silvio Savarese
Juan Carlos Niebles
Caiming Xiong
Ran Xu
VLM
44
91
0
16 Aug 2024
Historical Printed Ornaments: Dataset and Tasks
Sayan Kumar Chaki
Z. S. Baltaci
Elliot Vincent
Remi Emonet
Fabienne Vial-Bonacci
Christelle Bahier-Porte
Mathieu Aubry
Thierry Fournel
47
0
0
16 Aug 2024
Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning
Shibo Jie
Yehui Tang
Jianyuan Guo
Zhi-Hong Deng
Kai Han
Yunhe Wang
VLM
41
4
0
13 Aug 2024
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
Haider Al-Tahan
Q. Garrido
Randall Balestriero
Diane Bouchacourt
C. Hazirbas
Mark Ibrahim
VLM
81
10
0
09 Aug 2024
Fast Sprite Decomposition from Animated Graphics
Tomoyuki Suzuki
Kotaro Kikuchi
Kota Yamaguchi
44
1
0
07 Aug 2024
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLM
SyDa
VLM
58
578
0
06 Aug 2024
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
Agneet Chatterjee
Yiran Luo
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
45
5
0
05 Aug 2024
Compositional Physical Reasoning of Objects and Events from Videos
Zhenfang Chen
Shilong Dong
Kexin Yi
Yunzhu Li
Mingyu Ding
Antonio Torralba
Joshua B. Tenenbaum
Chuang Gan
OCL
35
1
0
02 Aug 2024
Scaling Backwards: Minimal Synthetic Pre-training?
Ryo Nakamura
Ryu Tadokoro
Ryosuke Yamada
Yuki M. Asano
Iro Laina
Christian Rupprecht
Nakamasa Inoue
Rio Yokota
Hirokatsu Kataoka
DD
32
2
0
01 Aug 2024
Take A Step Back: Rethinking the Two Stages in Visual Reasoning
Mingyu Zhang
Jiting Cai
Mingyu Liu
Yue Xu
Cewu Lu
Yong-Lu Li
LRM
39
5
0
29 Jul 2024
$VILA^2$: VILA Augmented VILA
Yunhao Fang
Ligeng Zhu
Yao Lu
Yan Wang
Pavlo Molchanov
Jang Hyun Cho
Marco Pavone
Song Han
Hongxu Yin
VLM
47
7
0
24 Jul 2024
Multi-label Cluster Discrimination for Visual Representation Learning
Xiang An
Kaicheng Yang
Xiangzi Dai
Ziyong Feng
Jiankang Deng
VLM
45
6
0
24 Jul 2024
Causal Understanding For Video Question Answering
Bhanu Prakash Reddy Guda
Tanmay Kulkarni
Adithya Sampath
Swarnashree Mysore Sathyendra
CML
54
0
0
23 Jul 2024
CarFormer: Self-Driving with Learned Object-Centric Representations
Shadi S. Hamdan
Fatma Guney
3DPC
OCL
46
3
0
22 Jul 2024
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan
Samuele Papa
Karl Henrik Johansson
Stefan Bauer
Andrea Dittadi
OCL
48
5
0
22 Jul 2024
Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators
Harsh Lunia
48
0
0
20 Jul 2024
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
Pranshu Pandya
Agney S Talwarr
Vatsal Gupta
Tushar Kataria
Dan Roth
Vivek Gupta
LRM
67
2
0
15 Jul 2024
GeNet: A Multimodal LLM-Based Co-Pilot for Network Topology and Configuration
Beni Ifland
Elad Duani
Rubin Krief
Miro Ohana
Aviram Zilberman
...
Ortal Lavi
Hikichi Kenji
A. Shabtai
Yuval Elovici
Rami Puzis
38
3
0
11 Jul 2024
Position: Measure Dataset Diversity, Don't Just Claim It
Dora Zhao
Jerone T. A. Andrews
Orestis Papakyriakopoulos
Alice Xiang
64
14
0
11 Jul 2024
The Computational Learning of Construction Grammars: State of the Art and Prospective Roadmap
Jonas Doumen
V. Schmalz
Katrien Beuls
Paul Van Eecke
31
1
0
10 Jul 2024
Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram
Ming-Liang Zhang
Zhong-Zhi Li
Fei Yin
Liang Lin
Cheng-Lin Liu
LRM
24
6
0
10 Jul 2024
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
Shaozhe Hao
Kai Han
Zhengyao Lv
Shihao Zhao
Kwan-Yee K. Wong
DiffM
CoGe
36
6
0
09 Jul 2024
OneDiff: A Generalist Model for Image Difference Captioning
Erdong Hu
Longteng Guo
Tongtian Yue
Zijia Zhao
Shuning Xue
Jing Liu
VLM
36
2
0
08 Jul 2024
SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding
Zixu Cheng
Yujiang Pu
Shaogang Gong
Parisa Kordjamshidi
Yu Kong
AI4TS
38
0
0
06 Jul 2024
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Tiancheng Zhao
Qianqian Zhang
Kyusong Lee
Peng Liu
Lu Zhang
Chunxin Fang
Jiajia Liao
Kelei Jiang
Yibo Ma
Ruochen Xu
MLLM
VLM
54
5
0
06 Jul 2024
Slice-100K: A Multimodal Dataset for Extrusion-based 3D Printing
Anushrut Jignasu
Kelly O. Marshall
Ankush Kumar Mishra
Lucas Nerone Rillo
Baskar Ganapathysubramanian
Aditya Balu
Chinmay Hegde
Adarsh Krishnamurthy
30
0
0
04 Jul 2024
Attention Normalization Impacts Cardinality Generalization in Slot Attention
Markus Krimmel
Jan Achterhold
Joerg Stueckler
OCL
45
0
0
04 Jul 2024
Robust Adaptation of Foundation Models with Black-Box Visual Prompting
Changdae Oh
Gyeongdeok Seo
Geunyoung Jung
Zhi-Qi Cheng
Hosik Choi
Jiyoung Jung
Kyungwoo Song
VLM
44
1
0
04 Jul 2024
Funny-Valen-Tine: Planning Solution Distribution Enhances Machine Abstract Reasoning Ability
Ruizhuo Song
Beiming Yuan
OOD
31
0
0
02 Jul 2024
SADL: An Effective In-Context Learning Method for Compositional Visual QA
Long Hoang Dang
T. Le
Vuong Le
Tu Minh Phuong
Truyen Tran
ReLM
CoGe
54
2
0
02 Jul 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
61
25
0
28 Jun 2024
Compositional Image Decomposition with Diffusion Models
Jocelin Su
Nan Liu
Yanbo Wang
Joshua B. Tenenbaum
Yilun Du
CoGe
47
6
0
27 Jun 2024
The Illusion of Competence: Evaluating the Effect of Explanations on Users' Mental Models of Visual Question Answering Systems
Judith Sieker
Simeon Junker
R. Utescher
Nazia Attari
H. Wersing
Hendrik Buschmeier
Sina Zarrieß
30
1
0
27 Jun 2024
Towards Compositionality in Concept Learning
Adam Stein
Aaditya Naik
Yinjun Wu
Mayur Naik
Eric Wong
CoGe
39
2
0
26 Jun 2024