ResearchTrend.AI
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

20 December 2016
Justin Johnson
B. Hariharan
L. V. D. van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
    CoGe

Papers citing "CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"

50 / 1,475 papers shown
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Yiming Jia
Jianxin Li
Xiang Yue
Bo Li
Ping Nie
Kai Zou
Wenhu Chen
LRM
79
2
0
13 Mar 2025
Revisiting semi-supervised learning in the era of foundation models
Ping Zhang
Zheda Mai
Quang-Huy Nguyen
Wei-Lun Chao
52
0
0
12 Mar 2025
Object-Aware DINO (Oh-A-Dino): Enhancing Self-Supervised Representations for Multi-Object Instance Retrieval
Stefan Sylvius Wagner
Stefan Harmeling
OCL
76
0
0
12 Mar 2025
LongProLIP: A Probabilistic Vision-Language Model with Long Context Text
Sanghyuk Chun
Sangdoo Yun
VLM
51
1
0
11 Mar 2025
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Wenxuan Huang
Bohan Jia
Zijie Zhai
Shaosheng Cao
Zheyu Ye
Fei Zhao
Zhe Xu
Yao Hu
Shaohui Lin
MU
OffRL
LRM
MLLM
ReLM
VLM
59
46
0
09 Mar 2025
Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
Kun Xiang
Zhili Liu
Zihao Jiang
Yunshuang Nie
Kaixin Cai
...
Yu-Jie Yuan
J. Han
Lanqing Hong
Hang Xu
Xiaodan Liang
ReLM
LRM
62
6
0
08 Mar 2025
Vision-Language Models Struggle to Align Entities across Modalities
Iñigo Alonso
Ander Salaberria
Gorka Azkune
Jeremy Barnes
Oier López de Lacalle
VLM
66
0
0
05 Mar 2025
MuBlE: MuJoCo and Blender simulation Environment and Benchmark for Task Planning in Robot Manipulation
Michal Nazarczuk
Karla Stepanova
Jan Kristof Behrens
Matej Hoffmann
K. Mikolajczyk
LM&Ro
53
0
0
04 Mar 2025
Analyzing CLIP's Performance Limitations in Multi-Object Scenarios: A Controlled High-Resolution Study
Reza Abbasi
Ali Nazari
Aminreza Sefid
Mohammadali Banayeeanzade
M. Rohban
M. Baghshah
VLM
64
1
0
27 Feb 2025
Data Distributional Properties As Inductive Bias for Systematic Generalization
Felipe del-Rio
Alain Raymond-Sáez
Daniel Florea
Rodrigo Toro Icarte
Julio Hurtado
Cristian B. Calderon
Á. Soto
AI4CE
38
0
0
27 Feb 2025
Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Chao Wang
Luning Zhang
Ziyi Wang
Yang Zhou
ELM
VLM
LRM
60
1
0
27 Feb 2025
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
Zhongyang Li
Ziyue Li
Dinesh Manocha
MoE
53
0
0
27 Feb 2025
Grad-ECLIP: Gradient-based Visual and Textual Explanations for CLIP
Chenyang Zhao
Kun Wang
J. H. Hsiao
Antoni B. Chan
CLIP
71
0
0
26 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
90
3
0
26 Feb 2025
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
Nilay Yilmaz
Maitreya Patel
Yiran Luo
Tejas Gokhale
Chitta Baral
Suren Jayasuriya
Yezhou Yang
LRM
38
0
0
25 Feb 2025
All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark
Davide Testa
Giovanni Bonetta
Raffaella Bernardi
Alessandro Bondielli
Alessandro Lenci
Alessio Miaschi
Lucia Passaro
Bernardo Magnini
VGen
LRM
50
0
0
24 Feb 2025
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin
M. Beck
Korbinian Poppel
Sepp Hochreiter
Johannes Brandstetter
VLM
66
43
0
24 Feb 2025
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts
Zhenghao Liu
Xingsheng Zhu
Tianshuo Zhou
Xinyi Zhang
Xiaoyuan Yi
Yukun Yan
Yu Gu
Ge Yu
Maosong Sun
RALM
VLM
43
1
0
24 Feb 2025
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
Xinwei Long
Zhiyuan Ma
Ermo Hua
Kaiyan Zhang
Biqing Qi
Bowen Zhou
RALM
48
0
0
23 Feb 2025
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
Yuqing Yang
Ajay Patel
Matt Deitke
Tanmay Gupta
Luca Weihs
...
Mark Yatskar
Chris Callison-Burch
Ranjay Krishna
Aniruddha Kembhavi
Christopher Clark
SyDa
78
2
0
20 Feb 2025
Challenges of Multi-Modal Coreset Selection for Depth Prediction
Viktor Moskvoretskii
Narek Alvandian
44
0
0
20 Feb 2025
Can Hallucination Correction Improve Video-Language Alignment?
Lingjun Zhao
Mingyang Xie
Paola Cascante-Bonilla
Hal Daumé III
Kwonjoon Lee
HILM
VLM
64
0
0
20 Feb 2025
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning
Rui Zhao
Qirui Yuan
Jinyu Li
Haofeng Hu
Yun Li
Chengyuan Zheng
Fei Gao
LRM
52
4
0
19 Feb 2025
A Comprehensive Survey on Composed Image Retrieval
Xuemeng Song
Haoqiang Lin
Haokun Wen
Bohan Hou
Mingzhu Xu
Liqiang Nie
53
1
0
19 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Zehan Li
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Yansen Wang
51
0
0
19 Feb 2025
Shortcuts and Identifiability in Concept-based Models from a Neuro-Symbolic Lens
Samuele Bortolotti
Emanuele Marconato
Paolo Morettin
Andrea Passerini
Stefano Teso
61
2
0
16 Feb 2025
CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?
Aashish Anantha Ramakrishnan
Aadarsh Anantha Ramakrishnan
Dongwon Lee
47
1
0
16 Feb 2025
Visual Graph Question Answering with ASP and LLMs for Language Parsing
Jakob Johannes Bauer
Thomas Eiter
Nelson Higuera Ruiz
J. Oetsch
GNN
64
0
0
13 Feb 2025
Abduction of Domain Relationships from Data for VQA
Al Mehdi Saadat Chowdhury
Paulo Shakarian
Gerardo Simari
85
0
0
13 Feb 2025
KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification
Yue Zhu
Haiwen Diao
Shang Gao
Long Chen
Huchuan Lu
89
0
0
10 Feb 2025
RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception
Joshua R. Waite
Md Zahid Hasan
Qisai Liu
Zhanhong Jiang
Chinmay Hegde
S. Sarkar
OffRL
SyDa
186
1
0
31 Jan 2025
Vision-Language Model Selection and Reuse for Downstream Adaptation
Hao-Zhe Tan
Zhi-Hua Zhou
Lan-Zhe Guo
Yu-Feng Li
VLM
95
0
0
30 Jan 2025
Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation
Adil Kaan Akan
Yucel Yemez
DiffM
OCL
47
0
0
27 Jan 2025
PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction
Hammad A. Ayyubi
Xuande Feng
Junzhang Liu
Xudong Lin
Zhecan Wang
Shih-Fu Chang
50
0
0
24 Jan 2025
Embodied Scene Understanding for Vision Language Models via MetaVQA
Weizhen Wang
Chenda Duan
Zhenghao Peng
Yuxin Liu
Bolei Zhou
LM&Ro
49
0
0
17 Jan 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
Anupam Pandey
Deepjyoti Bodo
Arpan Phukan
Asif Ekbal
46
0
0
13 Jan 2025
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
Xiangru Tang
Tianyu Hu
Muyang Ye
Yanjun Shao
Xunjian Yin
...
Pan Lu
Zhuosheng Zhang
Yilun Zhao
Arman Cohan
Mark B. Gerstein
LLMAG
LRM
AI4CE
72
7
0
11 Jan 2025
PiLaMIM: Toward Richer Visual Representations by Integrating Pixel and Latent Masked Image Modeling
Junmyeong Lee
Eui Jun Hwang
Sukmin Cho
Jong C. Park
54
0
0
06 Jan 2025
Generative Landmarks Guided Eyeglasses Removal 3D Face Reconstruction
Dapeng Zhao
Yue Qi
3DH
CVBM
3DV
37
6
0
31 Dec 2024
Symbolic Disentangled Representations for Images
Alexandr Korchemnyi
A. Kovalev
Aleksandr I. Panov
OCL
51
0
0
31 Dec 2024
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
64
4
0
31 Dec 2024
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
115
2
0
20 Dec 2024
Relational Programming with Foundation Models
Ziyang Li
Jiani Huang
Jason Liu
Felix Zhu
Eric Zhao
William Dodds
Neelay Velingker
Rajeev Alur
Mayur Naik
118
3
0
19 Dec 2024
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization
Yue Zhang
Liqiang Jing
Vibhav Gogate
116
2
0
19 Dec 2024
A Concept-Centric Approach to Multi-Modality Learning
Yuchong Geng
Ao Tang
95
0
0
18 Dec 2024
Video Representation Learning with Joint-Embedding Predictive Architectures
Katrina Drozdov
Ravid Shwartz-Ziv
Yann LeCun
AI4TS
82
2
0
14 Dec 2024
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
Mingxing Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Botian Shi
Tao Chen
Bo Zhang
Xiangyu Yue
90
5
0
08 Dec 2024
MegaCOIN: Enhancing Medium-Grained Color Perception for Vision-Language Models
Ming-Chang Chiu
Shicheng Wen
Pin-Yu Chen
Xuezhe Ma
82
0
0
05 Dec 2024
Relations, Negations, and Numbers: Looking for Logic in Generative Text-to-Image Models
C. Conwell
Rupert Tawiah-Quashie
T. Ullman
74
2
0
26 Nov 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
101
8
0
25 Nov 2024