1612.00837
Cited By
v3 (latest)
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
Papers citing
"Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"
50 / 2,271 papers shown
Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability
Divya J. Bajpai
M. Hanawal
68
0
0
28 Sep 2025
AttAnchor: Guiding Cross-Modal Token Alignment in VLMs with Attention Anchors
Junyang Zhang
Tianyi Zhu
Thierry Tambe
48
0
0
27 Sep 2025
Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
Divyam Madaan
Varshan Muhunthan
Kyunghyun Cho
S. Chopra
101
1
0
27 Sep 2025
Uncovering Intrinsic Capabilities: A Paradigm for Data Curation in Vision-Language Models
Junjie Li
Ziao Wang
Jianghong Ma
Xiaofeng Zhang
100
0
0
27 Sep 2025
Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection
Mingfei Han
Haihong Hao
Jinxing Zhou
Zhihui Li
Yuhui Zheng
XueQing Deng
Linjie Yang
Xiaojun Chang
HILM
VLM
108
0
0
27 Sep 2025
REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model
Bo Li
Guanzhi Deng
Ronghao Chen
Junrong Yue
Shuo Zhang
Qinghua Zhao
Linqi Song
Lijie Wen
LRM
85
0
0
26 Sep 2025
Chimera: Diagnosing Shortcut Learning in Visual-Language Understanding
Ziheng Chi
Yifan Hou
Chenxi Pang
Shaobo Cui
Mubashara Akhtar
Mrinmaya Sachan
107
0
0
26 Sep 2025
Instruction-tuned Self-Questioning Framework for Multimodal Reasoning
You-Won Jang
Y. Heo
J. Kim
Minsu Lee
Du-Seong Chang
Byoung-Tak Zhang
LRM
56
0
0
25 Sep 2025
DeFacto: Counterfactual Thinking with Images for Enforcing Evidence-Grounded and Faithful Reasoning
Tianrun Xu
Haoda Jing
Y. Li
Yuquan Wei
Jun Feng
Guanyu Chen
Haichuan Gao
Tianren Zhang
Feng Chen
OffRL
71
0
0
25 Sep 2025
SCRA-VQA: Summarized Caption-Rerank for Augmented Large Language Models in Visual Question Answering
Yan Zhang
Jiaqing Lin
Miao Zhang
Kui Xiao
Xiaoju Hou
Yue Zhao
Ruoyao Xiao
94
0
0
25 Sep 2025
Integrating Object Interaction Self-Attention and GAN-Based Debiasing for Visual Question Answering
Zhifei Li
Feng Qiu
Yiran Wang
Yujing Xia
Kui Xiao
Miao Zhang
Yan Zhang
124
0
0
25 Sep 2025
ColorBlindnessEval: Can Vision-Language Models Pass Color Blindness Tests?
Zijian Ling
Han Zhang
Yazhuo Zhou
Jiahao Cui
VLM
66
2
0
23 Sep 2025
Training-Free Label Space Alignment for Universal Domain Adaptation
Dujin Lee
Sojung An
Jungmyung Wi
Kuniaki Saito
Donghyun Kim
VLM
160
0
0
22 Sep 2025
SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models
Pingyi Chen
Yujing Lou
Shen Cao
Jinhui Guo
Lubin Fan
Yue-bo Wu
Lin Yang
Lizhuang Ma
Jieping Ye
96
3
0
22 Sep 2025
Decoupled Proxy Alignment: Mitigating Language Prior Conflict for Multimodal Alignment in MLLM
Chenkun Tan
Pengyu Wang
Shaojun Zhou
Botian Jiang
Zhaowei Li
Dong Zhang
Xinghao Wang
Yaqian Zhou
Xipeng Qiu
108
0
0
18 Sep 2025
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
Peng Xu
Shengwu Xiong
Jiajun Zhang
Yaxiong Chen
Bowen Zhou
...
Yang Yang
Yanglin Deng
Yashu Kang
Ye Yuan
Y. Wen
LRM
103
1
0
17 Sep 2025
Towards Rationale-Answer Alignment of LVLMs via Self-Rationale Calibration
Yuanchen Wu
Ke Yan
Shouhong Ding
Ziyin Zhou
Xiaoqiang Li
LRM
88
0
0
17 Sep 2025
Pre-Manipulation Alignment Prediction with Parallel Deep State-Space and Transformer Models
Motonari Kambara
Komei Sugiura
104
0
0
17 Sep 2025
SAIL-VL2 Technical Report
Weijie Yin
Yongjie Ye
Fangxun Shu
Yue Liao
Zijian Kang
...
Han Wang
Wenzhuo Liu
Xiao Liang
Shuicheng Yan
Chao Feng
LRM
VLM
252
3
0
17 Sep 2025
A Framework for Generating Artificial Datasets to Validate Absolute and Relative Position Concepts
George Correa de Araujo
H. Maia
Hélio Pedrini
88
0
0
17 Sep 2025
HERO: Rethinking Visual Token Early Dropping in High-Resolution Large Vision-Language Models
Xu Li
Yuxuan Liang
Xiaolei Chen
Yi Zheng
Haotian Chen
Bin Li
Xiangyang Xue
VLM
165
0
0
16 Sep 2025
Enhancing Video Large Language Models with Structured Multi-Video Collaborative Reasoning
Zhihao He
Tianyao He
Yun Xu
Huabin Liu
Chaofan Gan
Gui Zou
W. Lin
124
1
0
16 Sep 2025
Sparse Training Scheme for Multimodal LLM
Kean Shi
Liang Chen
Haozhe Zhao
Baobao Chang
80
0
0
16 Sep 2025
Do Natural Language Descriptions of Model Activations Convey Privileged Information?
Millicent Li
Alberto Mario Ceballos Arroyo
Giordano Rogers
Naomi Saphra
Byron C. Wallace
140
2
0
16 Sep 2025
Perception Before Reasoning: Two-Stage Reinforcement Learning for Visual Reasoning in Vision-Language Models
Yan Chen
Long Li
Teng Xi
Long Zeng
Jingdong Wang
OffRL
ReLM
LRM
VLM
160
6
0
16 Sep 2025
FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts Reasoning
Haodong Chen
Haojian Huang
XinXiang Yin
Dian Shao
LRM
127
2
0
15 Sep 2025
Igniting VLMs toward the Embodied Space
Andy Zhai
B. Liu
Bruno Fang
Chalse Cai
Ellie Ma
...
Shalfun Li
Starrick Liu
S. Chen
Vincent Chen
Zach Xu
LM&Ro
VLM
147
7
0
15 Sep 2025
Seeing is Not Understanding: A Benchmark on Perception-Cognition Disparities in Large Language Models
Haokun Li
Yazhou Zhang
Jizhi Ding
Qiuchi Li
Peng Zhang
83
0
0
14 Sep 2025
InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning
Gautam Sreekumar
Vishnu Boddeti
ReLM
LRM
101
0
0
12 Sep 2025
Towards Understanding Visual Grounding in Visual Language Models
Georgios Pantazopoulos
Eda B. Özyiğit
ObjD
264
2
0
12 Sep 2025
Test-Time Warmup for Multimodal Large Language Models
Nikita Rajaneesh
Thomas P. Zollo
R. Zemel
MLLM
VLM
LRM
155
0
0
12 Sep 2025
Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization
Zhengzhao Lai
Youbin Zheng
Zhenyang Cai
Haonan Lyu
Jinpu Yang
Hongqing Liang
Yan Hu
Benyou Wang
80
0
0
11 Sep 2025
BcQLM: Efficient Vision-Language Understanding with Distilled Q-Gated Cross-Modal Fusion
Sike Xiang
Shuang Chen
Amir Atapour-Abarghouei
MLLM
98
0
0
10 Sep 2025
Towards Meta-Cognitive Knowledge Editing for Multimodal LLMs
Zhaoyu Fan
Kaihang Pan
Mingze Zhou
Bosheng Qin
Juncheng Billy Li
Shengyu Zhang
Wenqiao Zhang
Siliang Tang
Fei Wu
Yueting Zhuang
KELM
108
0
0
06 Sep 2025
WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
Gagan Mundada
Yash Vishe
Amit Namburi
Xin Xu
Zachary Novack
Julian McAuley
Junda Wu
LRM
76
2
0
05 Sep 2025
Prior Distribution and Model Confidence
Maksim Kazanskii
Artem Kasianov
UQCV
199
0
0
05 Sep 2025
OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation
Han Li
Xinyu Peng
Y. Wang
Zelin Peng
Xin Chen
Rongxiang Weng
Jingang Wang
Xunliang Cai
Wenrui Dai
Hongkai Xiong
MLLM
OffRL
278
10
0
03 Sep 2025
Understanding Space Is Rocket Science -- Only Top Reasoning Models Can Solve Spatial Understanding Tasks
Nils Hoehing
Mayug Maniparambil
Ellen Rushe
Noel E. O'Connor
Anthony Ventresque
LRM
129
0
0
02 Sep 2025
GradES: Significantly Faster Training in Transformers with Gradient-Based Early Stopping
Qifu Wen
Xi Zeng
Zihan Zhou
Shuaijun Liu
M. Hosseinzadeh
Ningxin Su
Reza Rawassizadeh
223
0
0
01 Sep 2025
Improving Alignment in LVLMs with Debiased Self-Judgment
Sihan Yang
Chenhang Cui
Zihao Zhao
Yiyang Zhou
Weilong Yan
Ying Wei
Huaxiu Yao
185
0
0
28 Aug 2025
Bangla-Bayanno: A 52K-Pair Bengali Visual Question Answering Dataset with LLM-Assisted Translation Refinement
Mohammed Rakibul Hasan
Rafi Majid
Ahanaf Tahmid
MLLM
39
0
0
27 Aug 2025
How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding
Zhuoran Yu
Yong Jae Lee
LRM
72
2
0
27 Aug 2025
Tailored Teaching with Balanced Difficulty: Elevating Reasoning in Multimodal Chain-of-Thought via Prompt Curriculum
Xinglong Yang
Quan Feng
Zhongying Pan
Xiang Chen
Yu Tian
Wentong Li
Shuofei Qiao
Yuxia Geng
Xingyu Zhao
Sheng-Jun Huang
LRM
64
0
0
26 Aug 2025
MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs
Sixun Dong
Juhua Hu
Mian Zhang
Ming Yin
Yanjie Fu
Qi Qian
68
4
0
25 Aug 2025
SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models
Zhenwei Tang
Difan Jiao
Blair Yang
Ashton Anderson
VLM
CoGe
118
1
0
25 Aug 2025
VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference
Pengfei Jiang
Hanjun Li
Linglan Zhao
Fei Chao
Ke Yan
Shouhong Ding
Rongrong Ji
92
2
0
25 Aug 2025
From Global to Local: Social Bias Transfer in CLIP
Ryan Ramos
Yusuke Hirota
Yuta Nakashima
Noa Garcia
84
0
0
25 Aug 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
Fucai Ke
Joy Hsu
Zhixi Cai
Zixian Ma
Xin Zheng
...
P. D. Haghighi
Gholamreza Haffari
Ranjay Krishna
Jiajun Wu
H. Rezatofighi
ReLM
CoGe
LRM
296
6
0
24 Aug 2025
PlantVillageVQA: A Visual Question Answering Dataset for Benchmarking Vision-Language Models in Plant Science
Syed Nazmus Sakib
Nafiul Haque
Mohammad Zabed Hossain
Shifat E. Arman
104
0
0
23 Aug 2025
Can VLMs Recall Factual Associations From Visual References?
Dhananjay Ashok
Ashutosh Chaubey
Hirona J. Arai
Jonathan May
Jesse Thomason
64
0
0
22 Aug 2025