arXiv:1606.05433 (v4, latest)
FVQA: Fact-based Visual Question Answering
17 June 2016
Peng Wang
Qi Wu
Chunhua Shen
Anton van den Hengel
A. Dick
CoGe
Papers citing
"FVQA: Fact-based Visual Question Answering"
50 / 241 papers shown
CauSight: Learning to Supersense for Visual Causal Discovery
Yize Zhang
M. Chen
Sirui Chen
Bo Peng
Y. Zhang
Tianyu Li
Chaochao Lu
CML
ReLM
LRM
146
0
0
01 Dec 2025
Breaking the Visual Shortcuts in Multimodal Knowledge-Based Visual Question Answering
Dosung Lee
Sangwon Jung
Boyoung Kim
Minyoung Kim
Sungyeon Kim
Junyoung Sung
Paul Hongsuck Seo
129
0
0
28 Nov 2025
Revisiting KRISP: A Lightweight Reproduction and Analysis of Knowledge-Enhanced Vision-Language Models
Souradeep Dutta
Keshav Bulia
Neena S Nair
VLM
153
0
0
25 Nov 2025
Compression then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding
Da Li
Yuxiao Luo
Keping Bi
Jiafeng Guo
Wei Yuan
B. Yang
Yan Wang
Fan Yang
Tingting Gao
Guorui Zhou
VLM
253
0
0
11 Nov 2025
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
Yuyang Hong
Jiaqi Gu
Qi Yang
Lubin Fan
Yue-bo Wu
Ying Wang
Kun Ding
Shiming Xiang
Jieping Ye
202
3
0
16 Oct 2025
Implicit-Knowledge Visual Question Answering with Structured Reasoning Traces
Zhihao Wen
Wenkang Wei
Yuan Fang
Xingtong Yu
Hui Zhang
Weicheng Zhu
X. Zhang
ReLM
LRM
VLM
169
0
0
08 Oct 2025
Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
Honghao Chen
Xingzhou Lou
Xiaokun Feng
Kaiqi Huang
Xinlong Wang
OffRL
LRM
185
1
0
23 Sep 2025
NLKI: A lightweight Natural Language Knowledge Integration Framework for Improving Small VLMs in Commonsense VQA Tasks
Aritra Dutta
Swapnanil Mukherjee
Deepanway Ghosal
Somak Aditya
VLM
95
0
0
27 Aug 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
Fucai Ke
Joy Hsu
Zhixi Cai
Zixian Ma
Xin Zheng
...
P. D. Haghighi
Gholamreza Haffari
Ranjay Krishna
Jiajun Wu
H. Rezatofighi
ReLM
CoGe
LRM
357
9
0
24 Aug 2025
Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation
Y. X. R. Wang
Yuming Qiao
Dan Meng
Jun Yang
H. Lu
Zhenyu Yang
Xudong Zhang
77
0
0
12 Aug 2025
ViFP: A Framework for Visual False Positive Detection to Enhance Reasoning Reliability in VLMs
Ben Zhang
LuLu Yu
Lei Gao
QuanJiang Guo
Hui Gao
LRM
168
0
0
06 Aug 2025
Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2025
Yanming Xiu
M. Gorlatova
404
6
0
27 Jul 2025
Augmented Vision-Language Models: A Systematic Review
Anthony C Davis
Burhan Sadiq
Tianmin Shu
Chien-Ming Huang
VLM
LRM
196
0
0
24 Jul 2025
ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
Duong T. Tran
T. Tran
M. Hauswirth
Danh Le-Phuoc
189
2
0
22 Jul 2025
Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos
Benjamin Z. Reichman
Constantin Patsch
Jack Truxal
Atishay Jain
Larry Heck
213
0
0
11 Jun 2025
mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs
Chuan Xu
Qiaosheng Chen
Yutong Feng
Gong Cheng
RALM
3DV
VLM
190
1
0
16 May 2025
A Survey of Task-Oriented Knowledge Graph Reasoning: Status, Applications, and Prospects
Guanglin Niu
Bo Li
Yangguang Lin
LRM
273
2
0
27 Apr 2025
Seeking and Updating with Live Visual Knowledge
Mingyang Fu
Yuyang Peng
Zetong Zhou
Benlin Liu
Yao Wan
Zhou Zhao
Philip S. Yu
Ranjay Krishna
289
2
0
07 Apr 2025
Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models
Meng Cao
Pengfei Hu
Yuhang Han
J. Gu
Haoran Tang
...
Jun Song
Xiang Li
Bo Zheng
Ian Reid
Xiaodan Liang
203
8
0
24 Mar 2025
Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu
Siyuan Meng
Yanting Gao
Song Mao
Pinlong Cai
Guohang Yan
Yirong Chen
Zilin Bian
Ding Wang
Botian Shi
369
12
0
17 Mar 2025
Abduction of Domain Relationships from Data for VQA
International Conference on Logic Programming (ICLP), 2025
Al Mehdi Saadat Chowdhury
Paulo Shakarian
Gerardo Simari
296
0
0
13 Feb 2025
Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering
Qian Tao
Xiaoyang Fan
Yong Xu
Xingquan Zhu
Yufei Tang
229
0
0
22 Jan 2025
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
287
8
0
17 Nov 2024
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Sunil Aryal
Imran Razzak
Hakim Hacid
233
0
0
30 Oct 2024
Improving Generalization in Visual Reasoning via Self-Ensemble
Tien-Huy Nguyen
Quang-Khai Tran
Anh-Tuan Quang-Hoang
VLM
LRM
322
10
0
28 Oct 2024
Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
A. S. Penamakuri
Anand Mishra
330
2
0
24 Oct 2024
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions
Shailaja Keyur Sampat
Yezhou Yang
Chitta Baral
LM&Ro
203
1
0
17 Oct 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Computer Vision and Pattern Recognition (CVPR), 2024
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
314
11
0
14 Oct 2024
ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Arpan Phukan
Manish Gupta
Asif Ekbal
VGen
197
2
0
13 Oct 2024
ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Sourjyadip Ray
Kushal Gupta
Soumi Kundu
Payal Arvind Kasat
Somak Aditya
Pawan Goyal
124
4
0
08 Oct 2024
What Makes a Maze Look Like a Maze?
International Conference on Learning Representations (ICLR), 2024
Joy Hsu
Jiayuan Mao
J. Tenenbaum
Noah D. Goodman
Jiajun Wu
OCL
480
12
0
12 Sep 2024
A Survey on Evaluation of Multimodal Large Language Models
Jiaxing Huang
Jingyi Zhang
LM&MA
ELM
LRM
307
43
0
28 Aug 2024
IIU: Independent Inference Units for Knowledge-based Visual Question Answering
Knowledge Science, Engineering and Management (KSEM), 2024
Yili Li
Jing Yu
Keke Gai
Gang Xiong
154
1
0
15 Aug 2024
Towards Flexible Evaluation for Generative Visual Question Answering
ACM Multimedia (MM), 2024
Huishan Ji
Q. Si
Zheng Lin
Weiping Wang
227
2
0
01 Aug 2024
Retrieval-Enhanced Machine Learning: Synthesis and Opportunities
To Eun Kim
Alireza Salemi
Andrew Drozdov
Fernando Diaz
Hamed Zamani
371
10
0
17 Jul 2024
Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Kai Shen
Lingfei Wu
Siliang Tang
Fangli Xu
Bo Long
Yueting Zhuang
Jian Pei
213
1
0
06 Jul 2024
CELLO: Causal Evaluation of Large Vision-Language Models
Meiqi Chen
Bo Peng
Yan Zhang
Chaochao Lu
LRM
ELM
242
7
0
27 Jun 2024
Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA
Elham J. Barezi
Parisa Kordjamshidi
CoGe
164
2
0
27 Jun 2024
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images
Rushikesh Zawar
Shaurya Dewan
Andrew F. Luo
Margaret M. Henderson
Michael J. Tarr
Leila Wehbe
VGen
CoGe
186
1
0
19 Jun 2024
Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models
International Conference on Natural Language Processing (ICON), 2024
Manas Jhalani
Annervaz K M
Pushpak Bhattacharyya
98
3
0
14 Jun 2024
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
International Conference on Learning Representations (ICLR), 2024
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
Bang Liu
Yoshua Bengio
265
5
0
10 Jun 2024
Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment
Wenliang Zhong
Wenyi Wu
Qi Li
Rob Barton
Boxin Du
Shioulin Sam
Karim Bouyarmane
Ismail B. Tutar
Junzhou Huang
231
4
0
05 Jun 2024
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Computer Vision and Pattern Recognition (CVPR), 2024
Andong Wang
Bo Wu
Sunli Chen
Zhenfang Chen
Haotian Guan
Wei-Ning Lee
Li Erran Li
Chuang Gan
LRM
RALM
273
32
0
15 May 2024
Knowledge-aware Text-Image Retrieval for Remote Sensing Images
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2024
Li Mi
Xianjie Dai
J. Castillo-Navarro
D. Tuia
191
21
0
06 May 2024
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering
Dongze Hao
Qunbo Wang
Longteng Guo
Jie Jiang
Jing Liu
306
9
0
22 Apr 2024
Find The Gap: Knowledge Base Reasoning For Visual Question Answering
Elham J. Barezi
Parisa Kordjamshidi
212
3
0
16 Apr 2024
Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
Meiqi Chen
Yixin Cao
Yan Zhang
Chaochao Lu
437
34
0
27 Mar 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han
Chao Gao
Jinyang Liu
Jeff Zhang
Sai Qian Zhang
803
707
0
21 Mar 2024
Knowledge Condensation and Reasoning for Knowledge-based VQA
Dongze Hao
Jian Jia
Longteng Guo
Qunbo Wang
Te Yang
...
Yanhua Cheng
Bo Wang
Quan Chen
Han Li
Jing Liu
186
3
0
15 Mar 2024
Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Bingqian Lin
Yanxin Long
Yi Zhu
Fengda Zhu
Xiaodan Liang
QiXiang Ye
Liang Lin
234
7
0
09 Mar 2024