ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

1 December 2017
Aishwarya Agrawal
Dhruv Batra
Devi Parikh
Aniruddha Kembhavi
    OOD
arXiv: 1712.00377

Papers citing "Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering"

50 / 330 papers shown
QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning
Quanxing Xu
Ling Zhou
X. Zhong
Feifei Zhang
Rubing Huang
Chia-Wen Lin
34
0
0
04 Apr 2025
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning
Jie Ma
Zhitao Gao
Qi Chai
J. Liu
P. Wang
Jing Tao
Zhou Su
45
0
0
01 Apr 2025
TRACE: A Self-Improving Framework for Robot Behavior Forecasting with Vision-Language Models
Gokul Puthumanaillam
Paulo Padrão
Jose Fuentes
Pranay Thangeda
William E. Schafer
Jae Hyuk Song
Karan Jagdale
Leonardo Bobadilla
Melkior Ornik
36
0
0
02 Mar 2025
Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Chao Wang
Luning Zhang
Z. Wang
Yang Zhou
ELM
VLM
LRM
53
1
0
27 Feb 2025
Directional Gradient Projection for Robust Fine-Tuning of Foundation Models
Chengyue Huang
Junjiao Tian
Brisa Maneechotesuwan
Shivang Chopra
Z. Kira
49
0
0
21 Feb 2025
MASS: Overcoming Language Bias in Image-Text Matching
Jiwan Chung
Seungwon Lim
Sangkyu Lee
Youngjae Yu
VLM
30
0
0
20 Jan 2025
Overcoming Language Priors for Visual Question Answering Based on Knowledge Distillation
Daowan Peng
Wei Wei
49
0
0
10 Jan 2025
What makes a good metric? Evaluating automatic metrics for text-to-image consistency
Candace Ross
Melissa Hall
Adriana Romero Soriano
Adina Williams
88
3
0
18 Dec 2024
Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey
Jiayi Kuang
Jingyou Xie
Haohao Luo
Ronghao Li
Zhe Xu
Xianfeng Cheng
Yinghui Li
Xika Lin
Ying Shen
LRM
85
2
0
26 Nov 2024
Improving Generalization in Visual Reasoning via Self-Ensemble
Tien-Huy Nguyen
Quang-Khai Tran
Anh-Tuan Quang-Hoang
VLM
LRM
45
5
0
28 Oct 2024
Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention
Ying Liu
Ge Bai
Chenji Lu
Shilong Li
Zhang Zhang
Ruifang Liu
Wenbin Guo
13
0
0
14 Oct 2024
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
Shitian Zhao
Renrui Zhang
Xu Luo
Yan Wang
Shanghang Zhang
Peng Gao
18
0
0
01 Oct 2024
Navigating the Shortcut Maze: A Comprehensive Analysis of Shortcut Learning in Text Classification by Language Models
Yuqing Zhou
Ruixiang Tang
Ziyu Yao
Ziwei Zhu
21
2
0
26 Sep 2024
Fairness and Bias Mitigation in Computer Vision: A Survey
Sepehr Dehdashtian
Ruozhen He
Yi Li
Guha Balakrishnan
Nuno Vasconcelos
Vicente Ordonez
Vishnu Naresh Boddeti
29
4
0
05 Aug 2024
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks
Juhwan Choi
Junehyoung Kwon
Jungmin Yun
Seunguk Yu
Youngbin Kim
36
0
0
29 Jul 2024
What does Kiki look like? Cross-modal associations between speech sounds and visual shapes in vision-and-language models
Tessa Verhoef
Kiana Shahrasbi
Tom Kouwenhoven
VLM
16
2
0
25 Jul 2024
Unveiling and Mitigating Bias in Audio Visual Segmentation
Peiwen Sun
Honggang Zhang
Di Hu
16
3
0
23 Jul 2024
Position: Measure Dataset Diversity, Don't Just Claim It
Dora Zhao
Jerone T. A. Andrews
Orestis Papakyriakopoulos
Alice Xiang
64
14
0
11 Jul 2024
On the Role of Visual Grounding in VQA
Daniel Reich
Tanja Schultz
16
1
0
26 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
37
278
0
24 Jun 2024
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
Kang-il Lee
Minbeom Kim
Seunghyun Yoon
Minsung Kim
Dongryeol Lee
Hyukhun Koh
Kyomin Jung
CoGe
VLM
69
5
0
13 Jun 2024
Evaluating Vision-Language Models on Bistable Images
Artemis Panagopoulou
Coby Melkin
Chris Callison-Burch
36
0
0
29 May 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Olivia Wiles
Chuhan Zhang
Isabela Albuquerque
Ivana Kajić
Su Wang
...
Jordi Pont-Tuset
Aida Nematzadeh
Anant Nawalgaria
EGVM
115
13
0
25 Apr 2024
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning
Yian Li
Wentao Tian
Yang Jiao
Jingjing Chen
Yueping Jiang
Bin Zhu
Na Zhao
Yu-Gang Jiang
LRM
30
9
0
19 Apr 2024
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
Jie Ma
Min Hu
Pinghui Wang
Wangchun Sun
Lingyun Song
Hongbin Pei
Jun Liu
Youtian Du
30
4
0
18 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
33
1
0
01 Apr 2024
Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
Meiqi Chen
Yixin Cao
Yan Zhang
Chaochao Lu
16
3
0
27 Mar 2024
Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering
Pascal Tilli
Ngoc Thang Vu
21
0
0
26 Mar 2024
MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection
Taeheon Kim
Sangyun Chung
Damin Yeom
Youngjoon Yu
Hak Gu Kim
Y. Ro
30
2
0
22 Mar 2024
Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
Michael Stephen Saxon
Yiran Luo
Sharon Levy
Chitta Baral
Yezhou Yang
William Yang Wang
EGVM
25
3
0
17 Mar 2024
II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
Jihyung Kil
Farideh Tavazoee
Dongyeop Kang
Joo-Kyung Kim
LRM
25
2
0
16 Feb 2024
Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning
Yuhang Zheng
Zhen Wang
Long Chen
8
2
0
28 Jan 2024
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
Siwei Wu
Yizhi Li
Kang Zhu
Ge Zhang
Yiming Liang
...
Wenhu Chen
Wenhao Huang
Noura Al Moubayed
Jie Fu
Chenghua Lin
22
11
0
24 Jan 2024
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
Haibi Wang
Weifeng Ge
LRM
14
3
0
19 Jan 2024
Uncovering the Full Potential of Visual Grounding Methods in VQA
Daniel Reich
Tanja Schultz
20
1
0
15 Jan 2024
Understanding Unimodal Bias in Multimodal Deep Linear Networks
Yedi Zhang
Peter E. Latham
Andrew Saxe
15
5
0
01 Dec 2023
Debiasing Multimodal Models via Causal Information Minimization
Vaidehi Patil
A. Maharana
Mohit Bansal
CML
14
0
0
28 Nov 2023
The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation
Christel Chappuis
Eliot Walt
Vincent Mendez
Sylvain Lobry
B. L. Saux
D. Tuia
15
3
0
28 Nov 2023
Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts
Yunshi Lan
Xiang Li
Xin Liu
Yang Li
Wei Qin
Weining Qian
LRM
ReLM
17
23
0
15 Nov 2023
VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization
Suraj Jyothi Unni
Raha Moraffah
Huan Liu
30
2
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
35
35
0
01 Nov 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Zhecan Wang
Long Chen
Haoxuan You
Keyang Xu
Yicheng He
Wenhao Li
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
10
3
0
23 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
12
5
0
17 Oct 2023
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering
Xiulong Liu
Zhikang Dong
Peng Zhang
14
21
0
10 Oct 2023
Causal Reasoning through Two Layers of Cognition for Improving Generalization in Visual Question Answering
Trang Nguyen
Naoaki Okazaki
LRM
25
0
0
09 Oct 2023
Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models
Holy Lovenia
Wenliang Dai
Samuel Cahyawijaya
Ziwei Ji
Pascale Fung
MLLM
14
46
0
09 Oct 2023
SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering
Bruno Souza
Marius Aasan
Hélio Pedrini
Adín Ramirez Rivera
SSL
10
1
0
03 Oct 2023
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering
Amir Rahimi
Vanessa D’Amario
Moyuru Yamada
Kentaro Takemoto
Tomotake Sasaki
Xavier Boix
17
1
0
15 Sep 2023
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation
Yunhao Ge
Jiashu Xu
Brian Nlong Zhao
Neel Joshi
Laurent Itti
Vibhav Vineet
DiffM
28
14
0
12 Sep 2023
Interpretable Visual Question Answering via Reasoning Supervision
Maria Parelli
Dimitrios Mallis
Markos Diomataris
Vassilis Pitsikalis
LRM
20
2
0
07 Sep 2023