ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1612.00837
  4. Cited By
Making the V in VQA Matter: Elevating the Role of Image Understanding in
  Visual Question Answering

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
    CoGe
ArXivPDFHTML

Papers citing "Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"

50 / 909 papers shown
Title
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models
Aishwarya Venkataramanan
P. Bodesheim
Joachim Denzler
BDL
VLM
49
0
0
08 May 2025
SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning
SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning
Jinpeng Chen
Runmin Cong
Yuzhi Zhao
Hongzheng Yang
Guangneng Hu
H. Ip
Sam Kwong
CLL
KELM
59
0
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
57
0
0
05 May 2025
Multi-Modal Language Models as Text-to-Image Model Evaluators
Multi-Modal Language Models as Text-to-Image Model Evaluators
Jiahui Chen
Candace Ross
Reyhane Askari Hemmat
Koustuv Sinha
Melissa Hall
M. Drozdzal
Adriana Romero-Soriano
EGVM
55
0
0
01 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
57
1
0
01 May 2025
Rethinking Visual Layer Selection in Multimodal LLMs
Rethinking Visual Layer Selection in Multimodal LLMs
H. Chen
Junyan Lin
Xinhao Chen
Yue Fan
Xin Jin
Hui Su
Jianfeng Dong
Jinlan Fu
Xiaoyu Shen
VLM
88
0
0
30 Apr 2025
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
Trilok Padhi
R. Kaur
Adam D. Cobb
Manoj Acharya
Anirban Roy
Colin Samplawski
Brian Matejek
Alexander M. Berenbeim
Nathaniel D. Bastian
Susmit Jha
15
0
0
30 Apr 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Z. Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLM
VLM
89
0
0
28 Apr 2025
Platonic Grounding for Efficient Multimodal Language Models
Platonic Grounding for Efficient Multimodal Language Models
Moulik Choraria
Xinbo Wu
Akhil Bhimaraju
Nitesh Sekhar
Yue Wu
Xu Zhang
Prateek Singhal
L. Varshney
52
0
0
27 Apr 2025
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang
Wenliang Zheng
Aashrith Madasu
Peng Shi
Ryo Kamoi
...
Ranran Haoran Zhang
Avitej Iyer
Renze Lou
Wenpeng Yin
Rui Zhang
63
0
0
25 Apr 2025
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
Z. Wang
Senthil Purushwalkam
Caiming Xiong
S.
Heng Ji
R. Xu
28
0
0
23 Apr 2025
$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
π0.5π_{0.5}π0.5​: a Vision-Language-Action Model with Open-World Generalization
Physical Intelligence
Kevin Black
Noah Brown
James Darpinian
Karan Dhabalia
...
Homer Walke
Anna Walling
Haohuan Wang
Lili Yu
Ury Zhilinsky
LM&Ro
VLM
31
5
0
22 Apr 2025
CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
Atin Pothiraj
Elias Stengel-Eskin
Jaemin Cho
Mohit Bansal
30
0
0
21 Apr 2025
Towards Understanding Camera Motions in Any Video
Towards Understanding Camera Motions in Any Video
Zhiqiu Lin
Siyuan Cen
Daniel Jiang
Jay Karhade
Hewei Wang
...
Rushikesh Zawar
Xue Bai
Yilun Du
Chuang Gan
Deva Ramanan
VGen
21
0
0
21 Apr 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan
Wang Lin
Zhongqi Yue
Tenglong Ao
Liyu Jia
Wei Zhao
Juncheng Billy Li
Siliang Tang
Hanwang Zhang
28
1
0
20 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
98
0
0
17 Apr 2025
Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis
Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis
Shravan Chaudhari
Trilokya Akula
Yoon Kim
Tom Blake
LRM
35
0
0
16 Apr 2025
Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks
Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks
Mohammad Saleha
Azadeh Tabatabaeib
46
0
0
14 Apr 2025
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
Jiansheng Li
Xingxuan Zhang
Hao Zou
Yige Guo
Renzhe Xu
Yilong Liu
Chuzhao Zhu
Yue He
Peng Cui
VLM
32
0
0
14 Apr 2025
AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark
AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark
Aruna Gauba
Irene Pi
Yunze Man
Ziqi Pang
Vikram S. Adve
Yu-Xiong Wang
28
0
0
14 Apr 2025
MIEB: Massive Image Embedding Benchmark
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao
Isaac Chung
Imene Kerboua
Jamie Stirling
Xin Zhang
Márton Kardos
Roman Solomatin
Noura Al Moubayed
K. Enevoldsen
Niklas Muennighoff
VLM
35
0
0
14 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
D. Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
W. Wang
MLLM
VLM
63
6
1
14 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
X. Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
21
1
0
14 Apr 2025
Evolved Hierarchical Masking for Self-Supervised Learning
Evolved Hierarchical Masking for Self-Supervised Learning
Zhanzhou Feng
Shiliang Zhang
30
0
0
12 Apr 2025
VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering
VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering
Qi Zhi Lim
C. Lee
K. Lim
Kalaiarasi Sonai Muthu Anbananthen
23
0
0
11 Apr 2025
Mimic In-Context Learning for Multimodal Tasks
Mimic In-Context Learning for Multimodal Tasks
Yuchu Jiang
Jiale Fu
Chenduo Hao
Xinting Hu
Yingzhe Peng
Xin Geng
Xu Yang
24
0
0
11 Apr 2025
Impact of Language Guidance: A Reproducibility Study
Impact of Language Guidance: A Reproducibility Study
Cherish Puniani
Advika Sinha
Shree Singhi
Aayan Yadav
VLM
27
0
0
10 Apr 2025
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Ruotian Peng
Haiying He
Yake Wei
Yandong Wen
D. Hu
VLM
34
0
0
09 Apr 2025
Resource-efficient Inference with Foundation Model Programs
Resource-efficient Inference with Foundation Model Programs
Lunyiu Nie
Zhimin Ding
Kevin Yu
Marco Cheung
C. Jermaine
S. Chaudhuri
14
0
0
09 Apr 2025
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models
Xiangxi Zheng
Linjie Li
Z. Yang
Ping Yu
Alex Jinpeng Wang
Rui Yan
Yuan Yao
Lijuan Wang
LRM
16
0
0
08 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
31
4
0
07 Apr 2025
LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
Yimu Wang
Mozhgan Nasr Azadani
Sean Sedwards
Krzysztof Czarnecki
MLLM
MoE
47
0
0
07 Apr 2025
M2IV: Towards Efficient and Fine-grained Multimodal In-Context Learning in Large Vision-Language Models
M2IV: Towards Efficient and Fine-grained Multimodal In-Context Learning in Large Vision-Language Models
Yanshu Li
Hongyang He
Yi Cao
Qisen Cheng
Xiang Fu
Ruixiang Tang
VLM
30
0
0
06 Apr 2025
Window Token Concatenation for Efficient Visual Large Language Models
Window Token Concatenation for Efficient Visual Large Language Models
Yifan Li
Wentao Bao
Botao Ye
Zhen Tan
Tianlong Chen
Huan Liu
Yu Kong
VLM
34
0
0
05 Apr 2025
NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
Kexin Tian
Jingrui Mao
Y. Zhang
Jiwan Jiang
Yang Zhou
Zhengzhong Tu
CoGe
60
0
0
04 Apr 2025
Locations of Characters in Narratives: Andersen and Persuasion Datasets
Locations of Characters in Narratives: Andersen and Persuasion Datasets
Batuhan Ozyurt
Roya Arkhmammadova
Deniz Yuret
22
0
0
04 Apr 2025
QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning
QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning
Quanxing Xu
Ling Zhou
X. Zhong
Feifei Zhang
Rubing Huang
Chia-Wen Lin
24
0
0
04 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
56
1
0
03 Apr 2025
Large (Vision) Language Models are Unsupervised In-Context Learners
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky
Andrei Atanov
Yulun Jiang
Zhitong Gao
Ghazal Hosseini Mighan
Amir Zamir
Maria Brbić
VLM
MLLM
LRM
61
0
0
03 Apr 2025
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
Jiawei Wang
Yushen Zuo
Yuanjun Chai
Z. Liu
Yichen Fu
Yichun Feng
Kin-Man Lam
AAML
VLM
34
0
0
02 Apr 2025
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Chaohu Liu
Tianyi Gui
Yu Liu
Linli Xu
VLM
AAML
64
1
0
02 Apr 2025
Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models
Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models
Zhaochen Wang
Yujun Cai
Zi Huang
Bryan Hooi
Yiwei Wang
Ming Yang
CoGe
VLM
67
0
0
02 Apr 2025
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani
G M Shahariar
Sara Abdali
Lei Yu
Nael B. Abu-Ghazaleh
Yue Dong
AAML
37
0
0
01 Apr 2025
QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA
QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA
Shuai Li
Jian Xu
Xiao-Hui Li
Chao Deng
Lin-Lin Huang
MQ
38
0
0
01 Apr 2025
Scaling Language-Free Visual Representation Learning
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
56
2
0
01 Apr 2025
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning
Jie Ma
Zhitao Gao
Qi Chai
J. Liu
P. Wang
Jing Tao
Zhou Su
42
0
0
01 Apr 2025
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
Fengxiang Wang
H. Wang
Mingshuo Chen
Di Wang
Yulin Wang
...
L. Lan
Wenjing Yang
J. Zhang
Zhiyuan Liu
Maosong Sun
52
2
0
31 Mar 2025
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei
Rama Chellappa
31
0
0
30 Mar 2025
DASH: Detection and Assessment of Systematic Hallucinations of VLMs
DASH: Detection and Assessment of Systematic Hallucinations of VLMs
Maximilian Augustin
Yannic Neuhaus
Matthias Hein
VLM
34
1
0
30 Mar 2025
InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding
InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding
Anastasiia Fadeeva
Vincent Coriou
Diego Antognini
C. Musat
Andrii Maksai
40
0
0
29 Mar 2025
1234...171819
Next