Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1612.00837
Cited By
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"
50 / 1,956 papers shown
Title
Learnable In-Context Vector for Visual Question Answering
Yingzhe Peng
Chenduo Hao
Xu Yang
Jiawei Peng
Xinting Hu
Xin Geng
37
4
0
19 Jun 2024
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
Bingchen Zhao
Yongshuo Zong
Letian Zhang
Timothy Hospedales
VLM
28
15
0
18 Jun 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye
Yukang Gan
Xiaoke Huang
Yixiao Ge
Yansong Tang
MLLM
VLM
27
22
0
18 Jun 2024
Unveiling Encoder-Free Vision-Language Models
Haiwen Diao
Yufeng Cui
Xiaotong Li
Yueze Wang
Huchuan Lu
Xinlong Wang
VLM
43
28
0
17 Jun 2024
Improving Multi-Agent Debate with Sparse Communication Topology
Yunxuan Li
Yibing Du
Jiageng Zhang
Le Hou
Peter Grabowski
Yeqing Li
Eugene Ie
LLMAG
31
18
0
17 Jun 2024
Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression
Zilun Zhang
Yutao Sun
Tiancheng Zhao
Leigang Sha
Ruochen Xu
Kyusong Lee
Jianwei Yin
CLL
KELM
46
0
0
17 Jun 2024
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model
Jiahao Huo
Yibo Yan
Boren Hu
Yutao Yue
Xuming Hu
LRM
MLLM
32
7
0
17 Jun 2024
SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations
Sri Harsha Dumpala
Aman Jaiswal
Chandramouli Shama Sastry
E. Milios
Sageev Oore
Hassan Sajjad
CoGe
30
9
0
17 Jun 2024
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
Amith Ananthram
Elias Stengel-Eskin
Carl Vondrick
Mohit Bansal
VLM
32
7
0
17 Jun 2024
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
Shengkang Wang
Hongzhan Lin
Ziyang Luo
Zhen Ye
Guang Chen
Jing Ma
58
3
0
17 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
75
23
0
17 Jun 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu
Dongfu Jiang
Wenhu Chen
William Yang Wang
Yejin Choi
Bill Yuchen Lin
VLM
43
26
0
16 Jun 2024
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture
Wenyan Li
Xinyu Crystina Zhang
Jiaang Li
Qiwei Peng
Raphael Tang
...
Guimin Hu
Yifei Yuan
Anders Søgaard
Daniel Hershcovich
Desmond Elliott
CoGe
22
7
0
16 Jun 2024
Concept-skill Transferability-based Data Selection for Large Vision-Language Models
Jaewoo Lee
Boyang Li
Sung Ju Hwang
VLM
33
8
0
16 Jun 2024
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags
Daiqing Qi
Handong Zhao
Zijun Wei
Sheng Li
35
2
0
16 Jun 2024
Mixture-of-Subspaces in Low-Rank Adaptation
Taiqiang Wu
Jiahao Wang
Zhe Zhao
Ngai Wong
41
21
0
16 Jun 2024
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics
Wentao Yuan
Jiafei Duan
Valts Blukis
Wilbert Pumacay
Ranjay Krishna
Adithyavairavan Murali
Arsalan Mousavian
Dieter Fox
LM&Ro
42
45
0
15 Jun 2024
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen
Lin Li
Yongqi Yang
Bin Wen
Fan Yang
Tingting Gao
Yu Wu
Long Chen
VLM
VGen
43
6
0
15 Jun 2024
Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps
Jian Chen
Peilin Zhou
Yining Hua
Dading Chong
Meng Cao
Yaowei Li
Zixuan Yuan
Bing Zhu
Junwei Liang
VLM
33
1
0
14 Jun 2024
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLM
LRM
45
1
0
13 Jun 2024
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang
Xingyu Fu
James Y. Huang
Zekun Li
Qin Liu
...
Kai-Wei Chang
Dan Roth
Sheng Zhang
Hoifung Poon
Muhao Chen
VLM
36
47
0
13 Jun 2024
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
Yuhang Wu
Wenmeng Yu
Yean Cheng
Yan Wang
Xiaohan Zhang
Jiazheng Xu
Ming Ding
Yuxiao Dong
48
1
0
13 Jun 2024
OpenVLA: An Open-Source Vision-Language-Action Model
Moo Jin Kim
Karl Pertsch
Siddharth Karamcheti
Ted Xiao
Ashwin Balakrishna
...
Russ Tedrake
Dorsa Sadigh
Sergey Levine
Percy Liang
Chelsea Finn
LM&Ro
VLM
37
360
0
13 Jun 2024
Comparison Visual Instruction Tuning
Wei Lin
M. Jehanzeb Mirza
Sivan Doveh
Rogerio Feris
Raja Giryes
Sepp Hochreiter
Leonid Karlinsky
46
4
0
13 Jun 2024
ReMI: A Dataset for Reasoning with Multiple Images
Mehran Kazemi
Nishanth Dikkala
Ankit Anand
Petar Dević
Ishita Dasgupta
...
Bahare Fatemi
Pranjal Awasthi
Dee Guo
Sreenivas Gollapudi
Ahmed Qureshi
LRM
VLM
34
13
0
13 Jun 2024
MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning
Hanqing Wang
Zeguan Xiao
Shuo Wang
Guanhua Chen
Guanhua Chen
30
19
0
13 Jun 2024
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
Rithesh Murthy
Liangwei Yang
Juntao Tan
Tulika Awalgaonkar
Yilun Zhou
...
Zuxin Liu
Ming Zhu
Huan Wang
Caiming Xiong
Silvio Savarese
57
5
0
12 Jun 2024
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Yi-Fan Zhang
Qingsong Wen
Chaoyou Fu
Xue Wang
Zhang Zhang
L. Wang
Rong Jin
34
40
0
12 Jun 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Qingyun Li
Zhe Chen
Weiyun Wang
Wenhai Wang
Shenglong Ye
...
Dahua Lin
Yu Qiao
Botian Shi
Conghui He
Jifeng Dai
VLM
OffRL
51
20
0
12 Jun 2024
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models
Shimin Chen
Yitian Yuan
Shaoxiang Chen
Zequn Jie
Lin Ma
VLM
27
3
0
12 Jun 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Chenyu Yang
Xizhou Zhu
Jinguo Zhu
Weijie Su
Junjie Wang
...
Lewei Lu
Bin Li
Jie Zhou
Yu Qiao
Jifeng Dai
VLM
CLIP
39
4
0
11 Jun 2024
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu
Zeyang Zhou
Kexin Huang
Dandan Liang
Yixu Wang
...
Keqing Wang
Yujiu Yang
Yan Teng
Yu Qiao
Yingchun Wang
ELM
42
9
0
11 Jun 2024
Needle In A Multimodal Haystack
Weiyun Wang
Shuibo Zhang
Yiming Ren
Yuchen Duan
Tiantong Li
...
Ping Luo
Yu Qiao
Jifeng Dai
Wenqi Shao
Wenhai Wang
VLM
57
16
0
11 Jun 2024
BrainChat: Decoding Semantic Information from fMRI using Vision-language Pretrained Models
Wanaiu Huang
23
1
0
10 Jun 2024
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
David Romero
Chenyang Lyu
Haryo Akbarianto Wibowo
Teresa Lynn
Injy Hamed
...
Oana Ignat
Joan Nwatu
Rada Mihalcea
Thamar Solorio
Alham Fikri Aji
43
24
0
10 Jun 2024
M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark
Wei Song
Yadong Li
Jianhua Xu
Guowei Wu
Lingfeng Ming
...
Weihua Luo
Houyi Li
Yi Du
Fangda Guo
Kaicheng Yu
ELM
LRM
29
7
0
08 Jun 2024
An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models
Xiongtao Zhou
Jie He
Yuhua Ke
Guangyao Zhu
Víctor Gutiérrez-Basulto
Jeff Z. Pan
40
11
0
07 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
59
31
0
07 Jun 2024
RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation
Jiaming Liu
Mengzhen Liu
Zhenyu Wang
Lily Lee
Kaichen Zhou
Pengju An
Senqiao Yang
Renrui Zhang
Yandong Guo
Shanghang Zhang
LM&Ro
LRM
Mamba
32
5
0
06 Jun 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
Lingchen Meng
Jianwei Yang
Rui Tian
Xiyang Dai
Zuxuan Wu
Jianfeng Gao
Yu-Gang Jiang
VLM
22
8
0
06 Jun 2024
Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment
Wenliang Zhong
Wenyi Wu
Qi Li
Rob Barton
Boxin Du
Shioulin Sam
Karim Bouyarmane
Ismail B. Tutar
Junzhou Huang
25
3
0
05 Jun 2024
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Alex Jinpeng Wang
Linjie Li
Yiqi Lin
Min Li
Lijuan Wang
Mike Zheng Shou
VLM
18
3
0
04 Jun 2024
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model
An-Chieh Cheng
Hongxu Yin
Yang Fu
Qiushan Guo
Ruihan Yang
Jan Kautz
Xiaolong Wang
Sifei Liu
LRM
48
43
0
03 Jun 2024
Selectively Answering Visual Questions
Julian Martin Eisenschlos
Hernán Maina
Guido Ivetta
Luciana Benotti
33
0
0
03 Jun 2024
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
Linli Yao
Lei Li
Shuhuai Ren
Lean Wang
Yuanxin Liu
Xu Sun
Lu Hou
35
28
0
31 May 2024
Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization
Richard Luo
Austin Peng
Adithya Vasudev
Rishabh Jain
34
2
0
31 May 2024
Visual Perception by Large Language Model's Weights
Feipeng Ma
Hongwei Xue
Guangting Wang
Yizhou Zhou
Fengyun Rao
Shilin Yan
Yueyi Zhang
Siying Wu
Mike Zheng Shou
Xiaoyan Sun
VLM
25
5
0
30 May 2024
Instruction-Guided Visual Masking
Jinliang Zheng
Jianxiong Li
Si Cheng
Yinan Zheng
Jiaming Li
Jihao Liu
Yu Liu
Jingjing Liu
Xianyuan Zhan
45
5
0
30 May 2024
Evaluating Vision-Language Models on Bistable Images
Artemis Panagopoulou
Coby Melkin
Chris Callison-Burch
41
0
0
29 May 2024
X-VILA: Cross-Modality Alignment for Large Language Model
Hanrong Ye
De-An Huang
Yao Lu
Zhiding Yu
Wei Ping
...
Jan Kautz
Song Han
Dan Xu
Pavlo Molchanov
Hongxu Yin
MLLM
VLM
40
29
0
29 May 2024
Previous
1
2
3
...
9
10
11
...
38
39
40
Next