Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
1612.00837
Cited By
v1
v2
v3 (latest)
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"
50 / 2,278 papers shown
LOIS: Looking Out of Instance Semantics for Visual Question Answering
IEEE transactions on multimedia (IEEE TMM), 2023
Siyu Zhang
Ye Chen
Yaoru Sun
Fang Wang
Haibo Shi
Haoran Wang
163
8
0
26 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
434
151
0
25 Jul 2023
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework
Jingxuan Wei
Cheng Tan
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Bihui Yu
R. Guo
Stan Z. Li
LRM
379
17
0
24 Jul 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
333
45
0
21 Jul 2023
Conformal prediction under ambiguous ground truth
David Stutz
Abhijit Guha Roy
Tatiana Matejovicova
Patricia Strachan
A. Cemgil
Arnaud Doucet
563
25
0
18 Jul 2023
BUS:Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization
IEEE International Conference on Computer Vision (ICCV), 2023
Chaoya Jiang
Haiyang Xu
Wei Ye
Qinghao Ye
Chenliang Li
Mingshi Yan
Bin Bi
Shikun Zhang
Fei Huang
Songfang Huang
VLM
196
9
0
17 Jul 2023
PAT: Parallel Attention Transformer for Visual Question Answering in Vietnamese
International Conference on Multimedia Analysis and Pattern Recognition (ICMAPR), 2023
Nghia Hieu Nguyen
Kiet Van Nguyen
208
2
0
17 Jul 2023
Planting a SEED of Vision in Large Language Model
Yuying Ge
Yixiao Ge
Ziyun Zeng
Xintao Wang
Ying Shan
VLM
MLLM
255
126
0
16 Jul 2023
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
IEEE International Conference on Computer Vision (ICCV), 2023
Yi-Syuan Chen
Yun-Zhu Song
Cheng Yu Yeo
Bei Liu
Jianlong Fu
Hong-Han Shuai
VLM
LRM
261
7
0
15 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Neural Information Processing Systems (NeurIPS), 2023
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM
MLLM
389
44
0
13 Jul 2023
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle
Abhay Jain
Radu Timofte
Goran Glavaš
VLM
MLLM
228
42
0
13 Jul 2023
MMBench: Is Your Multi-modal Model an All-around Player?
European Conference on Computer Vision (ECCV), 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
...
Yuan Liu
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
713
1,664
0
12 Jul 2023
Emu: Generative Pretraining in Multimodality
International Conference on Learning Representations (ICLR), 2023
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
359
155
0
11 Jul 2023
Enhancing Cross-lingual Transfer via Phonemic Transcription Integration
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Hoang Nguyen
Chenwei Zhang
Tao Zhang
Eugene Rohrbaugh
Philip S. Yu
220
10
0
10 Jul 2023
SVIT: Scaling up Visual Instruction Tuning
Bo Zhao
Boya Wu
Muyang He
Tiejun Huang
MLLM
314
157
0
09 Jul 2023
Read, Look or Listen? What's Needed for Solving a Multimodal Dataset
Netta Madvil
Yonatan Bitton
Roy Schwartz
216
3
0
06 Jul 2023
Several categories of Large Language Models (LLMs): A Short Survey
International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2023
Saurabh Pahune
Manoj Chandrasekharan
AILaw
202
31
0
05 Jul 2023
Localized Questions in Medical Visual Question Answering
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
172
9
0
03 Jul 2023
Visual Instruction Tuning with Polite Flamingo
AAAI Conference on Artificial Intelligence (AAAI), 2023
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
394
52
0
03 Jul 2023
JourneyDB: A Benchmark for Generative Image Understanding
Neural Information Processing Systems (NeurIPS), 2023
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
...
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Jiaming Song
341
169
0
03 Jul 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Rui Sun
Zhecan Wang
Haoxuan You
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
CLIP
317
4
0
03 Jul 2023
S-Omninet: Structured Data Enhanced Universal Multimodal Learning Architecture
Ye Xue
Diego Klabjan
J. Utke
94
0
0
01 Jul 2023
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
International Joint Conference on Artificial Intelligence (IJCAI), 2023
A. S. Penamakuri
Manish Gupta
Mithun Das Gupta
Anand Mishra
167
7
0
29 Jun 2023
Deep Equilibrium Multimodal Fusion
Jinhong Ni
Yalong Bai
Wei Zhang
Ting Yao
Tao Mei
273
8
0
29 Jun 2023
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
William Berrios
Gautam Mittal
Tristan Thrush
Douwe Kiela
Amanpreet Singh
MLLM
VLM
186
65
0
28 Jun 2023
Approximated Prompt Tuning for Vision-Language Pre-trained Models
Qiong Wu
Shubin Huang
Weihao Ye
Pingyang Dai
Annan Shu
Guannan Jiang
Rongrong Ji
VLM
VPVLM
127
2
0
27 Jun 2023
FunQA: Towards Surprising Video Comprehension
Binzhu Xie
Sicheng Zhang
Zitang Zhou
Yue Liu
Yuanhan Zhang
Jack Hessel
Jingkang Yang
Ziwei Liu
417
35
0
26 Jun 2023
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
European Conference on Computer Vision (ECCV), 2023
Qingpei Guo
Kaisheng Yao
Wei Chu
MLLM
103
6
0
25 Jun 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Chunjiang Ge
Yulei Qin
Mengdan Zhang
...
Xing Sun
Zhenyu Qiu
Rongrong Ji
Caifeng Shan
Ran He
ELM
MLLM
807
1,224
0
23 Jun 2023
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter
Binjie Zhang
Yixiao Ge
Xuyuan Xu
Ying Shan
Mike Zheng Shou
172
9
0
22 Jun 2023
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
Neural Information Processing Systems (NeurIPS), 2023
Elizaveta Semenova
F. G. Abrantes
Hanwen Zhu
Grace A. Sodunke
Aleksandar Shtedritski
Hannah Rose Kirk
CoGe
377
57
0
21 Jun 2023
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Rabiul Awal
Le Zhang
Aishwarya Agrawal
LRM
370
18
0
16 Jun 2023
Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories
IEEE International Conference on Computer Vision (ICCV), 2023
Thomas Mensink
J. Uijlings
Lluis Castrejon
A. Goel
Felipe Cadar
Howard Zhou
Fei Sha
A. Araújo
V. Ferrari
281
82
0
15 Jun 2023
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration
Chenyang Lyu
Minghao Wu
Longyue Wang
Xinting Huang
Bingshuai Liu
Zefeng Du
Shuming Shi
Zhaopeng Tu
MLLM
AuLLM
373
209
0
15 Jun 2023
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
International Conference on Learning Representations (ICLR), 2023
Sihan Chen
Xingjian He
Handong Li
Xiaojie Jin
Jiashi Feng
Qingbin Liu
VLM
CLIP
203
11
0
15 Jun 2023
Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
International Conference on Machine Learning (ICML), 2023
Isha Rawal
Alexander Matyasko
Shantanu Jaiswal
Basura Fernando
Cheston Tan
276
7
0
15 Jun 2023
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Le Zhang
Rabiul Awal
Aishwarya Agrawal
CoGe
VLM
410
27
0
15 Jun 2023
Improving Selective Visual Question Answering by Learning from Your Peers
Computer Vision and Pattern Recognition (CVPR), 2023
Corentin Dancette
Spencer Whitehead
Rishabh Maheshwary
Ramakrishna Vedantam
Stefan Scherer
Xinlei Chen
Matthieu Cord
Marcus Rohrbach
AAML
OOD
222
24
0
14 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
254
9
0
14 Jun 2023
AVIS: Autonomous Visual Information Seeking with Large Language Model Agent
Neural Information Processing Systems (NeurIPS), 2023
Ziniu Hu
Ahmet Iscen
Chen Sun
Kai-Wei Chang
Luke Huan
David A. Ross
Cordelia Schmid
Alireza Fathi
298
12
0
13 Jun 2023
Image Captioners Are Scalable Vision Learners Too
Neural Information Processing Systems (NeurIPS), 2023
Michael Tschannen
Manoj Kumar
Andreas Steiner
Xiaohua Zhai
N. Houlsby
Lucas Beyer
VLM
CLIP
840
82
0
13 Jun 2023
Zero-shot Composed Text-Image Retrieval
British Machine Vision Conference (BMVC), 2023
Yikun Liu
Jiangchao Yao
Ya Zhang
Yanfeng Wang
Weidi Xie
211
33
0
12 Jun 2023
Retrieval-Enhanced Contrastive Vision-Text Models
International Conference on Learning Representations (ICLR), 2023
Ahmet Iscen
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLIP
VLM
292
39
0
12 Jun 2023
Global and Local Semantic Completion Learning for Vision-Language Pre-training
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Rong-Cheng Tu
Yatai Ji
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
252
8
0
12 Jun 2023
Sticker820K: Empowering Interactive Retrieval with Stickers
Sijie Zhao
Yixiao Ge
Chen Ma
Lin Song
Xiaohan Ding
Zehua Xie
Ying Shan
112
14
0
12 Jun 2023
Weakly Supervised Visual Question Answer Generation
Charani Alampalle
Shamanthak Hegde
Soumya Jahagirdar
Shankar Gangisetty
191
0
0
11 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions
IEEE Access (IEEE Access), 2023
N. Rodis
Christos Sardianos
Panagiotis I. Radoglou-Grammatikis
Panagiotis G. Sarigiannidis
Iraklis Varlamis
Georgios Th. Papadopoulos
335
40
0
09 Jun 2023
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
Yue Liu
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Fanyi Pu
Jingkang Yang
Xuefei Liu
Ziwei Liu
MLLM
VLM
279
291
0
08 Jun 2023
Modular Visual Question Answering via Code Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Sanjay Subramanian
Medhini Narasimhan
Kushal Khangaonkar
Kevin Kaichuang Yang
Arsha Nagrani
Cordelia Schmid
Andy Zeng
Trevor Darrell
Dan Klein
228
62
0
08 Jun 2023
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
Neural Information Processing Systems (NeurIPS), 2023
Wenxuan Zhang
Sharifah Mahani Aljunied
Chang Gao
Yew Ken Chia
Lidong Bing
ELM
312
128
0
08 Jun 2023
Previous
1
2
3
...
25
26
27
...
44
45
46
Next
Page 26 of 46
Page
of 46
Go