ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1612.00837
  4. Cited By
Making the V in VQA Matter: Elevating the Role of Image Understanding in
  Visual Question Answering
v1v2v3 (latest)

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
    CoGe
ArXiv (abs)PDFHTML

Papers citing "Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"

50 / 2,278 papers shown
LOIS: Looking Out of Instance Semantics for Visual Question Answering
LOIS: Looking Out of Instance Semantics for Visual Question AnsweringIEEE transactions on multimedia (IEEE TMM), 2023
Siyu Zhang
Ye Chen
Yaoru Sun
Fang Wang
Haibo Shi
Haoran Wang
163
8
0
26 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
434
151
0
25 Jul 2023
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset
  and Comprehensive Framework
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework
Jingxuan Wei
Cheng Tan
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Bihui Yu
R. Guo
Stan Z. Li
LRM
379
17
0
24 Jul 2023
Robust Visual Question Answering: Datasets, Methods, and Future
  Challenges
Robust Visual Question Answering: Datasets, Methods, and Future ChallengesIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
333
45
0
21 Jul 2023
Conformal prediction under ambiguous ground truth
Conformal prediction under ambiguous ground truth
David Stutz
Abhijit Guha Roy
Tatiana Matejovicova
Patricia Strachan
A. Cemgil
Arnaud Doucet
563
25
0
18 Jul 2023
BUS:Efficient and Effective Vision-language Pre-training with Bottom-Up
  Patch Summarization
BUS:Efficient and Effective Vision-language Pre-training with Bottom-Up Patch SummarizationIEEE International Conference on Computer Vision (ICCV), 2023
Chaoya Jiang
Haiyang Xu
Wei Ye
Qinghao Ye
Chenliang Li
Mingshi Yan
Bin Bi
Shikun Zhang
Fei Huang
Songfang Huang
VLM
196
9
0
17 Jul 2023
PAT: Parallel Attention Transformer for Visual Question Answering in
  Vietnamese
PAT: Parallel Attention Transformer for Visual Question Answering in VietnameseInternational Conference on Multimedia Analysis and Pattern Recognition (ICMAPR), 2023
Nghia Hieu Nguyen
Kiet Van Nguyen
208
2
0
17 Jul 2023
Planting a SEED of Vision in Large Language Model
Planting a SEED of Vision in Large Language Model
Yuying Ge
Yixiao Ge
Ziyun Zeng
Xintao Wang
Ying Shan
VLMMLLM
255
126
0
16 Jul 2023
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
SINC: Self-Supervised In-Context Learning for Vision-Language TasksIEEE International Conference on Computer Vision (ICCV), 2023
Yi-Syuan Chen
Yun-Zhu Song
Cheng Yu Yeo
Bei Liu
Jianlong Fu
Hong-Han Shuai
VLMLRM
261
7
0
15 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language
  Pre-training
Bootstrapping Vision-Language Learning with Decoupled Language Pre-trainingNeural Information Processing Systems (NeurIPS), 2023
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLMMLLM
389
44
0
13 Jul 2023
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle
Abhay Jain
Radu Timofte
Goran Glavaš
VLMMLLM
228
42
0
13 Jul 2023
MMBench: Is Your Multi-modal Model an All-around Player?
MMBench: Is Your Multi-modal Model an All-around Player?European Conference on Computer Vision (ECCV), 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
...
Yuan Liu
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
713
1,664
0
12 Jul 2023
Emu: Generative Pretraining in Multimodality
Emu: Generative Pretraining in MultimodalityInternational Conference on Learning Representations (ICLR), 2023
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
359
155
0
11 Jul 2023
Enhancing Cross-lingual Transfer via Phonemic Transcription Integration
Enhancing Cross-lingual Transfer via Phonemic Transcription IntegrationAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Hoang Nguyen
Chenwei Zhang
Tao Zhang
Eugene Rohrbaugh
Philip S. Yu
220
10
0
10 Jul 2023
SVIT: Scaling up Visual Instruction Tuning
SVIT: Scaling up Visual Instruction Tuning
Bo Zhao
Boya Wu
Muyang He
Tiejun Huang
MLLM
314
157
0
09 Jul 2023
Read, Look or Listen? What's Needed for Solving a Multimodal Dataset
Read, Look or Listen? What's Needed for Solving a Multimodal Dataset
Netta Madvil
Yonatan Bitton
Roy Schwartz
216
3
0
06 Jul 2023
Several categories of Large Language Models (LLMs): A Short Survey
Several categories of Large Language Models (LLMs): A Short SurveyInternational Journal for Research in Applied Science and Engineering Technology (IJRASET), 2023
Saurabh Pahune
Manoj Chandrasekharan
AILaw
202
31
0
05 Jul 2023
Localized Questions in Medical Visual Question Answering
Localized Questions in Medical Visual Question AnsweringInternational Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
172
9
0
03 Jul 2023
Visual Instruction Tuning with Polite Flamingo
Visual Instruction Tuning with Polite FlamingoAAAI Conference on Artificial Intelligence (AAAI), 2023
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
394
52
0
03 Jul 2023
JourneyDB: A Benchmark for Generative Image Understanding
JourneyDB: A Benchmark for Generative Image UnderstandingNeural Information Processing Systems (NeurIPS), 2023
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
...
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Jiaming Song
341
169
0
03 Jul 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language UnderstandingAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Rui Sun
Zhecan Wang
Haoxuan You
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
CLIP
317
4
0
03 Jul 2023
S-Omninet: Structured Data Enhanced Universal Multimodal Learning
  Architecture
S-Omninet: Structured Data Enhanced Universal Multimodal Learning Architecture
Ye Xue
Diego Klabjan
J. Utke
94
0
0
01 Jul 2023
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual
  Question Answering
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question AnsweringInternational Joint Conference on Artificial Intelligence (IJCAI), 2023
A. S. Penamakuri
Manish Gupta
Mithun Das Gupta
Anand Mishra
167
7
0
29 Jun 2023
Deep Equilibrium Multimodal Fusion
Deep Equilibrium Multimodal Fusion
Jinhong Ni
Yalong Bai
Wei Zhang
Ting Yao
Tao Mei
273
8
0
29 Jun 2023
Towards Language Models That Can See: Computer Vision Through the LENS
  of Natural Language
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
William Berrios
Gautam Mittal
Tristan Thrush
Douwe Kiela
Amanpreet Singh
MLLMVLM
186
65
0
28 Jun 2023
Approximated Prompt Tuning for Vision-Language Pre-trained Models
Approximated Prompt Tuning for Vision-Language Pre-trained Models
Qiong Wu
Shubin Huang
Weihao Ye
Pingyang Dai
Annan Shu
Guannan Jiang
Rongrong Ji
VLMVPVLM
127
2
0
27 Jun 2023
FunQA: Towards Surprising Video Comprehension
FunQA: Towards Surprising Video Comprehension
Binzhu Xie
Sicheng Zhang
Zitang Zhou
Yue Liu
Yuanhan Zhang
Jack Hessel
Jingkang Yang
Ziwei Liu
417
35
0
26 Jun 2023
Switch-BERT: Learning to Model Multimodal Interactions by Switching
  Attention and Input
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and InputEuropean Conference on Computer Vision (ECCV), 2023
Qingpei Guo
Kaisheng Yao
Wei Chu
MLLM
103
6
0
25 Jun 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Chunjiang Ge
Yulei Qin
Mengdan Zhang
...
Xing Sun
Zhenyu Qiu
Rongrong Ji
Caifeng Shan
Ran He
ELMMLLM
807
1,224
0
23 Jun 2023
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic
  Compatible Adapter
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter
Binjie Zhang
Yixiao Ge
Xuyuan Xu
Ying Shan
Mike Zheng Shou
172
9
0
22 Jun 2023
VisoGender: A dataset for benchmarking gender bias in image-text pronoun
  resolution
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolutionNeural Information Processing Systems (NeurIPS), 2023
Elizaveta Semenova
F. G. Abrantes
Hanwen Zhu
Grace A. Sodunke
Aleksandar Shtedritski
Hannah Rose Kirk
CoGe
377
57
0
21 Jun 2023
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Rabiul Awal
Le Zhang
Aishwarya Agrawal
LRM
370
18
0
16 Jun 2023
Encyclopedic VQA: Visual questions about detailed properties of
  fine-grained categories
Encyclopedic VQA: Visual questions about detailed properties of fine-grained categoriesIEEE International Conference on Computer Vision (ICCV), 2023
Thomas Mensink
J. Uijlings
Lluis Castrejon
A. Goel
Felipe Cadar
Howard Zhou
Fei Sha
A. Araújo
V. Ferrari
281
82
0
15 Jun 2023
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and
  Text Integration
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration
Chenyang Lyu
Minghao Wu
Longyue Wang
Xinting Huang
Bingshuai Liu
Zefeng Du
Shuming Shi
Zhaopeng Tu
MLLMAuLLM
373
209
0
15 Jun 2023
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
COSA: Concatenated Sample Pretrained Vision-Language Foundation ModelInternational Conference on Learning Representations (ICLR), 2023
Sihan Chen
Xingjian He
Handong Li
Xiaojie Jin
Jiashi Feng
Qingbin Liu
VLMCLIP
203
11
0
15 Jun 2023
Dissecting Multimodality in VideoQA Transformer Models by Impairing
  Modality Fusion
Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality FusionInternational Conference on Machine Learning (ICML), 2023
Isha Rawal
Alexander Matyasko
Shantanu Jaiswal
Basura Fernando
Cheston Tan
276
7
0
15 Jun 2023
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to
  Enhance Visio-Linguistic Compositional Understanding
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional UnderstandingComputer Vision and Pattern Recognition (CVPR), 2023
Le Zhang
Rabiul Awal
Aishwarya Agrawal
CoGeVLM
410
27
0
15 Jun 2023
Improving Selective Visual Question Answering by Learning from Your
  Peers
Improving Selective Visual Question Answering by Learning from Your PeersComputer Vision and Pattern Recognition (CVPR), 2023
Corentin Dancette
Spencer Whitehead
Rishabh Maheshwary
Ramakrishna Vedantam
Stefan Scherer
Xinlei Chen
Matthieu Cord
Marcus Rohrbach
AAMLOOD
222
24
0
14 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large
  Language Models
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
254
9
0
14 Jun 2023
AVIS: Autonomous Visual Information Seeking with Large Language Model
  Agent
AVIS: Autonomous Visual Information Seeking with Large Language Model AgentNeural Information Processing Systems (NeurIPS), 2023
Ziniu Hu
Ahmet Iscen
Chen Sun
Kai-Wei Chang
Luke Huan
David A. Ross
Cordelia Schmid
Alireza Fathi
298
12
0
13 Jun 2023
Image Captioners Are Scalable Vision Learners Too
Image Captioners Are Scalable Vision Learners TooNeural Information Processing Systems (NeurIPS), 2023
Michael Tschannen
Manoj Kumar
Andreas Steiner
Xiaohua Zhai
N. Houlsby
Lucas Beyer
VLMCLIP
840
82
0
13 Jun 2023
Zero-shot Composed Text-Image Retrieval
Zero-shot Composed Text-Image RetrievalBritish Machine Vision Conference (BMVC), 2023
Yikun Liu
Jiangchao Yao
Ya Zhang
Yanfeng Wang
Weidi Xie
211
33
0
12 Jun 2023
Retrieval-Enhanced Contrastive Vision-Text Models
Retrieval-Enhanced Contrastive Vision-Text ModelsInternational Conference on Learning Representations (ICLR), 2023
Ahmet Iscen
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLIPVLM
292
39
0
12 Jun 2023
Global and Local Semantic Completion Learning for Vision-Language
  Pre-training
Global and Local Semantic Completion Learning for Vision-Language Pre-trainingIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Rong-Cheng Tu
Yatai Ji
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
252
8
0
12 Jun 2023
Sticker820K: Empowering Interactive Retrieval with Stickers
Sticker820K: Empowering Interactive Retrieval with Stickers
Sijie Zhao
Yixiao Ge
Chen Ma
Lin Song
Xiaohan Ding
Zehua Xie
Ying Shan
112
14
0
12 Jun 2023
Weakly Supervised Visual Question Answer Generation
Weakly Supervised Visual Question Answer Generation
Charani Alampalle
Shamanthak Hegde
Soumya Jahagirdar
Shankar Gangisetty
191
0
0
11 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review
  of Methodological Advances and Future Research Directions
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research DirectionsIEEE Access (IEEE Access), 2023
N. Rodis
Christos Sardianos
Panagiotis I. Radoglou-Grammatikis
Panagiotis G. Sarigiannidis
Iraklis Varlamis
Georgios Th. Papadopoulos
335
40
0
09 Jun 2023
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
Yue Liu
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Fanyi Pu
Jingkang Yang
Xuefei Liu
Ziwei Liu
MLLMVLM
279
291
0
08 Jun 2023
Modular Visual Question Answering via Code Generation
Modular Visual Question Answering via Code GenerationAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Sanjay Subramanian
Medhini Narasimhan
Kushal Khangaonkar
Kevin Kaichuang Yang
Arsha Nagrani
Cordelia Schmid
Andy Zeng
Trevor Darrell
Dan Klein
228
62
0
08 Jun 2023
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining
  Large Language Models
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language ModelsNeural Information Processing Systems (NeurIPS), 2023
Wenxuan Zhang
Sharifah Mahani Aljunied
Chang Gao
Yew Ken Chia
Lidong Bing
ELM
312
128
0
08 Jun 2023
Previous
123...252627...444546
Next
Page 26 of 46
Pageof 46