ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.00468
  4. Cited By
VQA: Visual Question Answering

VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe
ArXivPDFHTML

Papers citing "VQA: Visual Question Answering"

50 / 792 papers shown
Title
On the Limitations of Dataset Balancing: The Lost Battle Against
  Spurious Correlations
On the Limitations of Dataset Balancing: The Lost Battle Against Spurious Correlations
Roy Schwartz
Gabriel Stanovsky
27
24
0
27 Apr 2022
Progressive Learning for Image Retrieval with Hybrid-Modality Queries
Progressive Learning for Image Retrieval with Hybrid-Modality Queries
Yida Zhao
Yuqing Song
Qin Jin
8
29
0
24 Apr 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for
  Vision-Language Tasks
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
VLM
OffRL
23
22
0
22 Apr 2022
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented
  Visual Models
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Chunyuan Li
Haotian Liu
Liunian Harold Li
Pengchuan Zhang
J. Aneja
...
Ping Jin
Houdong Hu
Zicheng Liu
Yong Jae Lee
Jianfeng Gao
29
144
0
19 Apr 2022
Attention Mechanism based Cognition-level Scene Understanding
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang
Tai Le Quy
LRM
25
0
0
17 Apr 2022
Optimal quadratic binding for relational reasoning in vector symbolic
  neural architectures
Optimal quadratic binding for relational reasoning in vector symbolic neural architectures
Naoki Hiratani
H. Sompolinsky
17
5
0
14 Apr 2022
On the Importance of Karaka Framework in Multi-modal Grounding
On the Importance of Karaka Framework in Multi-modal Grounding
Sai Kiran Gorthi
R. Mamidi
16
1
0
09 Apr 2022
Parameter-Efficient Abstractive Question Answering over Tables or Text
Parameter-Efficient Abstractive Question Answering over Tables or Text
Vaishali Pal
Evangelos Kanoulas
Maarten de Rijke
LMTD
19
14
0
07 Apr 2022
An Algebraic Approach to Learning and Grounding
An Algebraic Approach to Learning and Grounding
Johanna Björklund
Adam Dahlgren Lindström
F. Drewes
17
0
0
06 Apr 2022
Modeling Motion with Multi-Modal Features for Text-Based Video
  Segmentation
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
29
21
0
06 Apr 2022
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
LRM
NAI
27
20
0
05 Apr 2022
Question-Driven Graph Fusion Network For Visual Question Answering
Question-Driven Graph Fusion Network For Visual Question Answering
Yuxi Qian
Yuncong Hu
Ruonan Wang
Fangxiang Feng
Xiaojie Wang
GNN
16
10
0
03 Apr 2022
Co-VQA : Answering by Interactive Sub Question Sequence
Co-VQA : Answering by Interactive Sub Question Sequence
Ruonan Wang
Yuxi Qian
Fangxiang Feng
Xiaojie Wang
Huixing Jiang
LRM
21
16
0
02 Apr 2022
To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo
To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo
Yiran Luo
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
16
4
0
30 Mar 2022
Text2Pos: Text-to-Point-Cloud Cross-Modal Localization
Text2Pos: Text-to-Point-Cloud Cross-Modal Localization
Manuel Kolmet
Qunjie Zhou
Aljosa Osep
Laura Leal-Taixe
19
22
0
28 Mar 2022
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
S. Gorti
Noël Vouitsis
Junwei Ma
Keyvan Golestan
M. Volkovs
Animesh Garg
Guangwei Yu
25
148
0
28 Mar 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
20
16
0
27 Mar 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
29
136
0
26 Mar 2022
WuDaoMM: A large-scale Multi-Modal Dataset for Pre-training models
Shan Yuan
Shuai Zhao
Jiahong Leng
Zhao Xue
Hanyu Zhao
Peiyu Liu
Zheng Gong
Wayne Xin Zhao
Junyi Li
Tang Jie
VLM
29
5
0
22 Mar 2022
MuKEA: Multimodal Knowledge Extraction and Accumulation for
  Knowledge-based Visual Question Answering
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
Yang Ding
Jing Yu
Bangchang Liu
Yue Hu
Mingxin Cui
Qi Wu
11
62
0
17 Mar 2022
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Haojun Jiang
Yuanze Lin
Dongchen Han
Shiji Song
Gao Huang
ObjD
35
50
0
16 Mar 2022
REX: Reasoning-aware and Grounded Explanation
REX: Reasoning-aware and Grounded Explanation
Shi Chen
Qi Zhao
20
18
0
11 Mar 2022
PACTran: PAC-Bayesian Metrics for Estimating the Transferability of
  Pretrained Models to Classification Tasks
PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks
Nan Ding
Xi Chen
Tomer Levinboim
Soravit Changpinyo
Radu Soricut
22
26
0
10 Mar 2022
NLX-GPT: A Model for Natural Language Explanations in Vision and
  Vision-Language Tasks
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
Fawaz Sammani
Tanmoy Mukherjee
Nikos Deligiannis
MILM
ELM
LRM
16
67
0
09 Mar 2022
AssistQ: Affordance-centric Question-driven Task Completion for
  Egocentric Assistant
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
27
27
0
08 Mar 2022
MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning
MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning
Shiming Chen
Ziming Hong
Guosen Xie
Wenhan Wang
Qinmu Peng
Kai Wang
Jian-jun Zhao
Xinge You
VLM
18
99
0
07 Mar 2022
Quantity over Quality: Training an AV Motion Planner with Large Scale
  Commodity Vision Data
Quantity over Quality: Training an AV Motion Planner with Large Scale Commodity Vision Data
Lukas Platinsky
Tayyab Naseer
Hui Chen
Benjamin A. Haines
Haoyue Zhu
Hugo Grimmett
Luca Del Pero
20
1
0
03 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
16
85
0
02 Mar 2022
Recent, rapid advancement in visual question answering architecture: a
  review
Recent, rapid advancement in visual question answering architecture: a review
V. Kodali
Daniel Berleant
29
9
0
02 Mar 2022
Measuring CLEVRness: Blackbox testing of Visual Reasoning Models
Measuring CLEVRness: Blackbox testing of Visual Reasoning Models
Spyridon Mouselinos
Henryk Michalewski
Mateusz Malinowski
13
3
0
24 Feb 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
X. Wang
ViT
VLM
189
499
0
22 Feb 2022
A Survey of Vision-Language Pre-Trained Models
A Survey of Vision-Language Pre-Trained Models
Yifan Du
Zikang Liu
Junyi Li
Wayne Xin Zhao
VLM
28
179
0
18 Feb 2022
VLP: A Survey on Vision-Language Pre-training
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
82
212
0
18 Feb 2022
Multi-Modal Knowledge Graph Construction and Application: A Survey
Multi-Modal Knowledge Graph Construction and Application: A Survey
Xiangru Zhu
Zhixu Li
Xiaodan Wang
Xueyao Jiang
Penglei Sun
Xuwu Wang
Yanghua Xiao
N. Yuan
28
154
0
11 Feb 2022
Can Open Domain Question Answering Systems Answer Visual Knowledge
  Questions?
Can Open Domain Question Answering Systems Answer Visual Knowledge Questions?
Jiawen Zhang
Abhijit Mishra
Avinesh P.V.S
Siddharth Patwardhan
Sachin Agarwal
24
0
0
09 Feb 2022
NEWSKVQA: Knowledge-Aware News Video Question Answering
NEWSKVQA: Knowledge-Aware News Video Question Answering
Pranay Gupta
Manish Gupta
22
7
0
08 Feb 2022
Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations
  (CGL)
Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations (CGL)
Binoy Saha
Sukhendu Das
14
1
0
05 Feb 2022
Webly Supervised Concept Expansion for General Purpose Vision Models
Webly Supervised Concept Expansion for General Purpose Vision Models
Amita Kamath
Christopher Clark
Tanmay Gupta
Eric Kolve
Derek Hoiem
Aniruddha Kembhavi
VLM
27
54
0
04 Feb 2022
Grounding Answers for Visual Questions Asked by Visually Impaired People
Grounding Answers for Visual Questions Asked by Visually Impaired People
Chongyan Chen
Samreen Anjum
Danna Gurari
23
50
0
04 Feb 2022
Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's
  Progressive Matrices
Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices
Mikolaj Malkiñski
Jacek Mañdziuk
120
41
0
28 Jan 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
390
4,125
0
28 Jan 2022
MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis
MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis
Georgios Paraskevopoulos
Efthymios Georgiou
Alexandros Potamianos
11
26
0
24 Jan 2022
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal
  Grounding
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
Arjun Reddy Akula
OOD
21
3
0
24 Jan 2022
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Jianwei Yang
Xiyang Dai
Bin Xiao
Haoxuan You
Shih-Fu Chang
Lu Yuan
CLIP
VLM
22
39
0
15 Jan 2022
Language-driven Semantic Segmentation
Language-driven Semantic Segmentation
Boyi Li
Kilian Q. Weinberger
Serge J. Belongie
V. Koltun
René Ranftl
VLM
43
600
0
10 Jan 2022
OpenQA: Hybrid QA System Relying on Structured Knowledge Base as well as
  Non-structured Data
OpenQA: Hybrid QA System Relying on Structured Knowledge Base as well as Non-structured Data
Gaochen Wu
Bin Xu
Yuxin Qin
Yang Liu
Lingyu Liu
Ziwei Wang
13
0
0
31 Dec 2021
Understanding and Measuring Robustness of Multimodal Learning
Understanding and Measuring Robustness of Multimodal Learning
Nishant Vishwamitra
Hongxin Hu
Ziming Zhao
Long Cheng
Feng Luo
AAML
11
5
0
22 Dec 2021
A Survey of Natural Language Generation
A Survey of Natural Language Generation
Chenhe Dong
Yinghui Li
Haifan Gong
M. Chen
Junxin Li
Ying Shen
Min Yang
3DV
21
43
0
22 Dec 2021
Domain Adaptation with Pre-trained Transformers for Query Focused
  Abstractive Text Summarization
Domain Adaptation with Pre-trained Transformers for Query Focused Abstractive Text Summarization
Md Tahmid Rahman Laskar
Enamul Hoque
J. Huang
28
44
0
22 Dec 2021
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media
  Knowledge Extraction and Grounding
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
Revanth Reddy Gangi Reddy
Xilin Rui
Manling Li
Xudong Lin
Haoyang Wen
...
Mohit Bansal
Avirup Sil
Shih-Fu Chang
A. Schwing
Heng Ji
17
31
0
20 Dec 2021
Previous
123...789...141516
Next