ResearchTrend.AI
VQA: Visual Question Answering (arXiv:1505.00468)

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 727 papers shown
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
Saba Ahmadi
Aishwarya Agrawal
17
6
0
24 May 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You
Rui Sun
Zhecan Wang
Long Chen
Gengyu Wang
Hammad A. Ayyubi
Kai-Wei Chang
Shih-Fu Chang
VLM
MLLM
LRM
39
43
0
24 May 2023
GEST: the Graph of Events in Space and Time as a Common Representation between Vision and Language
Mihai Masala
Nicolae Cudlenco
Traian Rebedea
Marius Leordeanu
14
0
0
22 May 2023
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data
Ziyi Yang
Mahmoud Khademi
Yichong Xu
Reid Pryzant
Yuwei Fang
...
Yu Shi
Lu Yuan
Takuya Yoshioka
Michael Zeng
Xuedong Huang
17
2
0
21 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
J. Liu
13
1
0
19 May 2023
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding
Chenchi Zhang
Jun Xiao
Lei Chen
Jian Shao
Long Chen
VLM
LRM
22
2
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
20
114
0
18 May 2023
Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature
Ana Claudia Akemi Matsuki de Faria
Felype de Castro Bastos
Jose Victor Nogueira Alves da Silva
Vitor Lopes Fabris
Valeska Uchôa
Décio Gonçalves de Aguiar Neto
C. F. G. Santos
30
22
0
18 May 2023
What You See is What You Read? Improving Text-Image Alignment Evaluation
Michal Yarom
Yonatan Bitton
Soravit Changpinyo
Roee Aharoni
Jonathan Herzig
Oran Lang
E. Ofek
Idan Szpektor
EGVM
43
73
0
17 May 2023
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
60
691
0
17 May 2023
Combo of Thinking and Observing for Outside-Knowledge VQA
Q. Si
Yuchen Mo
Zheng Lin
Huishan Ji
Weiping Wang
38
13
0
10 May 2023
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
Shan Zhong
Zhongzhan Huang
Wushao Wen
Jinghui Qin
Liang Lin
24
40
0
09 May 2023
Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation
Chaoya Jiang
Wei Ye
Haiyang Xu
Miang yan
Shikun Zhang
Jie Zhang
Fei Huang
VLM
19
15
0
08 May 2023
VPGTrans: Transfer Visual Prompt Generator across LLMs
Ao Zhang
Hao Fei
Yuan Yao
Wei Ji
Li Li
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
27
85
0
02 May 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLM
ObjD
CLIP
37
73
0
10 Apr 2023
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
Noa Garcia
Yusuke Hirota
Yankun Wu
Yuta Nakashima
EGVM
31
51
0
06 Apr 2023
I2I: Initializing Adapters with Improvised Knowledge
Tejas Srinivasan
Furong Jia
Mohammad Rostami
Jesse Thomason
CLL
24
6
0
04 Apr 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
Yongxin Zhu
Z. Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
16
6
0
04 Apr 2023
Weakly-Supervised Text-driven Contrastive Learning for Facial Behavior Understanding
Xiang Zhang
Taoyue Wang
Xiaotian Li
Huiyuan Yang
L. Yin
32
9
0
31 Mar 2023
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
Ximeng Sun
Pengchuan Zhang
Peizhao Zhang
Hardik Shah
Kate Saenko
Xide Xia
VLM
18
20
0
31 Mar 2023
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo
A. Piergiovanni
Dahun Kim
Xiyang Luo
Benjamin Caine
...
Luowei Zhou
Andrew M. Dai
Zhifeng Chen
Claire Cui
A. Angelova
MLLM
VLM
23
23
0
29 Mar 2023
Borrowing Human Senses: Comment-Aware Self-Training for Social Media Multimodal Classification
Chunpu Xu
Jing Li
VLM
13
5
0
27 Mar 2023
Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph Generation with Decoupled Label Learning
Wenqing Wang
Yawei Luo
Zhiqin Chen
Tao Jiang
Lei Chen
Yi Yang
Jun Xiao
30
7
0
23 Mar 2023
Top-Down Visual Attention from Analysis by Synthesis
Baifeng Shi
Trevor Darrell
Xin Eric Wang
17
28
0
23 Mar 2023
Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering
T. M. Thai
Son T. Luu
32
0
0
22 Mar 2023
Location-Free Scene Graph Generation
Ege Ozsoy
Felix Holm
Tobias Czempiel
Nassir Navab
Benjamin Busam
37
4
0
20 Mar 2023
Data Roaming and Quality Assessment for Composed Image Retrieval
Matan Levy
Rami Ben-Ari
N. Darshan
Dani Lischinski
35
23
0
16 Mar 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip H. S. Torr
31
23
0
11 Mar 2023
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
Minyoung Hwang
Jaeyeon Jeong
Minsoo Kim
Yoonseon Oh
Songhwai Oh
17
19
0
07 Mar 2023
Knowledge-Based Counterfactual Queries for Visual Question Answering
Theodoti Stoikou
Maria Lymperaiou
Giorgos Stamou
AAML
15
1
0
05 Mar 2023
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
Renrui Zhang
Xiangfei Hu
Bohao Li
Siyuan Huang
Hanqiu Deng
Hongsheng Li
Yu Qiao
Peng Gao
VLM
MLLM
30
170
0
03 Mar 2023
Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning
Xinyue Hu
Lin Gu
Kazuma Kobayashi
Qi A. An
Qingyu Chen
Zhiyong Lu
Chang Su
Tatsuya Harada
Yingying Zhu
GNN
21
9
0
19 Feb 2023
On The Coherence of Quantitative Evaluation of Visual Explanations
Benjamin Vandersmissen
José Oramas
XAI
FAtt
21
3
0
14 Feb 2023
Context Understanding in Computer Vision: A Survey
Xuan Wang
Zhigang Zhu
11
45
0
10 Feb 2023
Structured Generative Models for Scene Understanding
Christopher K. I. Williams
OCL
3DV
19
3
0
07 Feb 2023
Vertical Federated Learning: Taxonomies, Threats, and Prospects
Qun Li
Chandra Thapa
Lawrence Ong
Yifeng Zheng
Hua Ma
S. Çamtepe
Anmin Fu
Yan Gao
FedML
30
10
0
03 Feb 2023
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens
Chen Chen
Bowen Zhang
Liangliang Cao
Jiguang Shen
Tom Gunter
Albin Madappally Jose
Alexander Toshev
Jonathon Shlens
Ruoming Pang
Yinfei Yang
VLM
3DV
25
13
0
30 Jan 2023
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models
Ali Borji
CoGe
10
1
0
28 Jan 2023
Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering
Chenxi Whitehouse
Tillman Weyde
Pranava Madhyastha
LRM
36
3
0
25 Jan 2023
Tensor Networks Meet Neural Networks: A Survey and Future Perspectives
Maolin Wang
Y. Pan
Zenglin Xu
Xiangli Yang
Guangxi Li
Andrzej Cichocki
43
19
0
22 Jan 2023
Joint Representation Learning for Text and 3D Point Cloud
Rui Huang
Xuran Pan
Henry Zheng
Haojun Jiang
Zhifeng Xie
S. Song
Gao Huang
13
19
0
18 Jan 2023
Curriculum Script Distillation for Multilingual Visual Question Answering
Khyathi Raghavi Chandu
A. Geramifard
19
0
0
17 Jan 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang
Wenhao Wu
Chang-rui Liu
Yu Zhou
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
24
45
0
16 Jan 2023
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Zhenfang Chen
Qinhong Zhou
Yikang Shen
Yining Hong
Hao Zhang
Chuang Gan
LRM
VLM
31
35
0
12 Jan 2023
Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering
Paul Lerner
O. Ferret
C. Guinaudeau
16
9
0
11 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
11
4
0
05 Jan 2023
Is GPT-3 a Good Data Annotator?
Bosheng Ding
Chengwei Qin
Linlin Liu
Yew Ken Chia
Shafiq R. Joty
Boyang Albert Li
Lidong Bing
24
231
0
20 Dec 2022
Position-guided Text Prompt for Vision-Language Pre-training
Alex Jinpeng Wang
Pan Zhou
Mike Zheng Shou
Shuicheng Yan
VLM
19
37
0
19 Dec 2022
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
18
4
0
16 Dec 2022
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models
Bernd Bohnet
Vinh Q. Tran
Pat Verga
Roee Aharoni
D. Andor
...
Michael Collins
Dipanjan Das
Donald Metzler
Slav Petrov
Kellie Webster
41
59
0
15 Dec 2022