ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.00468
  4. Cited By
VQA: Visual Question Answering

VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe
ArXivPDFHTML

Papers citing "VQA: Visual Question Answering"

50 / 699 papers shown
Title
Visual Question Generation in Bengali
Visual Question Generation in Bengali
Mahmud Hasan
Labiba Islam
J. Ruma
T. Mayeesha
Rashedur Rahman
19
1
0
12 Oct 2023
Semantic Scene Difference Detection in Daily Life Patroling by Mobile
  Robots using Pre-Trained Large-Scale Vision-Language Model
Semantic Scene Difference Detection in Daily Life Patroling by Mobile Robots using Pre-Trained Large-Scale Vision-Language Model
Yoshiki Obinata
Kento Kawaharazuka
Naoaki Kanazawa
N. Yamaguchi
Naoto Tsukamoto
Iori Yanokura
Shingo Kitagawa
Koki Shinjo
K. Okada
Masayuki Inaba
LM&Ro
15
5
0
28 Sep 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Avamarie Brueggeman
Andrea Madotto
Zhaojiang Lin
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
29
92
0
27 Sep 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced
  Text-image Comprehension and Composition
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Jiaqi Wang
MLLM
61
222
0
26 Sep 2023
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
Palaash Agrawal
Haidi Azaman
Cheston Tan
41
3
0
13 Sep 2023
DetermiNet: A Large-Scale Diagnostic Dataset for Complex
  Visually-Grounded Referencing using Determiners
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners
Clarence Lee
M Ganesh Kumar
Cheston Tan
28
3
0
07 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and
  Language Models
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
21
2
0
06 Sep 2023
CIEM: Contrastive Instruction Evaluation Method for Better Instruction
  Tuning
CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning
Hongyu Hu
Jiyuan Zhang
Minyi Zhao
Zhenbang Sun
MLLM
25
41
0
05 Sep 2023
Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes
  in Product Images for e-commerce Vision-Language Applications
Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes in Product Images for e-commerce Vision-Language Applications
Wenyi Wu
Karim Bouyarmane
Ismail B. Tutar
23
2
0
30 Aug 2023
On the Potential of CLIP for Compositional Logical Reasoning
On the Potential of CLIP for Compositional Logical Reasoning
Justin D. Brody
LRM
13
4
0
30 Aug 2023
Towards Fast and Accurate Image-Text Retrieval with Self-Supervised
  Fine-Grained Alignment
Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment
Jiamin Zhuang
Jing Yu
Yang Ding
Xiangyang Qu
Yue Hu
19
9
0
27 Aug 2023
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive
  Language-Image Pre-training
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Xi Deng
Han Shi
Runhu Huang
Changlin Li
Hang Xu
Jianhua Han
James T. Kwok
Shen Zhao
Wei Zhang
Xiaodan Liang
CLIP
VLM
29
3
0
22 Aug 2023
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
Hong Li
Xingyu Li
Pengbo Hu
Yinuo Lei
Chunxiao Li
Yi Zhou
28
20
0
15 Aug 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following
  Inspired by Real-World Use
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schimdt
VLM
29
77
0
12 Aug 2023
Environment-Invariant Curriculum Relation Learning for Fine-Grained
  Scene Graph Generation
Environment-Invariant Curriculum Relation Learning for Fine-Grained Scene Graph Generation
Yu Min
Aming Wu
Cheng Deng
17
6
0
07 Aug 2023
Making the V in Text-VQA Matter
Making the V in Text-VQA Matter
Shamanthak Hegde
Soumya Jahagirdar
Shankar Gangisetty
CoGe
29
4
0
01 Aug 2023
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for
  Complex Visual Reasoning Tasks
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks
Kousik Rajesh
Mrigank Raman
M. A. Karim
Pranit Chawla
VLM
23
2
0
31 Jul 2023
Panoptic Scene Graph Generation with Semantics-Prototype Learning
Panoptic Scene Graph Generation with Semantics-Prototype Learning
Li Li
Wei Ji
Yiming Wu
Meng Li
Youxuan Qin
Lina Wei
Roger Zimmermann
24
35
0
28 Jul 2023
BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers
  Models for Vietnamese Visual Question Answering
BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering
Khiem Vinh Tran
Kiet Van Nguyen
N. Nguyen
ViT
23
2
0
28 Jul 2023
MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained
  Semantic Classes and Hard Negative Entities
MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities
Y. Li
Tingwei Lu
Yinghui Li
Tianyu Yu
Shulin Huang
Haitao Zheng
Rui Zhang
Jun Yuan
37
11
0
27 Jul 2023
LOIS: Looking Out of Instance Semantics for Visual Question Answering
LOIS: Looking Out of Instance Semantics for Visual Question Answering
Siyu Zhang
Ye Chen
Yaoru Sun
Fang Wang
Haibo Shi
Haoran Wang
23
4
0
26 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
22
117
0
25 Jul 2023
Localized Questions in Medical Visual Question Answering
Localized Questions in Medical Visual Question Answering
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
17
8
0
03 Jul 2023
Learning Differentiable Logic Programs for Abstract Visual Reasoning
Learning Differentiable Logic Programs for Abstract Visual Reasoning
Hikaru Shindo
Viktor Pfanschilling
D. Dhami
Kristian Kersting
NAI
19
6
0
03 Jul 2023
JourneyDB: A Benchmark for Generative Image Understanding
JourneyDB: A Benchmark for Generative Image Understanding
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
...
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Hongsheng Li
31
101
0
03 Jul 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Rui Sun
Zhecan Wang
Haoxuan You
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
CLIP
28
3
0
03 Jul 2023
HeGeL: A Novel Dataset for Geo-Location from Hebrew Text
HeGeL: A Novel Dataset for Geo-Location from Hebrew Text
Tzuf Paz-Argaman
Tal Bauman
Itai Mondshine
Itzhak Omer
S. Dalyot
Reut Tsarfaty
14
3
0
02 Jul 2023
VisoGender: A dataset for benchmarking gender bias in image-text pronoun
  resolution
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
S. Hall
F. G. Abrantes
Hanwen Zhu
Grace A. Sodunke
Aleksandar Shtedritski
Hannah Rose Kirk
CoGe
11
39
0
21 Jun 2023
GenPlot: Increasing the Scale and Diversity of Chart Derendering Data
GenPlot: Increasing the Scale and Diversity of Chart Derendering Data
Brendan Artley
23
1
0
20 Jun 2023
Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph
  Generation
Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation
Shuo Chen
Yingjun Du
Pascal Mettes
Cees G. M. Snoek
OffRL
26
2
0
16 Jun 2023
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Rabiul Awal
Le Zhang
Aishwarya Agrawal
LRM
38
12
0
16 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large
  Language Models
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
29
7
0
14 Jun 2023
Visual Question Answering (VQA) on Images with Superimposed Text
Visual Question Answering (VQA) on Images with Superimposed Text
V. Kodali
Daniel Berleant
13
1
0
13 Jun 2023
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision
Yukun Zhai
Xiaoqiang Zhang
Xiameng Qin
Sanyuan Zhao
Xingping Dong
Jianbing Shen
33
4
0
06 Jun 2023
Joint Adaptive Representations for Image-Language Learning
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
21
0
0
31 May 2023
Learning without Forgetting for Vision-Language Models
Learning without Forgetting for Vision-Language Models
Da-Wei Zhou
Yuanhan Zhang
Jingyi Ning
Jingyi Ning
De-Chuan Zhan
De-Chuan Zhan
Ziwei Liu
VLM
CLL
69
37
0
30 May 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research
  in Hausa Language
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida
Idris Abdulmumin
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
I. Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
22
4
0
28 May 2023
Are Diffusion Models Vision-And-Language Reasoners?
Are Diffusion Models Vision-And-Language Reasoners?
Benno Krojer
Elinor Poole-Dayan
Vikram S. Voleti
Christopher Pal
Siva Reddy
34
12
0
25 May 2023
An Examination of the Robustness of Reference-Free Image Captioning
  Evaluation Metrics
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
Saba Ahmadi
Aishwarya Agrawal
17
6
0
24 May 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You
Rui Sun
Zhecan Wang
Long Chen
Gengyu Wang
Hammad A. Ayyubi
Kai-Wei Chang
Shih-Fu Chang
VLM
MLLM
LRM
39
43
0
24 May 2023
GEST: the Graph of Events in Space and Time as a Common Representation
  between Vision and Language
GEST: the Graph of Events in Space and Time as a Common Representation between Vision and Language
Mihai Masala
Nicolae Cudlenco
Traian Rebedea
Marius Leordeanu
14
0
0
22 May 2023
i-Code V2: An Autoregressive Generation Framework over Vision, Language,
  and Speech Data
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data
Ziyi Yang
Mahmoud Khademi
Yichong Xu
Reid Pryzant
Yuwei Fang
...
Yu Shi
Lu Yuan
Takuya Yoshioka
Michael Zeng
Xuedong Huang
17
2
0
21 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner
  and Dense Captioner
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
J. Liu
13
1
0
19 May 2023
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual
  Grounding
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding
Chenchi Zhang
Jun Xiao
Lei Chen
Jian Shao
Long Chen
VLM
LRM
22
2
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited
  Modalities
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
16
114
0
18 May 2023
Visual Question Answering: A Survey on Techniques and Common Trends in
  Recent Literature
Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature
Ana Claudia Akemi Matsuki de Faria
Felype de Castro Bastos
Jose Victor Nogueira Alves da Silva
Vitor Lopes Fabris
Valeska Uchôa
Décio Gonccalves de Aguiar Neto
C. F. G. Santos
30
22
0
18 May 2023
What You See is What You Read? Improving Text-Image Alignment Evaluation
What You See is What You Read? Improving Text-Image Alignment Evaluation
Michal Yarom
Yonatan Bitton
Soravit Changpinyo
Roee Aharoni
Jonathan Herzig
Oran Lang
E. Ofek
Idan Szpektor
EGVM
33
73
0
17 May 2023
Evaluating Object Hallucination in Large Vision-Language Models
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
52
691
0
17 May 2023
Combo of Thinking and Observing for Outside-Knowledge VQA
Combo of Thinking and Observing for Outside-Knowledge VQA
Q. Si
Yuchen Mo
Zheng Lin
Huishan Ji
Weiping Wang
30
13
0
10 May 2023
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with
  Large Language Models
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
Shan Zhong
Zhongzhan Huang
Wushao Wen
Jinghui Qin
Liang Lin
24
40
0
09 May 2023
Previous
12345...121314
Next