ResearchTrend.AI

© 2026 ResearchTrend.AI, All rights reserved.


Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering

15 September 2021
Ander Salaberria
Gorka Azkune
Oier López de Lacalle
Aitor Soroa Etxabe
Eneko Agirre

Papers citing "Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering"

20 / 20 papers shown
V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs
Sen Nie
Jie M. Zhang
Jianxin Yan
Shiguang Shan
Xilin Chen
AAML
357
2
0
25 Nov 2025
Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks
Mohammad Saleh
Azadeh Tabatabaei
700
7
0
14 Apr 2025
A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1
Zhaoyi Li
Xiaohan Zhao
Dong-Dong Wu
Jiacheng Cui
Zhiqiang Shen
AAML, VLM
634
22
0
13 Mar 2025
An Enhanced Large Language Model For Cross Modal Query Understanding System Using DL-KeyBERT Based CAZSSCL-MPGPT
Shreya Singh
348
0
0
24 Feb 2025
MageBench: Bridging Large Multimodal Models to Agents
Miaosen Zhang
Jingdong Sun
Yifan Yang
Jianmin Bao
Dongdong Chen
Kai Qiu
Chong Luo
Xin Geng
B. Guo
LRM, LLMAG
237
4
0
05 Dec 2024
IIU: Independent Inference Units for Knowledge-based Visual Question Answering
Knowledge Science, Engineering and Management (KSEM), 2024
Yili Li
Jing Yu
Keke Gai
Gang Xiong
223
2
0
15 Aug 2024
GP-VLS: A general-purpose vision language model for surgery
Samuel Schmidgall
Joseph Cho
C. Zakka
W. Hiesinger
LM&MA
400
20
0
27 Jul 2024
A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming
Pengyuan Zhou
Lin Wang
Zhi Liu
Yanbin Hao
Pan Hui
Sasu Tarkoma
J. Kangasharju
VGen
297
53
0
30 Jan 2024
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models
Xiaoyu Yang
Lijian Xu
Hao Sun
Jiaming Song
Shaoting Zhang
ObjD
460
11
0
21 Nov 2023
Tackling Vision Language Tasks Through Learning Inner Monologues
AAAI Conference on Artificial Intelligence (AAAI), 2023
Diji Yang
Kezhen Chen
Jinmeng Rao
Xiaoyuan Guo
Yawen Zhang
Jie Yang
Yujiao Shi
MLLM
254
15
0
19 Aug 2023
Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
174
1
0
31 May 2023
Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xingyu Fu
Shenmin Zhang
Gukyeong Kwon
Pramuditha Perera
Henghui Zhu
...
Zhiguo Wang
Vittorio Castelli
Patrick Ng
Dan Roth
Bing Xiang
241
32
0
30 May 2023
A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Alireza Salemi
Juan Altmayer Pizzorno
Hamed Zamani
166
25
0
26 Apr 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
314
5
0
04 Mar 2023
A survey on knowledge-enhanced multimodal learning
Artificial Intelligence Review (Artif Intell Rev), 2022
Maria Lymperaiou
Giorgos Stamou
543
27
0
19 Nov 2022
Visual Commonsense-aware Representation Network for Video Captioning
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
194
25
0
17 Nov 2022
VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Sahithya Ravi
Aditya Chinchure
Leonid Sigal
Renjie Liao
Vered Shwartz
215
48
0
24 Oct 2022
LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
Zhuo Chen
Yufen Huang
Jiaoyan Chen
Yuxia Geng
Yin Fang
Jeff Z. Pan
Ningyu Zhang
Wen Zhang
263
50
0
26 Jul 2022
Modular and Parameter-Efficient Multimodal Fusion with Prompting
Findings (Findings), 2022
Sheng Liang
Mengjie Zhao
Hinrich Schütze
199
53
0
15 Mar 2022
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering
Feng Gao
Q. Ping
Govind Thattai
Aishwarya N. Reganti
Yingting Wu
Premkumar Natarajan
193
18
0
14 Jan 2022