ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2404.08589
  4. Cited By
Enhancing Visual Question Answering through Question-Driven Image
  Captions as Prompts

Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts

12 April 2024
Övgü Özdemir
Erdem Akagündüz
ArXivPDFHTML

Papers citing "Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts"

10 / 10 papers shown
Title
VIPER: Visual Perception and Explainable Reasoning for Sequential Decision-Making
VIPER: Visual Perception and Explainable Reasoning for Sequential Decision-Making
Mohamed Salim Aissi
Clemence Grislain
Mohamed Chetouani
Olivier Sigaud
Laure Soulier
Nicolas Thome
LRM
37
0
0
19 Mar 2025
A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1
Zhaoyi Li
Xiaohan Zhao
Dong-Dong Wu
Jiacheng Cui
Zhiqiang Shen
AAML
VLM
69
0
0
13 Mar 2025
MMRL: Multi-Modal Representation Learning for Vision-Language Models
MMRL: Multi-Modal Representation Learning for Vision-Language Models
Yuncheng Guo
Xiaodong Gu
VLM
OffRL
50
0
0
11 Mar 2025
A kinetic-based regularization method for data science applications
Abhisek Ganguly
Alessandro Gabbana
Vybhav Rao
Sauro Succi
Santosh Ansumali
39
0
0
06 Mar 2025
A Survey on Data Synthesis and Augmentation for Large Language Models
A Survey on Data Synthesis and Augmentation for Large Language Models
Ke Wang
Jiahui Zhu
Minjie Ren
Z. Liu
Shiwei Li
...
Chenkai Zhang
Xiaoyu Wu
Qiqi Zhan
Qingjie Liu
Yunhong Wang
SyDa
36
15
0
16 Oct 2024
Beyond Captioning: Task-Specific Prompting for Improved VLM Performance
  in Mathematical Reasoning
Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning
Ayush Singh
Mansi Gupta
Shivank Garg
Abhinav Kumar
Vansh Agrawal
ReLM
LRM
24
0
0
08 Oct 2024
MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA
  with LLM and MLLM Integration
MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration
Lai Wei
Wenkai Wang
Xiaoyu Shen
Yu Xie
Zhihao Fan
Xiaojin Zhang
Zhongyu Wei
Wei Chen
16
4
0
06 Oct 2024
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
382
4,010
0
28 Jan 2022
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
164
401
0
10 Sep 2021
1