Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review

4 March 2024
Iryna Hartsock
Ghulam Rasool

Papers citing "Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review"

50 / 64 papers shown
Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
Shojiro Yamabe
Futa Waseda
Daiki Shiono
Tsubasa Takahashi
DiffM, MLLM, VLM
298
1
0
03 Dec 2025
On the Utility of Foundation Models for Fast MRI: Vision-Language-Guided Image Reconstruction
Ruimin Feng
Xingxin He
Ronald Mercer
Zachary Stewart
Fang Liu
DiffM
238
0
0
24 Nov 2025
L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention
Yuliang Zhan
Xinyu Tang
Han Wan
Jian Li
Ji-Rong Wen
Hao Sun
LRM
138
1
0
22 Nov 2025
MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications
Gagan Raj Gupta
Anshul Kumar
Manish Rai
Apu Chakraborty
Ashutosh Modi
...
Soumajit Pramanik
Moyank Giri
Yashwanth Holla
Sunny Kumar
M. V. Kiran Sooraj
187
0
0
17 Nov 2025
Semantic Document Derendering: SVG Reconstruction via Vision-Language Modeling
Adam Hazimeh
Ke Wang
Mark Collier
Gilles Baechler
E. Kokiopoulou
Pascal Frossard
DiffM
281
0
0
17 Nov 2025
Medical Report Generation: A Hierarchical Task Structure-Based Cross-Modal Causal Intervention Framework
Yucheng Song
Yifan Ge
Junhao Li
Zhining Liao
Zhifang Liao
118
0
0
04 Nov 2025
Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing
Jinhua Yin
Peiru Yang
Chen Yang
Huili Wang
Zhiyang Hu
Shangguang Wang
Yongfeng Huang
Tao Qi
229
1
0
03 Nov 2025
HistoLens: An Interactive XAI Toolkit for Verifying and Mitigating Flaws in Vision-Language Models for Histopathology
Sandeep Vissapragada
Vikrant Sahu
Gagan Raj Gupta
Vandita Singh
112
0
0
28 Oct 2025
Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents
Guangfu Guo
Xiaoqian Lu
Yue Feng
LRM
225
1
0
21 Oct 2025
Fine-Tuning MedGemma for Clinical Captioning to Enhance Multimodal RAG over Malaysia CPGs
Lee Qi Zun
Mohamad Zulhilmi Bin Abdul Halim
Goh Man Fye
177
1
0
17 Oct 2025
ReEvalMed: Rethinking Medical Report Evaluation by Aligning Metrics with Real-World Clinical Judgment
Ruochen Li
Jun Li
Bailiang Jian
Kun Yuan
Youxiang Zhu
192
3
0
30 Sep 2025
Video Panels for Long Video Understanding
Lars Doorenbos
Federico Spurio
Juergen Gall
VLM
206
3
0
28 Sep 2025
AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering
Ziqing Wang
Chengsheng Mao
Xiaole Wen
Yuan Luo
Kaize Ding
MedIm
162
0
0
26 Sep 2025
RAU: Reference-based Anatomical Understanding with Vision Language Models
Yiwei Li
Y. Liu
Jiaqi Guo
Lin Zhao
Zheyuan Zhang
Xiao Chen
Boris Mailhe
Ankush Mukherjee
Terrence Chen
Shanhui Sun
166
2
0
26 Sep 2025
EchoBench: Benchmarking Sycophancy in Medical Large Vision-Language Models
Botai Yuan
Yutian Zhou
Yingjie Wang
Fushuo Huo
Yongcheng Jing
...
Zhiqi Shen
Ziwei Liu
Tianwei Zhang
J. Yang
Dacheng Tao
LM&MA, ELM
297
2
0
24 Sep 2025
Eye Gaze Tells You Where to Compute: Gaze-Driven Efficient VLMs
Qinyu Chen
Jiawen Qi
148
0
0
20 Sep 2025
Intelligent Healthcare Imaging Platform: A VLM-Based Framework for Automated Medical Image Analysis and Clinical Report Generation
Samer Al-Hamadani
LM&MA
171
0
0
16 Sep 2025
Analysis of Blood Report Images Using General Purpose Vision-Language Models
Nadia Bakhsheshi
Hamid Beigy
VLM
88
0
0
07 Sep 2025
XDR-LVLM: An Explainable Vision-Language Large Model for Diabetic Retinopathy Diagnosis
Masato Ito
Kaito Tanaka
Keisuke Matsuda
Aya Nakayama
181
1
0
21 Aug 2025
Hallucinations in medical devices
Jason Granstedt
Prabhat Kc
Rucha Deshpande
Victor Garcia
Aldo Badano
214
7
0
18 Aug 2025
M3PO: Multimodal-Model-Guided Preference Optimization for Visual Instruction Following
Ruirui Gao
Emily Johnson
Bowen Tan
Yanfei Qian
241
1
0
17 Aug 2025
A Multi-Agent System for Complex Reasoning in Radiology Visual Question Answering
Ziruo Yi
Jinyu Liu
Ting Xiao
Mark V. Albert
268
0
0
04 Aug 2025
Your other Left! Vision-Language Models Fail to Identify Relative Positions in Medical Images
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Daniel Wolf
Heiko Hillenhagen
Billurvan Taskin
Alex Bauerle
Meinrad Beer
Michael Götz
Timo Ropinski
160
4
0
01 Aug 2025
A Survey of Multimodal Hallucination Evaluation and Detection
Zhiyuan Chen
Yuecong Min
Jie M. Zhang
Bei Yan
Jiahao Wang
X. Wang
Shiguang Shan
HILM
468
11
0
25 Jul 2025
Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy
Sushant Gautam
Pål Halvorsen
354
8
0
11 Jun 2025
Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations
Zhiyu Xue
Reza Abbasi-Asl
Ramtin Pedarsani
169
2
0
08 Jun 2025
DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?
Tianhong Zhou
Yin Xu
Yingtao Zhu
Chuxi Xiao
Haiyang Bian
Lei Wei
Xuegong Zhang
LM&MA, VLM
289
6
0
30 May 2025
Vid-SME: Membership Inference Attacks against Large Video Understanding Models
Qi Li
Runpeng Yu
Xinchao Wang
349
10
0
29 May 2025
Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration
Mehrdad Fazli
Bowen Wei
Ahmet Sari
Ziwei Zhu
VLM
555
5
0
27 May 2025
An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning
Andrew Zamai
Nathanael Fijalkow
Boris Mansencal
Laurent Simon
Eloi Navet
Pierrick Coupé
333
2
0
26 May 2025
Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering
Y. Chen
Wenjie Xiao
P. R. Bassi
Xinze Zhou
Sezgin Er
Ibrahim Ethem Hamamci
Zongwei Zhou
Yaoyao Liu
ELM
322
9
0
25 May 2025
Mitigating Hallucinations via Inter-Layer Consistency Aggregation in Large Vision-Language Models
Kai Tang
Jinhao You
Xiuqi Ge
Hanze Li
Yichen Guo
Xiande Huang
MLLM
571
5
0
18 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
1.4K
50
0
05 May 2025
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models
Gracjan Góral
Alicja Ziarko
Piotr Miłoś
Michał Nauman
Maciej Wołczyk
Michał Kosiński
LRM
364
2
0
03 May 2025
V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving
Jannik Lübberstedt
Esteban Rivera
Nico Uhlemann
Markus Lienkamp
MLLM
438
6
0
30 Apr 2025
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
Justus Westerhoff
Erblina Purellku
Jakob Hackstein
Jonas Loos
Leo Pinetzki
Lorenz Hufe
AAML
759
5
0
07 Apr 2025
MedM-VL: What Makes a Good Medical LVLM?
Yiming Shi
Shaoshuai Yang
Xun Zhu
Haoyu Wang
Xiangling Fu
Chenyi Guo
Ji Wu
VLM
545
3
0
06 Apr 2025
M$^2$IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering
Yanshu Li
Yi Cao
Hongyang He
Qisen Cheng
Xiang Fu
Xi Xiao
Tianyang Wang
Ruixiang Tang
VLM
441
1
0
06 Apr 2025
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Mateusz Pach
Shyamgopal Karthik
Quentin Bouniot
Serge Belongie
Zeynep Akata
VLM
735
28
0
03 Apr 2025
LVMed-R2: Perception and Reflection-driven Complex Reasoning for Medical Report Generation
Hao Wang
Shuchang Ye
Jinghao Lin
Usman Naseem
Jinman Kim
LRM
437
2
0
02 Apr 2025
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Zenghui Yuan
Jiawen Shi
Pan Zhou
Neil Zhenqiang Gong
Lichao Sun
AAML
513
12
0
20 Mar 2025
A LongFormer-Based Framework for Accurate and Efficient Medical Text Summarization
Dan Sun
Jacky He
Hanlu Zhang
Zhen Qi
Hongye Zheng
Xiaokai Wang
MedIm
342
11
0
10 Mar 2025
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA
Medical Image Analysis (MedIA), 2025
Z. Zhong
Yuli Wang
Lulu Bi
Zhuoqi Ma
S. H. Ahn
...
Webster Stayman
Todd M. Kolb
I. Kamel
Harrison X. Bai
Zhicheng Jiao
LM&MA
331
0
0
03 Mar 2025
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Jiazhen Pan
Che Liu
Junde Wu
Fenglin Liu
Jiayuan Zhu
Hongwei Bran Li
Chen Chen
Cheng Ouyang
Daniel Rueckert
LRM, LM&MA, VLM
628
149
0
26 Feb 2025
The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting
Masayo Tomita
Katsuhiko Hayashi
Tomoyuki Kaneko
VLM
221
0
0
24 Feb 2025
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Information Fusion (Inf. Fusion), 2023
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Xiaoshi Zhong
LM&MA, AILaw
897
298
0
28 Jan 2025
StreamingRAG: Real-time Contextual Retrieval and Generation Framework
Murugan Sankaradas
Ravi K. Rajendran
Srimat T. Chakradhar
270
7
0
23 Jan 2025
More is Less? A Simulation-Based Approach to Dynamic Interactions between Biases in Multimodal Models
Mounia Drissi
206
1
0
23 Dec 2024
Deep Learning-Based Noninvasive Screening of Type 2 Diabetes with Chest X-ray Images and Electronic Health Records
Sanjana Gundapaneni
Zhuo Zhi
Miguel R. D. Rodrigues
326
5
0
14 Dec 2024
Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model
Yiming Shi
Xun Zhu
Ying Hu
Chenyi Guo
Ji Wu
445
15
0
19 Nov 2024