Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer

22 June 2022
Lalithkumar Seenivasan
Mobarakol Islam
Adithya K. Krishna
Hongliang Ren
    MedIm

Papers citing "Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer"

26 papers
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye
Hao Tang
LM&MA
89
0
0
29 Apr 2025
OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding
Songtao Jiang
Yuan Wang
Sibo Song
Yuhang Zhang
Zijie Meng
Bohan Lei
Jian Wu
Jimeng Sun
Zuozhu Liu
MedIm
VLM
39
0
0
20 Apr 2025
DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels
Erjian Guo
Zhen Zhao
Zicheng Wang
Tong Chen
Yunyi Liu
Luping Zhou
DiffM
MedIm
55
0
0
24 Mar 2025
SurgRAW: Multi-Agent Workflow with Chain-of-Thought Reasoning for Surgical Intelligence
Chang Han Low
Ziyue Wang
Tianyi Zhang
Zhitao Zeng
Zhu Zhuo
E. Mazomenos
Yueming Jin
LRM
48
1
0
13 Mar 2025
Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review
Ufaq Khan
Umair Nawaz
A. Qayyum
Shazad Ashraf
Muhammad Bilal
Junaid Qadir
76
0
0
24 Feb 2025
Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry
Wenjun Hou
Yi Cheng
Kaishuai Xu
Yan Hu
Wenjie Li
Jiang-Dong Liu
33
0
0
17 Nov 2024
Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models
Juseong Jin
Chang Wook Jeong
27
3
0
13 Oct 2024
VidLPRO: A Video-Language Pre-training Framework for Robotic and Laparoscopic Surgery
Mohammadmahdi Honarmand
Muhammad Abdullah Jamal
Omid Mohareri
60
1
0
07 Sep 2024
LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning
Jiajie Li
Garrett C Skinner
Gene Yang
Brian R Quaranto
Steven D. Schwaitzberg
Peter C W Kim
Jinjun Xiong
38
10
0
15 Aug 2024
Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery
Long Bai
Guankun Wang
Mobarakol Islam
Lalithkumar Seenivasan
An-Chi Wang
Hongliang Ren
49
13
0
09 Aug 2024
GP-VLS: A general-purpose vision language model for surgery
Samuel Schmidgall
Joseph Cho
C. Zakka
W. Hiesinger
LM&MA
49
5
0
27 Jul 2024
PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery
Runlong He
Mengya Xu
Adrito Das
Danyal Z. Khan
Sophia Bano
Hani J. Marcus
Danail Stoyanov
Matthew J. Clarkson
Mobarakol Islam
45
7
0
22 May 2024
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
Guan-Feng Wang
Long Bai
Wan Jun Nah
Jie Wang
Zhaoxi Zhang
Zhen Chen
Jinlin Wu
Mobarakol Islam
Hongbin Liu
Hongliang Ren
46
14
0
22 Mar 2024
Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review
Iryna Hartsock
Ghulam Rasool
46
62
0
04 Mar 2024
Multitask Learning in Minimally Invasive Surgical Vision: A Review
Oluwatosin O. Alabi
Tom Kamiel Magda Vercauteren
Miaojing Shi
31
3
0
16 Jan 2024
Advancing Surgical VQA with Scene Graph Knowledge
Kun Yuan
Manasi Kattel
Joël L. Lavanchy
Nassir Navab
V. Srivastav
N. Padoy
25
16
0
15 Dec 2023
Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
37
2
0
24 Oct 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
31
5
0
23 Sep 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
Pietro Mascagni
N. Padoy
32
20
0
27 Jul 2023
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework
Jingxuan Wei
Cheng Tan
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Bihui Yu
R. Guo
Stan Z. Li
LRM
34
8
0
24 Jul 2023
Revisiting Distillation for Continual Learning on Visual Question Localized-Answering in Robotic Surgery
Long Bai
Mobarakol Islam
Hongliang Ren
29
18
0
22 Jul 2023
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Long Bai
Mobarakol Islam
Hongliang Ren
25
20
0
11 Jul 2023
Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis
An-Chi Wang
Mobarakol Islam
Mengya Xu
Hongliang Ren
MedIm
16
0
0
28 Jun 2023
Surgical-VQLA: Transformer with Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Long Bai
Mobarakol Islam
Lalithkumar Seenivasan
Hongliang Ren
28
27
0
19 May 2023
SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery
Lalithkumar Seenivasan
Mobarakol Islam
Gokul Kannan
Hongliang Ren
19
40
0
19 Apr 2023
EndoNet: A Deep Architecture for Recognition Tasks on Laparoscopic Videos
A. P. Twinanda
S. Shehata
Didier Mutter
J. Marescaux
M. de Mathelin
N. Padoy
182
840
0
09 Feb 2016