ResearchTrend.AI
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery

22 March 2024
Guan-Feng Wang
Long Bai
Wan Jun Nah
Jie Wang
Zhaoxi Zhang
Zhen Chen
Jinlin Wu
Mobarakol Islam
Hongbin Liu
Hongliang Ren
arXiv: 2405.10948

Papers citing "Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery"

15 / 15 papers shown
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
Ranjan Sapkota
Yang Cao
Konstantinos I Roumeliotis
Manoj Karkee
LM&Ro
63
0
0
07 May 2025
Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement
Long Bai
Boyi Ma
Ruohan Wang
Guankun Wang
Beilei Cui
...
Mobarakol Islam
Zhe Min
Jiewen Lai
Nassir Navab
Hongliang Ren
38
0
0
03 May 2025
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo
Lijun Zhang
Mengyang Sun
Lin Yuanbo Wu
Peng Wang
Y. Zhang
MLLM
VLM
39
1
0
01 Mar 2025
Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review
Ufaq Khan
Umair Nawaz
A. Qayyum
Shazad Ashraf
Muhammad Bilal
Junaid Qadir
65
0
0
24 Feb 2025
SurgSora: Decoupled RGBD-Flow Diffusion Model for Controllable Surgical Video Generation
Tong Chen
Shuya Yang
Junyi Wang
Long Bai
Hongliang Ren
Luping Zhou
VGen
MedIm
67
1
0
18 Dec 2024
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu
Kun Yuan
Yaling Shen
Feilong Tang
Xiaohao Xu
...
Jin Ye
N. Padoy
Nassir Navab
Junjun He
Zongyuan Ge
VLM
CLIP
72
8
0
23 Nov 2024
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen
Tianshu Zhang
S. Huang
Yuwei Niu
Linfeng Zhang
Lijie Wen
Xuming Hu
MLLM
VLM
90
1
0
22 Nov 2024
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection
Guankun Wang
Han Xiao
Huxin Gao
Renrui Zhang
Long Bai
Xiaoxiao Yang
Zhen Li
Hongsheng Li
Hongliang Ren
28
2
0
10 Oct 2024
Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery
Long Bai
Guankun Wang
Mobarakol Islam
Lalithkumar Seenivasan
An-Chi Wang
Hongliang Ren
32
11
0
09 Aug 2024
GP-VLS: A general-purpose vision language model for surgery
Samuel Schmidgall
Joseph Cho
C. Zakka
W. Hiesinger
LM&MA
44
3
0
27 Jul 2024
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance
Chenwei Lin
Hanjia Lyu
Xian Xu
Jiebo Luo
21
1
0
13 Jun 2024
VM-UNet: Vision Mamba UNet for Medical Image Segmentation
Jiacheng Ruan
Suncheng Xiang
Mamba
59
241
0
04 Feb 2024
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang
Hongyang Li
Feng Li
Tianhe Ren
Xueyan Zou
...
Shijia Huang
Jianfeng Gao
Lei Zhang
Chun-yue Li
Jianwei Yang
85
68
0
05 Dec 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
152
280
0
14 Oct 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023