Visual Prompting in Multimodal Large Language Models: A Survey

5 September 2024
Junda Wu
Zhehao Zhang
Yu Xia
Xintong Li
Zhaoyang Xia
Aaron Chang
Tong Yu
Sungchul Kim
Ryan Rossi
Ruiyi Zhang
Subrata Mitra
Dimitris N. Metaxas
Lina Yao
Jingbo Shang
Julian McAuley
    VLM, LRM
arXiv: 2409.15310

Papers citing "Visual Prompting in Multimodal Large Language Models: A Survey"

31 / 31 papers shown
ZeroDexGrasp: Zero-Shot Task-Oriented Dexterous Grasp Synthesis with Prompt-Based Multi-Stage Semantic Reasoning
Juntao Jian
Yi-Lin Wei
Chengjie Mou
Yuhao Lin
Xing Zhu
Yujun Shen
Wei-Shi Zheng
Ruizhen Hu
106
0
0
17 Nov 2025
RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning
Jiahe Song
C. Wang
Bowen Jiang
Y Samuel Wang
Hao Zheng
...
Y. Wang
Lijun Wu
Jiang Wu
Qian Yu
Conghui He
88
0
0
04 Nov 2025
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
Wenqi Liang
Gan Sun
Yao He
Jiahua Dong
Suyan Dai
Ivan Laptev
Salman Khan
Yang Cong
LM&Ro, 3DV, VLM
158
0
0
03 Nov 2025
Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning
Aodi Wu
Xubo Luo
60
0
0
28 Oct 2025
Video Panels for Long Video Understanding
Lars Doorenbos
Federico Spurio
Juergen Gall
VLM
83
0
0
28 Sep 2025
Abductive Logical Rule Induction by Bridging Inductive Logic Programming and Multimodal Large Language Models
Yifei Peng
Yaoli Liu
Enbo Xia
Yu Jin
Wang-Zhou Dai
Zhong Ren
Yao-Xiang Ding
Kun Zhou
76
0
0
26 Sep 2025
From Drone Imagery to Livability Mapping: AI-powered Environment Perception in Rural China
Weihuan Deng
Yaofu Huang
Luan Chen
Xun Li
Yu Gu
Yao Yao
56
0
0
29 Aug 2025
Labels or Input? Rethinking Augmentation in Multimodal Hate Detection
Sahajpreet Singh
Rongxin Ouyang
Subhayan Mukerjee
Kokil Jaidka
VLM
84
0
0
15 Aug 2025
Invisible Watermarks, Visible Gains: Steering Machine Unlearning with Bi-Level Watermarking Design
Yuhao Sun
Yihua Zhang
Gaowen Liu
Hongtao Xie
Sijia Liu
92
0
0
13 Aug 2025
A Survey on Video Temporal Grounding with Multimodal Large Language Model
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Yue Yu
Wei Liu
Y. Liu
Meng-yang Liu
Liqiang Nie
Zhouchen Lin
C. Chen
AI4TS, VLM, LRM
133
6
0
07 Aug 2025
VPN: Visual Prompt Navigation
Shuo Feng
Zihan Wang
Yuchen Li
Rui Kong
Hengyi Cai
Shuaiqiang Wang
Gim Hee Lee
Piji Li
Shuqiang Jiang
157
0
0
03 Aug 2025
PDB-Eval: An Evaluation of Large Multimodal Models for Description and Explanation of Personalized Driving Behavior
Junda Wu
J. Echterhoff
Kyungtae Han
Lingxi Li
Rohit Gupta
Julian McAuley
93
3
0
24 Jul 2025
Loss-Oriented Ranking for Automated Visual Prompting in LVLMs
Yuan Zhang
Chun-Kai Fan
Tao Huang
Ming Lu
Sicheng Yu
Junwen Pan
Kuan Cheng
Qi She
Shanghang Zhang
VLM, LRM
170
2
0
19 Jun 2025
Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models
Ruiyang Zhang
Hu Zhang
Hao Fei
Zhedong Zheng
UQCV
210
0
0
09 Jun 2025
Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction
Zesheng Ye
C. Cai
Ruijiang Dong
Jianzhong Qi
Bingquan Shen
Pin-Yu Chen
Feng Liu
503
1
0
05 Jun 2025
Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs
Fangrui Zhu
Hanhui Wang
Yiming Xie
Jing Gu
Tianye Ding
Jianwei Yang
Huaizu Jiang
3DV, LRM
360
0
0
04 Jun 2025
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
Chenbin Pan
Wenbin He
Zhengzhong Tu
Liu Ren
LRM, VLM
443
2
0
29 May 2025
MLLMs are Deeply Affected by Modality Bias
Xu Zheng
Chenfei Liao
Yuqian Fu
Kaiyu Lei
Yuanhuiyi Lyu
...
Yu Jiang
Andrii Zadaianchuk
Dacheng Tao
Luc Van Gool
Xuming Hu
260
10
0
24 May 2025
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads
Ingeol Baek
Hwan Chang
Sunghyun Ryu
Hwanhee Lee
133
0
0
21 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
225
2
0
03 May 2025
Towards Explainable Fake Image Detection with Multi-Modal Large Language Models
Yikun Ji
Y. Hong
Jiahui Zhan
H. Chen
Jun Lan
Huijia Zhu
Weiqiang Wang
Guang Dai
Jianfu Zhang
MLLM, LRM
420
4
0
19 Apr 2025
The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting
Masayo Tomita
Katsuhiko Hayashi
Tomoyuki Kaneko
VLM
141
0
0
24 Feb 2025
Vector-ICL: In-context Learning with Continuous Vector Representations
International Conference on Learning Representations (ICLR), 2024
Yufan Zhuang
Chandan Singh
Liyuan Liu
Jingbo Shang
Jianfeng Gao
372
10
0
21 Feb 2025
Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent
Junda Wu
Yuxin Xiong
Xintong Li
Yu Xia
Ruoyu Wang
...
Sungchul Kim
Ryan Rossi
Lina Yao
Jingbo Shang
Julian McAuley
CLL, VLM
282
2
0
17 Feb 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Neural Information Processing Systems (NeurIPS), 2024
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM, VLM, LRM
695
116
0
03 Jan 2025
Autonomous Imagination: Closed-Loop Decomposition of Visual-to-Textual Conversion in Visual Reasoning for Multimodal Large Language Models
Qingbin Liu
Yumeng Li
Boyuan Xiao
Yichang Jian
Ziang Qin
Tianjia Shao
Yao-Xiang Ding
Kun Zhou
LRM, MLLM
439
4
0
27 Nov 2024
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Wei Chen
Lin Li
Yongqi Yang
Bin Wen
Fan Yang
Tingting Gao
Yu Wu
Long Chen
VLM, VGen
230
12
0
15 Jun 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM, LRM
564
302
0
29 Apr 2024
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan
Zhengyuan Yang
Junda Wu
Wanrong Zhu
Jianwei Yang
...
Kevin Qinghong Lin
Jianfeng Wang
Julian McAuley
Jianfeng Gao
Lijuan Wang
LRM
228
24
0
25 Apr 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Shiyang Feng
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Jiaming Song
VLM
327
81
0
29 Mar 2024
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yichen Gong
Delong Ran
Jinyuan Liu
Conglei Wang
Tianshuo Cong
Anyu Wang
Sisi Duan
Xiaoyun Wang
MLLM
573
264
0
09 Nov 2023