Fine-Grained Visual Prompting
7 June 2023
Lingfeng Yang
Yueze Wang
Xiang Li
Xinlong Wang
Jian Yang
ObjD
VLM
arXiv: 2306.04356
Papers citing "Fine-Grained Visual Prompting"
50 / 52 papers shown
Beyond General Prompts: Automated Prompt Refinement using Contrastive Class Alignment Scores for Disambiguating Objects in Vision-Language Models
Lucas Choi
Ross Greer
VLM
11
0
0
14 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
52
0
0
03 May 2025
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Phillip Y. Lee
Jihyeon Je
Chanho Park
Mikaela Angelina Uy
Leonidas J. Guibas
Minhyuk Sung
LRM
41
0
0
24 Apr 2025
Visual and textual prompts for enhancing emotion recognition in video
Zhifeng Wang
Qixuan Zhang
Peter Zhang
Wenjia Niu
Kaihao Zhang
Ramesh Sankaranarayana
Sabrina Caldwell
Tom Gedeon
39
0
0
24 Apr 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
49
0
0
22 Apr 2025
NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation
Junyuan Fang
Zihan Wang
Y. Zhang
Shuzhe Wang
Iaroslav Melekhov
Juho Kannala
VLM
40
0
0
20 Apr 2025
Exploring Multimodal Prompt for Visualization Authoring with Large Language Models
Zhen Wen
Luoxuan Weng
Yinghao Tang
Runjin Zhang
Y. Liu
Bo Pan
Minfeng Zhu
Wei Chen
LRM
19
0
0
18 Apr 2025
URECA: Unique Region Caption Anything
Sangbeom Lim
J. Kim
Heeji Yoon
Jaewoo Jung
Seungryong Kim
29
0
0
07 Apr 2025
Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval
Yuji Nozawa
Yu Lin
Kazumoto Nakamura
Youyang Ng
38
0
0
02 Apr 2025
Towards Automated Semantic Interpretability in Reinforcement Learning via Vision-Language Models
Zhaoxin Li
Zhang Xi-Jia
Batuhan Altundas
Letian Chen
Rohan R. Paleja
Matthew C. Gombolay
OffRL
41
0
0
20 Mar 2025
Visual Position Prompt for MLLM based Visual Grounding
Wei Tang
Yanpeng Sun
Qinying Gu
Zechao Li
VLM
45
0
0
19 Mar 2025
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Wanhua Li
Renping Zhou
Jiawei Zhou
Yingwei Song
Johannes Herter
Minghan Qin
Gao Huang
Hanspeter Pfister
3DGS
VLM
66
0
0
13 Mar 2025
Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding
Shunqi Mao
Chaoyi Zhang
Weidong Cai
MLLM
81
0
0
13 Mar 2025
Introducing Visual Perception Token into Multimodal Large Language Model
Runpeng Yu
Xinyin Ma
Xinchao Wang
MLLM
LRM
68
0
0
24 Feb 2025
CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models
Yeyuan Wang
D. Gao
Bin Li
Rujiao Long
Lei Yi
Xiaoyan Cai
Libin Yang
Jinxia Zhang
Shanqing Yu
Qi Xuan
68
1
0
22 Dec 2024
Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing
Ruyi Ding
Tong Zhou
Lili Su
A. A. Ding
Xiaolin Xu
Yunsi Fei
AAML
58
1
0
19 Nov 2024
Right this way: Can VLMs Guide Us to See More to Answer Questions?
Li Liu
Diji Yang
Sijia Zhong
Kalyana Suma Sree Tholeti
Lei Ding
Yi Zhang
Leilani H. Gilpin
31
2
0
01 Nov 2024
Few Exemplar-Based General Medical Image Segmentation via Domain-Aware Selective Adaptation
Chen Xu
Qiming Huang
Yuqi Hou
Jiangxing Wu
Fan Zhang
Hyung Jin Chang
Jianbo Jiao
30
0
0
11 Oct 2024
G²TR: Generalized Grounded Temporal Reasoning for Robot Instruction Following by Combining Large Pre-trained Models
Riya Arora
N. N.
Aman Tambi
Sandeep S. Zachariah
Souvik Chakraborty
Rohan Paul
LM&Ro
28
0
0
10 Oct 2024
Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation
Muzhi Zhu
Yang Liu
Zekai Luo
Chenchen Jing
Hao Chen
Guangkai Xu
Xinlong Wang
Chunhua Shen
DiffM
VLM
36
3
0
03 Oct 2024
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
Ming Dai
Lingfeng Yang
Yihao Xu
Zhenhua Feng
Wankou Yang
ObjD
27
9
0
26 Sep 2024
Attention Prompting on Image for Large Vision-Language Models
Runpeng Yu
Weihao Yu
Xinchao Wang
VLM
30
6
0
25 Sep 2024
EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models
Jiacheng Zhang
Yang Jiao
Shaoxiang Chen
Jingjing Chen
Yu-Gang Jiang
18
1
0
25 Sep 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Ming-Kuan Wu
Xinyue Cai
Jiayi Ji
Jiale Li
Oucheng Huang
Gen Luo
Hao Fei
Xiaoshuai Sun
Rongrong Ji
MLLM
40
7
0
31 Jul 2024
MaskInversion: Localized Embeddings via Optimization of Explainability Maps
Walid Bousselham
Sofian Chaybouti
Christian Rupprecht
Vittorio Ferrari
Hilde Kuehne
59
0
0
29 Jul 2024
VACoDe: Visual Augmented Contrastive Decoding
Sihyeon Kim
Boryeong Cho
Sangmin Bae
Sumyeong Ahn
SeYoung Yun
34
3
0
26 Jul 2024
Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing
Jun Zhu
Zihao Du
Haotian Xu
Fengbo Lan
Zilong Zheng
Bo Ma
Shengjie Wang
Tao Zhang
36
4
0
12 Jul 2024
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang
Jiaqi Hu
Lianrui Mu
Rui Hu
Xiaoyu Liang
Jiangnan Ye
Haoji Hu
CLIP
VLM
29
2
0
08 Jul 2024
Robust Adaptation of Foundation Models with Black-Box Visual Prompting
Changdae Oh
Gyeongdeok Seo
Geunyoung Jung
Zhi-Qi Cheng
Hosik Choi
Jiyoung Jung
Kyungwoo Song
VLM
29
1
0
04 Jul 2024
3D Feature Distillation with Object-Centric Priors
Georgios Tziafas
Yucheng Xu
Zhibin Li
H. Kasaei
18
1
0
26 Jun 2024
Towards Open-World Grasping with Large Vision-Language Models
Georgios Tziafas
H. Kasaei
LM&Ro
LRM
27
11
0
26 Jun 2024
Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph
S. Linok
T. Zemskova
Svetlana Ladanova
Roman Titkov
Dmitry A. Yudin
Maxim Monastyrny
Aleksei Valenkov
LM&Ro
43
3
0
11 Jun 2024
Learning Visual Prompts for Guiding the Attention of Vision Transformers
Razieh Rezaei
Masoud Jalili Sabet
Jindong Gu
Daniel Rueckert
Philip H. S. Torr
Ashkan Khakzar
19
5
0
05 Jun 2024
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Jinhao Li
Haopeng Li
S. Erfani
Lei Feng
James Bailey
Feng Liu
VLM
27
3
0
05 Jun 2024
Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero shot Medical Image Segmentation
Sidra Aleem
Fangyijie Wang
Mayug Maniparambil
Eric Arazo
J. Dietlmeier
Guénolé Silvestre
Kathleen M. Curran
Noel E. O'Connor
Suzanne Little
VLM
MedIm
27
10
0
09 Apr 2024
Data-Efficient 3D Visual Grounding via Order-Aware Referring
Tung-Yu Wu
Sheng-Yu Huang
Yu-Chiang Frank Wang
34
0
0
25 Mar 2024
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
YiXuan Wu
Yizhou Wang
Shixiang Tang
Wenhao Wu
Tong He
Wanli Ouyang
Jian Wu
Philip H. S. Torr
ObjD
VLM
25
18
0
19 Mar 2024
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
Zheng Li
Xiang Li
Xinyi Fu
Xing Zhang
Weiqiang Wang
Shuo Chen
Jian Yang
VLM
27
34
0
05 Mar 2024
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
Huan Ma
Yan Zhu
Changqing Zhang
Peilin Zhao
Baoyuan Wu
Long-Kai Huang
Qinghua Hu
Bing Wu
VLM
64
1
0
01 Mar 2024
WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition
Lianghui Zhu
Junwei Zhou
Yan Liu
Xin Hao
Wenyu Liu
Xinggang Wang
VLM
31
5
0
22 Feb 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
14
5
0
18 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM
VLM
23
9
0
03 Jan 2024
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Boyuan Zheng
Boyu Gou
Jihyung Kil
Huan Sun
Yu-Chuan Su
MLLM
VLM
LLMAG
41
205
0
03 Jan 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
25
82
0
06 Dec 2023
Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation
Francisco Eiras
Kemal Oksuz
Adel Bibi
Philip H. S. Torr
P. Dokania
25
1
0
20 Oct 2023
InstructDET: Diversifying Referring Object Detection with Generalized Instructions
Ronghao Dang
Jiangyan Feng
Haodong Zhang
Chongjian Ge
Lin Song
...
Chengju Liu
Qi Chen
Feng Zhu
Rui Zhao
Yibing Song
ObjD
13
11
0
08 Oct 2023
A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering
Chaoning Zhang
Fachrina Dewi Puspitasari
Sheng Zheng
Chenghao Li
Yu Qiao
...
Caiyan Qin
François Rameau
Lik-Hang Lee
Sung-Ho Bae
Choong Seon Hong
VLM
76
62
0
12 May 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,110
0
28 Jan 2022
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
194
220
0
24 Sep 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
322
2,249
0
02 Sep 2021