Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2212.09737
Cited By
Position-guided Text Prompt for Vision-Language Pre-training
19 December 2022
Alex Jinpeng Wang
Pan Zhou
Mike Zheng Shou
Shuicheng Yan
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Position-guided Text Prompt for Vision-Language Pre-training"
26 / 26 papers shown
Title
Grad-ECLIP: Gradient-based Visual and Textual Explanations for CLIP
Chenyang Zhao
Kun Wang
J. H. Hsiao
Antoni B. Chan
CLIP
66
0
0
26 Feb 2025
Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering
Ting Yu
Kunhao Fu
Shuhui Wang
Qingming Huang
Jun Yu
41
0
0
12 Oct 2024
MedUniSeg: 2D and 3D Medical Image Segmentation via a Prompt-driven Universal Model
Yiwen Ye
Ziyang Chen
Jianpeng Zhang
Yutong Xie
Yong Xia
MedIm
35
3
0
08 Oct 2024
Long-CLIP: Unlocking the Long-Text Capability of CLIP
Beichen Zhang
Pan Zhang
Xiao-wen Dong
Yuhang Zang
Jiaqi Wang
CLIP
VLM
26
106
0
22 Mar 2024
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training
Haowei Liu
Yaya Shi
Haiyang Xu
Chunfen Yuan
Qinghao Ye
...
Mingshi Yan
Ji Zhang
Fei Huang
Bing Li
Weiming Hu
VLM
22
0
0
01 Mar 2024
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen
Tiancheng Zhao
Mingwei Zhu
Jianwei Yin
VLM
ObjD
65
11
0
22 Dec 2023
TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training
Chaoya Jiang
Wei Ye
Haiyang Xu
Qinghao Ye
Mingshi Yan
Ji Zhang
Shikun Zhang
CLIP
VLM
11
4
0
14 Dec 2023
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao
Pan Zhou
Mike Zheng Shou
MLLM
SyDa
33
7
0
11 Dec 2023
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation
Bangyan He
Xiaojun Jia
Siyuan Liang
Tianrui Lou
Yang Liu
Xiaochun Cao
AAML
VLM
19
23
0
08 Dec 2023
APoLLo: Unified Adapter and Prompt Learning for Vision Language Models
Sanjoy Chowdhury
Sayan Nag
Dinesh Manocha
VLM
19
17
0
04 Dec 2023
Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation
Siyu Zhang
Ye-Ting Chen
Fang Wang
Yaoru Sun
Jun Yang
Lizhi Bai
SSL
17
0
0
20 Oct 2023
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
19
7
0
16 Oct 2023
Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning
Ye-Ting Chen
Siyu Zhang
Yaoru Sun
Weijian Liang
Haoran Wang
33
0
0
18 Aug 2023
Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks
David Junhao Zhang
Mutian Xu
Chuhui Xue
Wenqing Zhang
Xiaoguang Han
Song Bai
Mike Zheng Shou
DiffM
32
6
0
13 Aug 2023
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
Jiazhou Zhou
Xueye Zheng
Yuanhuiyi Lyu
Lin Wang
VLM
17
12
0
06 Aug 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
Shraman Pramanick
Yale Song
Sayan Nag
Kevin Qinghong Lin
Hardik Shah
Mike Zheng Shou
Ramalingam Chellappa
Pengchuan Zhang
VLM
26
86
0
11 Jul 2023
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
15
5
0
06 Jul 2023
Review of Large Vision Models and Visual Prompt Engineering
Jiaqi Wang
Zheng Liu
Lin Zhao
Zihao Wu
Chong Ma
...
Bao Ge
Yixuan Yuan
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
LRM
51
141
0
03 Jul 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Noam Rotstein
David Bensaid
Shaked Brody
Roy Ganz
Ron Kimmel
VLM
13
26
0
28 May 2023
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Shraman Pramanick
Li Jing
Sayan Nag
Jiachen Zhu
Hardik Shah
Yann LeCun
Ramalingam Chellappa
24
21
0
09 Oct 2022
MaPLe: Multi-modal Prompt Learning
Muhammad Uzair Khattak
H. Rasheed
Muhammad Maaz
Salman Khan
F. Khan
VPVLM
VLM
186
521
0
06 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,010
0
28 Jan 2022
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
194
218
0
24 Sep 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang
Xiujun Li
Xiaowei Hu
Jianwei Yang
Lei Zhang
Lijuan Wang
Yejin Choi
Jianfeng Gao
ObjD
VLM
252
157
0
02 Jan 2021
1