2110.11338
VLDeformer: Vision-Language Decomposed Transformer for Fast Cross-Modal Retrieval
20 October 2021
Lisai Zhang
Hongfa Wu
Qingcai Chen
Yimeng Deng
Zhonghua Li
Dejiang Kong
Zhao Cao
Joanna Siebert
Yunpeng Han
ViT
VLM
Papers citing
"VLDeformer: Vision-Language Decomposed Transformer for Fast Cross-Modal Retrieval"
6 / 6 papers shown
Title
An Enhanced Large Language Model For Cross Modal Query Understanding System Using DL-KeyBERT Based CAZSSCL-MPGPT
Shreya Singh
36
0
0
24 Feb 2025
FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis
Mikel Williams-Lekuona
Georgina Cosma
32
0
0
29 Jul 2024
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
Chunhui Zhang
Xin Sun
Li Liu
Yiqian Yang
Qiong Liu
Xiaoping Zhou
Yanfeng Wang
33
15
0
07 Jul 2023
FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion Vision-Language Pre-training
Yunpeng Han
Lisai Zhang
Qingcai Chen
Zhijian Chen
Zhonghua Li
Jianxin Yang
Zhao Cao
AI4TS
VLM
21
11
0
11 Apr 2023
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Similarity Reasoning and Filtration for Image-Text Matching
Haiwen Diao
Ying Zhang
Lingyun Ma
Huchuan Lu
202
331
0
05 Jan 2021