arXiv:2301.13081
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens
30 January 2023
Chen Chen
Bowen Zhang
Liangliang Cao
Jiguang Shen
Tom Gunter
Albin Madappally Jose
Alexander Toshev
Jonathon Shlens
Ruoming Pang
Yinfei Yang
VLM
3DV
Papers citing "STAIR: Learning Sparse Text and Image Representation in Grounded Tokens"
12 / 12 papers shown
Title
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang
Kaichen Huang
Jiahao Huo
Yibo Yan
S. Huang
...
Kun Wang
Yong Liu
Jing Shao
Hui Xiong
Xuming Hu
LRM
96
14
0
03 Dec 2024
Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP
Eunji Kim
Kyuhong Shim
Simyung Chang
Sungroh Yoon
CLIP
25
0
0
11 Oct 2024
Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models
K. Nakata
Daisuke Miyashita
Youyang Ng
Yasuto Hoshi
J. Deguchi
29
0
0
29 Aug 2024
Unified Lexical Representation for Interpretable Visual-Language Alignment
Yifan Li
Yikai Wang
Yanwei Fu
Dongyu Ru
Zheng-Wei Zhang
Tong He
VLM
27
3
0
25 Jul 2024
Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control
Thong Nguyen
Mariya Hendriksen
Andrew Yates
Maarten de Rijke
40
7
0
27 Feb 2024
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)
Usha Bhalla
Alexander X. Oesterling
Suraj Srinivas
Flavio du Pin Calmon
Himabindu Lakkaraju
34
35
0
16 Feb 2024
MOFI: Learning Image Representations from Noisy Entity Annotated Images
Wentao Wu
Aleksei Timofeev
Chen Chen
Bowen Zhang
Kun Duan
...
Yantao Zheng
Jonathon Shlens
Xianzhi Du
Zhe Gan
Yinfei Yang
VLM
18
7
0
13 Jun 2023
Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness
Liangliang Cao
Bowen Zhang
Chen Chen
Yinfei Yang
Xianzhi Du
Wen‐Cheng Zhang
Zhiyun Lu
Yantao Zheng
CLIP
VLM
19
15
0
08 May 2023
SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval
Thibault Formal
Carlos Lassance
Benjamin Piwowarski
S. Clinchant
194
186
0
21 Sep 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,081
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,689
0
11 Feb 2021
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez
Been Kim
XAI
FaML
225
3,672
0
28 Feb 2017