Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.07991
Cited By
LiT: Zero-Shot Transfer with Locked-image text Tuning
15 November 2021
Xiaohua Zhai
Xiao Wang
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LiT: Zero-Shot Transfer with Locked-image text Tuning"
34 / 84 papers shown
Title
A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models
J. Allingham
Jie Jessie Ren
Michael W. Dusenberry
Xiuye Gu
Yin Cui
Dustin Tran
J. Liu
Balaji Lakshminarayanan
LLMAG
VLM
8
32
0
13 Feb 2023
Scaling Vision Transformers to 22 Billion Parameters
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
...
Mario Luvcić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
32
562
0
10 Feb 2023
Vision Learners Meet Web Image-Text Pairs
Bingchen Zhao
Quan Cui
Hao Wu
Osamu Yoshie
Cheng Yang
Oisin Mac Aodha
VLM
19
5
0
17 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
24
11
0
17 Jan 2023
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
Zhiqiu Lin
Samuel Yu
Zhiyi Kuang
Deepak Pathak
Deva Ramana
VLM
15
97
0
16 Jan 2023
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification
Muhammad Ferjad Naeem
Muhammad Gul Zain Ali Khan
Yongqin Xian
Muhammad Zeshan Afzal
D. Stricker
Luc Van Gool
F. Tombari
VLM
22
51
0
05 Dec 2022
Context-Aware Robust Fine-Tuning
Xiaofeng Mao
YueFeng Chen
Xiaojun Jia
Rong Zhang
Hui Xue
Zhao Li
VLM
CLIP
17
23
0
29 Nov 2022
Training Vision-Language Models with Less Bimodal Supervision
Elad Segal
Ben Bogin
Jonathan Berant
VLM
19
2
0
01 Nov 2022
Granularity-aware Adaptation for Image Retrieval over Multiple Tasks
Jon Almazán
ByungSoo Ko
Geonmo Gu
Diane Larlus
Yannis Kalantidis
ObjD
VLM
28
7
0
05 Oct 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP
VLM
16
46
0
26 Jul 2022
Exploring the Design of Adaptation Protocols for Improved Generalization and Machine Learning Safety
Puja Trivedi
Danai Koutra
Jayaraman J. Thiagarajan
AAML
16
0
0
26 Jul 2022
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
Wonyoung Shin
Jonghun Park
Taekang Woo
Yongwoo Cho
Kwangjin Oh
Hwanjun Song
VLM
14
16
0
01 Jul 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
17
123
0
15 Jun 2022
Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment
Tuan Dinh
Jy-yong Sohn
Shashank Rajput
Timothy Ossowski
Yifei Ming
Junjie Hu
Dimitris Papailiopoulos
Kangwook Lee
8
0
0
23 May 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Xiao Wang
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD
CLIP
VLM
ViT
OCL
8
304
0
12 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
42
1,249
0
04 May 2022
Better plain ViT baselines for ImageNet-1k
Lucas Beyer
Xiaohua Zhai
Alexander Kolesnikov
ViT
VLM
11
111
0
03 May 2022
Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)
Alex Fang
Gabriel Ilharco
Mitchell Wortsman
Yu Wan
Vaishaal Shankar
Achal Dave
Ludwig Schmidt
VLM
OOD
9
138
0
03 May 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
21
155
0
30 Apr 2022
Large-scale Bilingual Language-Image Contrastive Learning
ByungSoo Ko
Geonmo Gu
VLM
17
14
0
28 Mar 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
20
16
0
27 Mar 2022
CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation
S. Gadre
Mitchell Wortsman
Gabriel Ilharco
Ludwig Schmidt
Shuran Song
CLIP
LM&Ro
25
140
0
20 Mar 2022
Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision
Yufeng Cui
Lichen Zhao
Feng Liang
Yangguang Li
Jing Shao
UQCV
VLM
CLIP
17
43
0
11 Mar 2022
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
Jing Tan
Yuhong Wang
Gangshan Wu
Limin Wang
39
14
0
01 Mar 2022
Delving Deeper into Cross-lingual Visual Question Answering
Chen Cecilia Liu
Jonas Pfeiffer
Anna Korhonen
Ivan Vulić
Iryna Gurevych
13
8
0
15 Feb 2022
MURAL: Multimodal, Multitask Retrieval Across Languages
Aashi Jain
Mandy Guo
Krishna Srinivasan
Ting-Li Chen
Sneha Kudugunta
Chao Jia
Yinfei Yang
Jason Baldridge
VLM
112
52
0
10 Sep 2021
Robust fine-tuning of zero-shot models
Mitchell Wortsman
Gabriel Ilharco
Jong Wook Kim
Mike Li
Simon Kornblith
...
Raphael Gontijo-Lopes
Hannaneh Hajishirzi
Ali Farhadi
Hongseok Namkoong
Ludwig Schmidt
VLM
21
679
0
04 Sep 2021
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Andreas Steiner
Alexander Kolesnikov
Xiaohua Zhai
Ross Wightman
Jakob Uszkoreit
Lucas Beyer
ViT
29
610
0
18 Jun 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
239
2,554
0
04 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan
K. Raman
Jiecao Chen
Michael Bendersky
Marc Najork
VLM
197
307
0
02 Mar 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,724
0
26 Sep 2016
Previous
1
2