Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 8,411 papers shown
Title
The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing
Dawit Mureja Argaw
Fabian Caba Heilbron
Joon-Young Lee
Markus Woodson
In So Kweon
VGen
37
22
0
20 Jul 2022
ShapeCrafter: A Recursive Text-Conditioned 3D Shape Generation Model
Rao Fu
Xiaoyu Zhan
Yiwen Chen
Daniel E. Ritchie
Srinath Sridhar
24
78
0
19 Jul 2022
Don't Stop Learning: Towards Continual Learning for the CLIP Model
Yuxuan Ding
Lingqiao Liu
Chunna Tian
Jingyuan Yang
Haoxuan Ding
CLL
VLM
KELM
11
50
0
19 Jul 2022
Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift
Ananya Kumar
Tengyu Ma
Percy Liang
Aditi Raghunathan
UQCV
OODD
OOD
24
38
0
18 Jul 2022
Zero-Shot Temporal Action Detection via Vision-Language Prompting
Sauradip Nag
Xiatian Zhu
Yi-Zhe Song
Tao Xiang
VLM
20
65
0
17 Jul 2022
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Ming Yan
Ji Zhang
Rongrong Ji
CLIP
VLM
10
266
0
15 Jul 2022
Is one annotation enough? A data-centric image classification benchmark for noisy and ambiguous label estimation
Lars Schmarje
Vasco Grossmann
Claudius Zelenka
S. Dippel
R. Kiko
...
M. Pastell
J. Stracke
A. Valros
N. Volkmann
Reinahrd Koch
26
34
0
13 Jul 2022
Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling
Tung Nguyen
Aditya Grover
BDL
UQCV
19
99
0
09 Jul 2022
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
H. Rasheed
Muhammad Maaz
Muhammad Uzair Khattak
Salman Khan
F. Khan
ObjD
VLM
23
151
0
07 Jul 2022
FewSOL: A Dataset for Few-Shot Object Learning in Robotic Environments
P. JishnuJaykumar
Yu-Wei Chao
Yu Xiang
17
11
0
06 Jul 2022
Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning
Yuheng Lu
Chenfeng Xu
Xi Wei
Xiaodong Xie
M. Tomizuka
Kurt Keutzer
Shanghang Zhang
3DPC
13
20
0
05 Jul 2022
BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation
Geondo Park
Jaehong Yoon
H. Zhang
Xingge Zhang
S. Hwang
Yonina C. Eldar
MQ
13
1
0
04 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
87
93
0
04 Jul 2022
Can Language Understand Depth?
Renrui Zhang
Ziyao Zeng
Ziyu Guo
Yafeng Li
VLM
MDE
13
71
0
03 Jul 2022
Divert More Attention to Vision-Language Tracking
Mingzhe Guo
Zhipeng Zhang
Heng Fan
Li Jing
19
53
0
03 Jul 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
25
2
0
02 Jul 2022
ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022
Na Liu
Xiaohan Wang
Xiaobo Li
Yi Yang
Yueting Zhuang
15
18
0
01 Jul 2022
(Un)likelihood Training for Interpretable Embedding
Jiaxin Wu
Chong-Wah Ngo
W. Chan
Zhijian Hou
12
2
0
01 Jul 2022
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
Wonyoung Shin
Jonghun Park
Taekang Woo
Yongwoo Cho
Kwangjin Oh
Hwanjun Song
VLM
14
16
0
01 Jul 2022
Measuring Forgetting of Memorized Training Examples
Matthew Jagielski
Om Thakkar
Florian Tramèr
Daphne Ippolito
Katherine Lee
...
Eric Wallace
Shuang Song
Abhradeep Thakurta
Nicolas Papernot
Chiyuan Zhang
TDI
29
102
0
30 Jun 2022
Distilling Model Failures as Directions in Latent Space
Saachi Jain
Hannah Lawrence
Ankur Moitra
A. Madry
16
88
0
29 Jun 2022
LViT: Language meets Vision Transformer in Medical Image Segmentation
Zihan Li
Yunxiang Li
Qingde Li
Puyang Wang
Dazhou Guo
Le Lu
D. Jin
You Zhang
Qingqi Hong
VLM
MedIm
57
131
0
29 Jun 2022
Language-Based Audio Retrieval with Converging Tied Layers and Contrastive Loss
Andrew Koh
Chng Eng Siong
18
1
0
29 Jun 2022
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings
Arjun Majumdar
Gunjan Aggarwal
Bhavika Devnani
Judy Hoffman
Dhruv Batra
LM&Ro
147
149
0
24 Jun 2022
Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization
Peixian Chen
Kekai Sheng
Mengdan Zhang
Mingbao Lin
Yunhang Shen
Shaohui Lin
Bo Ren
Ke Li
VLM
ObjD
23
27
0
22 Jun 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
82
1,061
0
22 Jun 2022
Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation
Shengyao Zhuang
Houxing Ren
Linjun Shou
Jian Pei
Ming Gong
Guido Zuccon
Daxin Jiang
22
64
0
21 Jun 2022
Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for Inverse Problems
Giannis Daras
Y. Dagan
A. Dimakis
C. Daskalakis
BDL
21
15
0
18 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
22
130
0
18 Jun 2022
Landscape Learning for Neural Network Inversion
Ruoshi Liu
Chen-Guang Mao
Purva Tendulkar
Hongya Wang
Carl Vondrick
16
8
0
17 Jun 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
VLM
20
42
0
17 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
45
391
0
17 Jun 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
42
343
0
17 Jun 2022
Rectify ViT Shortcut Learning by Visual Saliency
Chong Ma
Lin Zhao
Yuzhong Chen
David Liu
Xi Jiang
Tuo Zhang
Xintao Hu
Dinggang Shen
Dajiang Zhu
Tianming Liu
ViT
20
20
0
17 Jun 2022
Rarity Score : A New Metric to Evaluate the Uncommonness of Synthesized Images
Jiyeon Han
Hwanil Choi
Yunjey Choi
Jae Hyun Kim
Jung-Woo Ha
Jaesik Choi
EGVM
10
31
0
17 Jun 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
22
225
0
16 Jun 2022
Patch-level Representation Learning for Self-supervised Vision Transformers
Sukmin Yun
Hankook Lee
Jaehyung Kim
Jinwoo Shin
ViT
13
63
0
16 Jun 2022
Disentangling visual and written concepts in CLIP
Joanna Materzyñska
Antonio Torralba
David Bau
CoGe
12
46
0
15 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
17
123
0
15 Jun 2022
A Meta-Analysis of Distributionally-Robust Models
Ben Feuer
Ameya Joshi
C. Hegde
OOD
VLM
17
3
0
15 Jun 2022
Zero-shot object goal visual navigation
Qianfan Zhao
Lu Zhang
Bin He
Hong Qiao
Zhi-yong Liu
17
37
0
15 Jun 2022
Differentiable Top-k Classification Learning
Felix Petersen
Hilde Kuehne
Christian Borgelt
Oliver Deussen
41
28
0
15 Jun 2022
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Ce Liu
Lijuan Wang
MLLM
VLM
18
81
0
14 Jun 2022
Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Sören Mindermann
J. Brauner
Muhammed Razzak
Mrinank Sharma
Andreas Kirsch
...
Benedikt Höltgen
Aidan N. Gomez
Adrien Morisot
Sebastian Farquhar
Y. Gal
25
148
0
14 Jun 2022
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Matt Deitke
Eli VanderBilt
Alvaro Herrasti
Luca Weihs
Jordi Salvador
...
Winson Han
Eric Kolve
Ali Farhadi
Aniruddha Kembhavi
Roozbeh Mottaghi
LM&Ro
25
232
0
14 Jun 2022
Comprehending and Ordering Semantics for Image Captioning
Yehao Li
Yingwei Pan
Ting Yao
Tao Mei
13
86
0
14 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
518
0
13 Jun 2022
Bootstrapping Multi-view Representations for Fake News Detection
Qichao Ying
Xiaoxiao Hu
Yangming Zhou
Zhenxing Qian
Dan Zeng
Shiming Ge
13
45
0
12 Jun 2022
Seeing the forest and the tree: Building representations of both individual and collective dynamics with transformers
Ran Liu
Mehdi Azabou
M. Dabagia
Jingyun Xiao
Eva L. Dyer
AI4CE
27
19
0
10 Jun 2022
Neural Prompt Search
Yuanhan Zhang
Kaiyang Zhou
Ziwei Liu
VPVLM
VLM
16
143
0
09 Jun 2022
Previous
1
2
3
...
161
162
163
...
167
168
169
Next