arXiv:2103.00020
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Papers citing "Learning Transferable Visual Models From Natural Language Supervision"
50 / 8,928 papers shown
Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data
Paul Hager
M. Menten
Daniel Rueckert
19
47
0
24 Mar 2023
Category Query Learning for Human-Object Interaction Classification
Chi Xie
Fangao Zeng
Yue Hu
Shuang Liang
Yichen Wei
VLM
24
20
0
24 Mar 2023
Three ways to improve feature alignment for open vocabulary detection
Relja Arandjelović
A. Andonian
A. Mensch
Olivier J. Hénaff
Jean-Baptiste Alayrac
Andrew Zisserman
VLM
ObjD
28
19
0
23 Mar 2023
Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition
Stephanie Milani
Anssi Kanervisto
Karolis Ramanauskas
Sander Schulhoff
Brandon Houghton
...
Vinicius G. Goecks
Nicholas R. Waytowich
David Watkins
J. Miller
Rohin Shah
25
16
0
23 Mar 2023
ReVersion: Diffusion-Based Relation Inversion from Images
Ziqi Huang
Tianxing Wu
Yuming Jiang
Kelvin C. K. Chan
Ziwei Liu
25
65
0
23 Mar 2023
TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision
Jiacheng Wei
Hao Wang
Jiashi Feng
Guosheng Lin
Kim-Hui Yap
22
30
0
23 Mar 2023
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
Qifan Yu
Juncheng Li
Yuehua Wu
Siliang Tang
Wei Ji
Yueting Zhuang
25
34
0
23 Mar 2023
Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels
Zixuan Ding
Ao Wang
Hui Chen
Q. Zhang
Pengzhang Liu
Yongjun Bao
Weipeng P. Yan
Jungong Han
19
27
0
23 Mar 2023
Explore the Power of Synthetic Data on Few-shot Object Detection
Shaobo Lin
Kun Wang
Xingyu Zeng
Ruili Zhao
27
32
0
23 Mar 2023
Exploring Visual Prompts for Whole Slide Image Classification with Multiple Instance Learning
Yi-Mou Lin
Zhongchen Zhao
Zhengjie Zhu
Lisheng Wang
Kwang-Ting Cheng
Hao Chen
VLM
10
1
0
23 Mar 2023
Keypoint-Guided Optimal Transport
Xiang Gu
Yucheng Yang
Weizhen Zeng
Jian-jun Sun
Zongben Xu
24
1
0
23 Mar 2023
Top-Down Visual Attention from Analysis by Synthesis
Baifeng Shi
Trevor Darrell
Xin Eric Wang
17
28
0
23 Mar 2023
An Extended Study of Human-like Behavior under Adversarial Training
Paul Gavrikov
J. Keuper
M. Keuper
AAML
26
9
0
22 Mar 2023
MV-MR: multi-views and multi-representations for self-supervised learning and knowledge distillation
Vitaliy Kinakh
M. Drozdova
S. Voloshynovskiy
27
1
0
21 Mar 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
25
1
0
21 Mar 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
23
30
0
21 Mar 2023
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
Lukas Höllein
Ang Cao
Andrew Owens
Justin Johnson
Matthias Nießner
DiffM
30
177
0
21 Mar 2023
Multi-modal Prompting for Low-Shot Temporal Action Localization
Chen Ju
Zeqian Li
Peisen Zhao
Ya-Qin Zhang
Xiaopeng Zhang
Qi Tian
Yanfeng Wang
Weidi Xie
27
18
0
21 Mar 2023
DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models
Weijia Wu
Yuzhong Zhao
Mike Zheng Shou
Hong Zhou
Chunhua Shen
31
140
0
21 Mar 2023
Detecting the open-world objects with the help of the Brain
Shuailei Ma
Yuefeng Wang
Ying-yu Wei
Peihao Chen
Zhixiang Ye
Jiaqi Fan
Enming Zhang
Thomas H. Li
VLM
ObjD
16
2
0
21 Mar 2023
CHATEDIT: Towards Multi-turn Interactive Facial Image Editing via Dialogue
Xing Cui
Zekun Li
Peipei Li
Yibo Hu
Hailin Shi
Zhaofeng He
23
7
0
20 Mar 2023
Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models
René Haas
Inbar Huberman-Spiegelglas
Rotem Mulayoff
Stella Graßhof
Sami S. Brandt
T. Michaeli
DiffM
15
39
0
20 Mar 2023
Location-Free Scene Graph Generation
Ege Ozsoy
Felix Holm
Tobias Czempiel
Nassir Navab
Benjamin Busam
37
4
0
20 Mar 2023
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
Hao Zhang
Yeo Keat Ee
Basura Fernando
VLM
27
3
0
18 Mar 2023
MRIS: A Multi-modal Retrieval Approach for Image Synthesis on Diverse Modalities
Boqi Chen
Marc Niethammer
28
1
0
17 Mar 2023
GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation
Can Qin
Ning Yu
Chen Xing
Shu Zhen Zhang
Zeyuan Chen
Stefano Ermon
Yun Fu
Caiming Xiong
Ran Xu
DiffM
30
19
0
17 Mar 2023
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Peng Jin
Hao Li
Ze-Long Cheng
Kehan Li
Xiang Ji
Chang-rui Liu
Li-ming Yuan
Jie Chen
DiffM
VGen
21
52
0
17 Mar 2023
VEIL: Vetting Extracted Image Labels from In-the-Wild Captions for Weakly-Supervised Object Detection
Arushi Rai
Adriana Kovashka
19
0
0
16 Mar 2023
P+: Extended Textual Conditioning in Text-to-Image Generation
A. Voynov
Qinghao Chu
Daniel Cohen-Or
Kfir Aberman
VLM
DiffM
29
176
0
16 Mar 2023
ShabbyPages: A Reproducible Document Denoising and Binarization Dataset
Alexander Groleau
Kok Wei Chee
Stefan Larson
Samay Maini
Jonathan Boarman
14
2
0
16 Mar 2023
SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective
Zipeng Xu
Songlong Xing
E. Sangineto
N. Sebe
CLIP
17
2
0
16 Mar 2023
Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models
Xinyang Liu
Dongsheng Wang
Bowei Fang
Miaoge Li
Zhibin Duan
Yishi Xu
Bo Chen
Mingyuan Zhou
VLM
VPVLM
21
5
0
16 Mar 2023
Aerial Diffusion: Text Guided Ground-to-Aerial View Translation from a Single Image using Diffusion Models
D. Kothandaraman
Tianyi Zhou
Ming Lin
Dinesh Manocha
24
5
0
15 Mar 2023
Deep Learning for Cross-Domain Few-Shot Visual Recognition: A Survey
Huali Xu
Shuaifeng Zhi
Shuzhou Sun
Vishal M. Patel
Li Liu
27
13
0
15 Mar 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM
LRM
ReLM
45
429
0
14 Mar 2023
Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation
Junyoung Seo
Wooseok Jang
Minseop Kwak
Ines Hyeonsu Kim
Jaehoon Ko
Junho Kim
Jin-Hwa Kim
Jiyoung Lee
Seung Wook Kim
DiffM
30
135
0
14 Mar 2023
WDiscOOD: Out-of-Distribution Detection via Whitened Linear Discriminant Analysis
Yiye Chen
Yunzhi Lin
Ruinian Xu
Patricio A. Vela
OODD
24
3
0
14 Mar 2023
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
Bo He
Jun Wang
Jielin Qiu
Trung Bui
Abhinav Shrivastava
Zhaowen Wang
20
65
0
13 Mar 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip H. S. Torr
31
23
0
11 Mar 2023
TQ-Net: Mixed Contrastive Representation Learning For Heterogeneous Test Questions
He Zhu
Xihua Li
Xuemin Zhao
Yunbo Cao
Shan Yu
10
0
0
09 Mar 2023
Optimizing CAD Models with Latent Space Manipulation
J. Elstner
Raoul Schönhof
Steffen Tauber
Marco F. Huber
30
0
0
09 Mar 2023
Rethinking Visual Prompt Learning as Masked Visual Token Modeling
Ning Liao
Bowen Shi
Xiaopeng Zhang
Min Cao
Junchi Yan
Qi Tian
VLM
26
7
0
09 Mar 2023
Transformer-based Image Generation from Scene Graphs
Renato Sortino
S. Palazzo
C. Spampinato
ViT
43
15
0
08 Mar 2023
Exploring Efficient-Tuned Learning Audio Representation Method from BriVL
Sen Fang
Yang Wu
Bowen Gao
Jingwen Cai
T. Teoh
DiffM
16
1
0
08 Mar 2023
CUDA: Convolution-based Unlearnable Datasets
Vinu Sankar Sadasivan
Mahdi Soltanolkotabi
S. Feizi
MU
29
23
0
07 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
24
501
0
07 Mar 2023
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
Minyoung Hwang
Jaeyeon Jeong
Minsoo Kim
Yoonseon Oh
Songhwai Oh
17
19
0
07 Mar 2023
VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building [Technical Report]
Maureen Daum
Enhao Zhang
Dong He
Stephen Mussmann
Brandon Haynes
Ranjay Krishna
Magdalena Balazinska
27
4
0
07 Mar 2023
ELODIN: Naming Concepts in Embedding Spaces
Rodrigo Mello
Filipe Calegario
Geber Ramalho
DiffM
18
1
0
07 Mar 2023
Lformer: Text-to-Image Generation with L-shape Block Parallel Decoding
Jiacheng Li
Longhui Wei
Zongyuan Zhan
Xinfu He
Siliang Tang
Qi Tian
Yueting Zhuang
19
4
0
07 Mar 2023