Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 8,869 papers shown
Title
Contrastive Object Detection Using Knowledge Graph Embeddings
Christopher Lang
Alexander Braun
Abhinav Valada
8
8
0
21 Dec 2021
Extending CLIP for Category-to-image Retrieval in E-commerce
Mariya Hendriksen
Maurits J. R. Bleeker
Svitlana Vakulenko
N. V. Noord
E. Kuiper
Maarten de Rijke
VLM
11
30
0
21 Dec 2021
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
83
14,580
0
20 Dec 2021
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol
Prafulla Dhariwal
Aditya A. Ramesh
Pranav Shyam
Pamela Mishkin
Bob McGrew
Ilya Sutskever
Mark Chen
43
3,459
0
20 Dec 2021
Mind-proofing Your Phone: Navigating the Digital Minefield with GreaseTerminator
Siddhartha Datta
Konrad Kollnig
N. Shadbolt
25
10
0
20 Dec 2021
Soundify: Matching Sound Effects to Video
David Chuan-En Lin
Anastasis Germanidis
Cristobal Valenzuela
Yining Shi
Nikolas Martelaro
25
16
0
17 Dec 2021
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Dongxu Li
Junnan Li
Hongdong Li
Juan Carlos Niebles
S. Hoi
20
191
0
17 Dec 2021
Contrastive Vision-Language Pre-training with Limited Resources
Quan Cui
Boyan Zhou
Yu Guo
Weidong Yin
Hao Wu
Osamu Yoshie
Yubo Chen
VLM
CLIP
11
32
0
17 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
32
551
0
16 Dec 2021
CODER: An efficient framework for improving retrieval through COntextual Document Embedding Reranking
George Zerveas
Navid Rekabsaz
Daniel Cohen
Carsten Eickhoff
20
8
0
16 Dec 2021
Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription
Nikolai Vogler
J. Allen
M. Miller
Taylor Berg-Kirkpatrick
18
5
0
16 Dec 2021
TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning
Shiming Chen
Zi-Quan Hong
Wenjin Hou
Guosen Xie
Yibing Song
Jian-jun Zhao
Xinge You
Shuicheng Yan
Ling Shao
ViT
15
44
0
16 Dec 2021
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Yi-Lin Sung
Jaemin Cho
Mohit Bansal
VLM
VPVLM
27
342
0
13 Dec 2021
SAC-GAN: Structure-Aware Image Composition
Hang Zhou
Rui Ma
Ling-Xiao Zhang
Lina Gao
Ali Mahdavi-Amiri
Haotong Zhang
GAN
27
7
0
13 Dec 2021
PartGlot: Learning Shape Part Segmentation from Language Reference Games
Juil Koo
Ian Huang
Panos Achlioptas
Leonidas J. Guibas
Minhyuk Sung
3DPC
28
28
0
13 Dec 2021
Multimodal Interactions Using Pretrained Unimodal Models for SIMMC 2.0
Joosung Lee
Kijong Han
31
6
0
10 Dec 2021
CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions
Rameen Abdal
Peihao Zhu
John C. Femiani
Niloy J. Mitra
Peter Wonka
CLIP
25
103
0
09 Dec 2021
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
Can Wang
Menglei Chai
Mingming He
Dongdong Chen
Jing Liao
CLIP
22
377
0
09 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David F. Harwath
James R. Glass
Hilde Kuehne
ViT
23
129
0
08 Dec 2021
A Generic Approach for Enhancing GANs by Regularized Latent Optimization
Yufan Zhou
Chunyuan Li
Changyou Chen
Jinhui Xu
25
0
0
07 Dec 2021
Text2Mesh: Text-Driven Neural Stylization for Meshes
O. Michel
Roi Bar-On
Richard Liu
Sagie Benaim
Rana Hanocka
CLIP
AI4CE
182
350
0
06 Dec 2021
Semantic Segmentation In-the-Wild Without Seeing Any Segmentation Examples
Nir Zabari
Yedid Hoshen
VLM
17
26
0
06 Dec 2021
Embedding Arithmetic of Multimodal Queries for Image Retrieval
Guillaume Couairon
Matthieu Cord
Matthijs Douze
Holger Schwenk
27
23
0
06 Dec 2021
Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification
Zhenhailong Wang
Heng Ji
84
71
0
05 Dec 2021
VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts
Longtian Qiu
Renrui Zhang
Ziyu Guo
Wei Zhang
Zilu Guo
Ziyao Zeng
Guangnan Zhang
VLM
CLIP
20
45
0
04 Dec 2021
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Yichun Shi
Xiao Yang
Yangyue Wan
Xiaohui Shen
GAN
140
83
0
04 Dec 2021
FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
Xingchao Liu
Chengyue Gong
Lemeng Wu
Shujian Zhang
Haoran Su
Qiang Liu
CLIP
23
89
0
02 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
36
128
0
02 Dec 2021
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Yongming Rao
Wenliang Zhao
Guangyi Chen
Yansong Tang
Zheng Zhu
Guan Huang
Jie Zhou
Jiwen Lu
VLM
CLIP
61
550
0
02 Dec 2021
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
25
23
0
02 Dec 2021
Editing a classifier by rewriting its prediction rules
Shibani Santurkar
Dimitris Tsipras
Mahalaxmi Elango
David Bau
Antonio Torralba
A. Madry
KELM
175
89
0
02 Dec 2021
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image
Yuki M. Asano
Aaqib Saeed
19
7
0
01 Dec 2021
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
Michael Niemeyer
Jonathan T. Barron
B. Mildenhall
Mehdi S. M. Sajjadi
Andreas Geiger
Noha Radwan
46
577
0
01 Dec 2021
Object-aware Video-language Pre-training for Retrieval
Alex Jinpeng Wang
Yixiao Ge
Guanyu Cai
Rui Yan
Xudong Lin
Ying Shan
Xiaohu Qie
Mike Zheng Shou
ViT
VLM
17
79
0
01 Dec 2021
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
Mattia Soldan
Alejandro Pardo
Juan Carlos León Alcázar
Fabian Caba Heilbron
Chen Zhao
Silvio Giancola
Bernard Ghanem
VGen
32
95
0
01 Dec 2021
CLIPstyler: Image Style Transfer with a Single Text Condition
Gihyun Kwon
Jong Chul Ye
VLM
CLIP
11
240
0
01 Dec 2021
Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources
Sahar Abdelnabi
Rakibul Hasan
Mario Fritz
18
74
0
30 Nov 2021
Task2Sim : Towards Effective Pre-training and Transfer from Synthetic Data
Samarth Mishra
Rameswar Panda
Cheng Perng Phoo
Chun-Fu Chen
Leonid Karlinsky
Kate Saenko
Venkatesh Saligrama
Rogerio Feris
19
33
0
30 Nov 2021
HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing
Yuval Alaluf
Omer Tov
Ron Mokady
Rinon Gal
Amit H. Bermano
28
263
0
30 Nov 2021
Sound-Guided Semantic Image Manipulation
Seung Hyun Lee
Wonseok Roh
Wonmin Byeon
Sang Ho Yoon
Chanyoung Kim
Jinkyu Kim
Sangpil Kim
DiffM
16
43
0
30 Nov 2021
CRIS: CLIP-Driven Referring Image Segmentation
Zhaoqing Wang
Yu Lu
Qiang Li
Xunqiang Tao
Yan Guo
Ming Gong
Tongliang Liu
VLM
36
358
0
30 Nov 2021
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter
Bang-ju Yang
Tong Zhang
Yuexian Zou
CLIP
22
20
0
30 Nov 2021
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Shuyang Gu
Dong Chen
Jianmin Bao
Fang Wen
Bo Zhang
Dongdong Chen
Lu Yuan
B. Guo
DiffM
45
756
0
29 Nov 2021
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
Xumin Yu
Lulu Tang
Yongming Rao
Tiejun Huang
Jie Zhou
Jiwen Lu
3DPC
20
651
0
29 Nov 2021
Blended Diffusion for Text-driven Editing of Natural Images
Omri Avrahami
Dani Lischinski
Ohad Fried
DiffM
11
919
0
29 Nov 2021
Classification-Regression for Chart Comprehension
Matan Levy
Rami Ben-Ari
Dani Lischinski
23
15
0
29 Nov 2021
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
30
192
0
29 Nov 2021
LAFITE: Towards Language-Free Training for Text-to-Image Generation
Yufan Zhou
Ruiyi Zhang
Changyou Chen
Chunyuan Li
Chris Tensmeyer
Tong Yu
Jiuxiang Gu
Jinhui Xu
Tong Sun
VLM
21
162
0
27 Nov 2021
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
Changyao Tian
Wenhai Wang
Xizhou Zhu
Jifeng Dai
Yu Qiao
VLM
30
69
0
26 Nov 2021
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model
Zipeng Xu
Tianwei Lin
Hao Tang
Fu Li
Dongliang He
N. Sebe
Radu Timofte
Luc Van Gool
Errui Ding
EGVM
25
41
0
26 Nov 2021
Previous
1
2
3
...
174
175
176
177
178
Next