ResearchTrend.AI
Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 8,869 papers shown
Title
Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for Inverse Problems
Giannis Daras
Y. Dagan
A. Dimakis
C. Daskalakis
BDL
26
15
0
18 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
22
130
0
18 Jun 2022
Landscape Learning for Neural Network Inversion
Ruoshi Liu
Chen-Guang Mao
Purva Tendulkar
Hongya Wang
Carl Vondrick
21
8
0
17 Jun 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
VLM
20
43
0
17 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
45
391
0
17 Jun 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
42
347
0
17 Jun 2022
Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval
Xiao Dong
Xunlin Zhan
Yunchao Wei
Xiaoyong Wei
Yaowei Wang
Minlong Lu
Xiaochun Cao
Xiaodan Liang
19
11
0
17 Jun 2022
Rectify ViT Shortcut Learning by Visual Saliency
Chong Ma
Lin Zhao
Yuzhong Chen
David Liu
Xi Jiang
Tuo Zhang
Xintao Hu
Dinggang Shen
Dajiang Zhu
Tianming Liu
ViT
20
20
0
17 Jun 2022
Rarity Score : A New Metric to Evaluate the Uncommonness of Synthesized Images
Jiyeon Han
Hwanil Choi
Yunjey Choi
Jae Hyun Kim
Jung-Woo Ha
Jaesik Choi
EGVM
10
31
0
17 Jun 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
34
226
0
16 Jun 2022
Patch-level Representation Learning for Self-supervised Vision Transformers
Sukmin Yun
Hankook Lee
Jaehyung Kim
Jinwoo Shin
ViT
16
64
0
16 Jun 2022
Disentangling visual and written concepts in CLIP
Joanna Materzyńska
Antonio Torralba
David Bau
CoGe
12
46
0
15 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
17
123
0
15 Jun 2022
A Meta-Analysis of Distributionally-Robust Models
Ben Feuer
Ameya Joshi
C. Hegde
OOD
VLM
19
3
0
15 Jun 2022
Forecasting of depth and ego-motion with transformers and self-supervision
Houssem-eddine Boulahbal
A. Voicila
Andrew I. Comport
ViT
MDE
19
3
0
15 Jun 2022
Zero-shot object goal visual navigation
Qianfan Zhao
Lu Zhang
Bin He
Hong Qiao
Zhi-yong Liu
25
37
0
15 Jun 2022
Differentiable Top-k Classification Learning
Felix Petersen
Hilde Kuehne
Christian Borgelt
Oliver Deussen
43
28
0
15 Jun 2022
Beyond Grounding: Extracting Fine-Grained Event Hierarchies Across Modalities
Hammad A. Ayyubi
Christopher Thomas
Lovish Chum
R. Lokesh
Long Chen
...
Xudong Lin
Xuande Feng
Jaywon Koo
Sounak Ray
Shih-Fu Chang
AI4TS
20
0
0
14 Jun 2022
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Ce Liu
Lijuan Wang
MLLM
VLM
18
81
0
14 Jun 2022
Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Sören Mindermann
J. Brauner
Muhammed Razzak
Mrinank Sharma
Andreas Kirsch
...
Benedikt Höltgen
Aidan N. Gomez
Adrien Morisot
Sebastian Farquhar
Y. Gal
30
148
0
14 Jun 2022
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Matt Deitke
Eli VanderBilt
Alvaro Herrasti
Luca Weihs
Jordi Salvador
...
Winson Han
Eric Kolve
Ali Farhadi
Aniruddha Kembhavi
Roozbeh Mottaghi
LM&Ro
25
233
0
14 Jun 2022
Comprehending and Ordering Semantics for Image Captioning
Yehao Li
Yingwei Pan
Ting Yao
Tao Mei
13
87
0
14 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
522
0
13 Jun 2022
INDIGO: Intrinsic Multimodality for Domain Generalization
Puneet Mangla
Shivam Chandhok
Milan Aggarwal
V. Balasubramanian
Balaji Krishnamurthy
VLM
31
2
0
13 Jun 2022
Bootstrapping Multi-view Representations for Fake News Detection
Qichao Ying
Xiaoxiao Hu
Yangming Zhou
Zhenxing Qian
Dan Zeng
Shiming Ge
19
45
0
12 Jun 2022
Seeing the forest and the tree: Building representations of both individual and collective dynamics with transformers
Ran Liu
Mehdi Azabou
M. Dabagia
Jingyun Xiao
Eva L. Dyer
AI4CE
27
19
0
10 Jun 2022
Neural Prompt Search
Yuanhan Zhang
Kaiyang Zhou
Ziwei Liu
VPVLM
VLM
22
143
0
09 Jun 2022
Extreme Masking for Learning Instance and Distributed Visual Representations
Zhirong Wu
Zihang Lai
Xiao Sun
Stephen Lin
30
22
0
09 Jun 2022
FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation
Zi-Yi Dou
Nanyun Peng
17
22
0
09 Jun 2022
Intra-agent speech permits zero-shot task acquisition
Chen Yan
Federico Carnevale
Petko Georgiev
Adam Santoro
Aurelia Guy
Alistair Muldal
Chia-Chun Hung
Josh Abramson
Timothy Lillicrap
Greg Wayne
LM&Ro
36
9
0
07 Jun 2022
Blended Latent Diffusion
Omri Avrahami
Ohad Fried
Dani Lischinski
DiffM
50
374
0
06 Jun 2022
Volumetric Disentanglement for 3D Scene Manipulation
Sagie Benaim
Frederik Warburg
Peter Ebert Christensen
Serge J. Belongie
17
15
0
06 Jun 2022
APES: Articulated Part Extraction from Sprite Sheets
Zhan Xu
Matthew Fisher
Yang Zhou
Deepali Aneja
Rushikesh Dudhat
Li Yi
E. Kalogerakis
21
2
0
04 Jun 2022
Delving into the Openness of CLIP
Shuhuai Ren
Lei Li
Xuancheng Ren
Guangxiang Zhao
Xu Sun
VLM
20
13
0
04 Jun 2022
Revisiting the "Video" in Video-Language Understanding
S. Buch
Cristobal Eyzaguirre
Adrien Gaidon
Jiajun Wu
L. Fei-Fei
Juan Carlos Niebles
11
155
0
03 Jun 2022
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Yuanze Lin
Yujia Xie
Dongdong Chen
Yichong Xu
Chenguang Zhu
Lu Yuan
38
71
0
02 Jun 2022
Prefix Conditioning Unifies Language and Label Supervision
Kuniaki Saito
Kihyuk Sohn
X. Zhang
Chun-Liang Li
Chen-Yu Lee
Kate Saenko
Tomas Pfister
VLM
CLIP
25
16
0
02 Jun 2022
Weakly Supervised Representation Learning with Sparse Perturbations
Kartik Ahuja
Jason S. Hartford
Yoshua Bengio
SSL
24
58
0
02 Jun 2022
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen
Ming Hu
Boyang Albert Li
Mohamed Elhoseiny
32
36
0
01 Jun 2022
VALHALLA: Visual Hallucination for Machine Translation
Yi Li
Rameswar Panda
Yoon Kim
Chun-Fu Chen
Rogerio Feris
David D. Cox
Nuno Vasconcelos
MLLM
36
38
0
31 May 2022
Post-hoc Concept Bottleneck Models
Mert Yuksekgonul
Maggie Wang
James Y. Zou
133
183
0
31 May 2022
Few-Shot Diffusion Models
Giorgio Giannone
Didrik Nielsen
Ole Winther
DiffM
171
49
0
30 May 2022
VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
Wangchunshu Zhou
Yan Zeng
Shizhe Diao
Xinsong Zhang
CoGe
VLM
17
13
0
30 May 2022
Prompt-aligned Gradient for Prompt Tuning
Beier Zhu
Yulei Niu
Yucheng Han
Yuehua Wu
Hanwang Zhang
VLM
175
269
0
30 May 2022
SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners
Feng Liang
Yangguang Li
Diana Marculescu
SSL
TPM
ViT
40
22
0
28 May 2022
CyCLIP: Cyclic Contrastive Language-Image Pretraining
Shashank Goel
Hritik Bansal
S. Bhatia
Ryan A. Rossi
Vishwa Vinay
Aditya Grover
CLIP
VLM
166
132
0
28 May 2022
Video2StyleGAN: Disentangling Local and Global Variations in a Video
Rameen Abdal
Peihao Zhu
Niloy J. Mitra
Peter Wonka
VGen
24
7
0
27 May 2022
Prompt-based Learning for Unpaired Image Captioning
Peipei Zhu
Xiao Wang
Lin Zhu
Zhenglong Sun
Weishi Zheng
Yaowei Wang
C. L. P. Chen
VLM
19
31
0
26 May 2022
DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation
Jingnong Qu
Liunian Harold Li
Jieyu Zhao
Sunipa Dev
Kai-Wei Chang
15
12
0
25 May 2022
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
Jin-Hwa Kim
Yunji Kim
Jiyoung Lee
Kang Min Yoo
Sang-Woo Lee
EGVM
19
32
0
25 May 2022