ResearchTrend.AI
Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM
arXiv: 2103.00020

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 8,265 papers shown
Title
Boosting Zero-shot Stereo Matching using Large-scale Mixed Images Sources in the Real World
Yuran Wang
Yingping Liang
Ying Fu
16
0
0
13 May 2025
DSADF: Thinking Fast and Slow for Decision Making
Alex Zhihao Dou
Dongfei Cui
Jun Yan
W. Wang
Benteng Chen
Haoming Wang
Zeke Xie
Shufei Zhang
OffRL
11
0
0
13 May 2025
Leveraging Segment Anything Model for Source-Free Domain Adaptation via Dual Feature Guided Auto-Prompting
Zheang Huai
Hui Tang
Yi Li
Z. Chen
Xiaomeng Li
VLM
23
0
0
13 May 2025
Decoding Neighborhood Environments with Large Language Models
Andrew Cart
Shaohu Zhang
Melanie Escue
Xugui Zhou
Haitao Zhao
Prashanth BusiReddyGari
Beiyu Lin
Shuang Li
11
0
0
13 May 2025
Visual Image Reconstruction from Brain Activity via Latent Representation
Y. Kamitani
Misato Tanaka
Ken Shirakawa
16
0
0
13 May 2025
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Donghoon Kim
Minji Bae
Kyuhong Shim
B. Shim
23
0
0
13 May 2025
CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding
Wenxuan Ma
Xiaoge Cao
Y. Zhang
Chaofan Zhang
Shaobo Yang
Peng Hao
Bin Fang
Yinghao Cai
Shaowei Cui
Shuo Wang
13
0
0
13 May 2025
Controllable Image Colorization with Instance-aware Texts and Masks
Yanru An
Ling Gui
Qiang Hu
Chunlei Cai
Tianxiao Ye
Xiaoyun Zhang
Yanfeng Wang
DiffM
22
0
0
13 May 2025
Large Language Models for Computer-Aided Design: A Survey
Licheng Zhang
Bach Le
Naveed Akhtar
Siew-Kei Lam
Tuan Ngo
3DV
AI4CE
24
0
0
13 May 2025
ORACLE-Grasp: Zero-Shot Task-Oriented Robotic Grasping using Large Multimodal Models
Avihai Giuili
Rotem Atari
A. Sintov
VLM
13
0
0
13 May 2025
SPAST: Arbitrary Style Transfer with Style Priors via Pre-trained Large-scale Model
Zhanjie Zhang
Quanwei Zhang
Junsheng Luan
Mengyuan Yang
Yun Wang
Lei Zhao
16
0
0
13 May 2025
Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion
Anle Ke
Xu Zhang
Tong Chen
Ming-Tse Lu
Chao Zhou
Jiawen Gu
Zhan Ma
DiffM
18
0
0
13 May 2025
Leveraging Multi-Modal Information to Enhance Dataset Distillation
Zhe Li
Hadrien Reynaud
Bernhard Kainz
DD
30
0
0
13 May 2025
Decoupled Multimodal Prototypes for Visual Recognition with Missing Modalities
Jueqing Lu
Yuanyuan Qi
Xiaohao Yang
Shujie Zhou
Lan Du
14
0
0
13 May 2025
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
39
0
0
13 May 2025
Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection
Ayush Rai
Kyle Min
Tarun Krishna
Feiyan Hu
A. Smeaton
Noel E. O'Connor
VGen
14
0
0
13 May 2025
Task-Adaptive Semantic Communications with Controllable Diffusion-based Data Regeneration
Fupei Guo
Achintha Wijesinghe
Songyang Zhang
Zhi Ding
DiffM
13
0
0
12 May 2025
Addressing degeneracies in latent interpolation for diffusion models
Erik Landolsi
Fredrik Kahl
DiffM
32
0
0
12 May 2025
Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models
Yan Xie
Zequn Zeng
Hao Zhang
Yucheng Ding
Y. Wang
Zhengjue Wang
Bo Chen
Hongwei Liu
OT
21
0
0
12 May 2025
Incomplete In-context Learning
Wenqiang Wang
Yangshijie Zhang
26
0
0
12 May 2025
Visually Interpretable Subtask Reasoning for Visual Question Answering
Yu Cheng
A. Goel
Hakan Bilen
LRM
21
0
0
12 May 2025
Boosting Global-Local Feature Matching via Anomaly Synthesis for Multi-Class Point Cloud Anomaly Detection
Yuqi Cheng
Yunkang Cao
Dongfang Wang
Weiming Shen
Wenlong Li
24
1
0
12 May 2025
No Query, No Access
W. Wang
Siyuan Liang
Y. Zhang
X. Jia
Hao Lin
Xiaochun Cao
AAML
14
0
0
12 May 2025
MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing
Aybora Koksal
Aydin Alatan
LRM
11
0
0
12 May 2025
DanceGRPO: Unleashing GRPO on Visual Generation
Zeyue Xue
Jie Wu
Yu Gao
Fangyuan Kong
Lingting Zhu
...
Zhiheng Liu
Wei Liu
Qiushan Guo
Weilin Huang
Ping Luo
EGVM
VGen
45
0
0
12 May 2025
UAV-CodeAgents: Scalable UAV Mission Planning via Multi-Agent ReAct and Vision-Language Reasoning
Oleg Sautenkov
Yasheerah Yaqoot
Muhammad Ahsan Mustafa
Faryal Batool
Jeffrin Sam
Artem Lykov
Chih-Yung Wen
Dzmitry Tsetserukou
21
0
0
12 May 2025
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
Weiyu Li
X. Zhang
Zheng Sun
Di Qi
H. Li
...
Zeming Li
Gang Yu
Xiangyu Zhang
Daxin Jiang
Ping Tan
24
0
0
12 May 2025
Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach
Ruikun Hou
B. Bühler
Tim Fütterer
Efe Bozkir
Peter Gerjets
Ulrich Trautwein
Enkelejda Kasneci
19
0
0
12 May 2025
QuantX: A Framework for Hardware-Aware Quantization of Generative AI Workloads
Khurram Mazher
Saad Bin Nasir
MQ
32
0
0
12 May 2025
ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
Hongyin Zhang
Zifeng Zhuang
H. Zhao
Pengxiang Ding
Hongchao Lu
Donglin Wang
OffRL
34
0
0
12 May 2025
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
35
0
0
12 May 2025
SLAG: Scalable Language-Augmented Gaussian Splatting
Laszlo Szilagyi
Francis Engelmann
Jeannette Bohg
3DGS
30
0
0
12 May 2025
Beyond CLIP Generalization: Against Forward&Backward Forgetting Adapter for Continual Learning of Vision-Language Models
Songlin Dong
Chenhao Ding
Jiangyang Li
Jizhou Han
Qiang Wang
Yuhang He
Yihong Gong
CLL
VLM
25
0
0
12 May 2025
Language-Driven Dual Style Mixing for Single-Domain Generalized Object Detection
Hongda Qin
Xiao-Qiang Lu
Zhiyong Wei
Yihong Cao
Kailun Yang
Ningjiang Chen
ObjD
MLLM
VLM
21
0
0
12 May 2025
Beyond Static Perception: Integrating Temporal Context into VLMs for Cloth Folding
Oriol Barbany
Adria Colomé
Carme Torras
21
0
0
12 May 2025
Vision Foundation Model Embedding-Based Semantic Anomaly Detection
M. Ronecker
Matthew Foutter
Amine Elhafsi
Daniele Gammelli
Ihor Barakaiev
Marco Pavone
Daniel Watzenig
16
0
0
12 May 2025
Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks
Jiafan Li
Jiaqi Zhu
Liang Chang
Yilin Li
Miaomiao Li
Yang Wang
H. Wang
14
0
0
12 May 2025
You Only Look One Step: Accelerating Backpropagation in Diffusion Sampling with Gradient Shortcuts
Hongkun Dou
Zeyu Li
Xingyu Jiang
H. Li
Lijun Yang
Wen Yao
Yue Deng
DiffM
30
0
0
12 May 2025
FLUXSynID: A Framework for Identity-Controlled Synthetic Face Generation with Document and Live Images
Raul Ismayilov
Dzemila Sero
Luuk Spreeuwers
24
0
0
12 May 2025
Hand-Shadow Poser
Hao Xu
Yinqiao Wang
Niloy J. Mitra
Shuaicheng Liu
Pheng-Ann Heng
Chi-Wing Fu
3DH
24
0
0
11 May 2025
A Vision-Language Foundation Model for Leaf Disease Identification
Khang Nguyen Quoc
Lan Le Thi Thu
Luyl-Da Quach
VLM
16
0
0
11 May 2025
Fine-Grained Bias Exploration and Mitigation for Group-Robust Classification
Miaoyun Zhao
Qiang Zhang
C. Li
18
0
0
11 May 2025
Visual Instruction Tuning with Chain of Region-of-Interest
Yixin Chen
Shuai Zhang
Boran Han
Bernie Wang
21
0
0
11 May 2025
MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning
L. Yang
W. Zhang
Quan Z. Sheng
Weitong Chen
L. Yao
Weitong Chen
A. Shakeri
21
0
0
11 May 2025
BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation
Panwen Hu
Jiehui Huang
Qiang Sun
Xiaodan Liang
DiffM
VGen
23
0
0
11 May 2025
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Yu Qiao
Huy Q. Le
Avi Deb Raha
Phuong-Nam Tran
Apurba Adhikary
Mengchun Zhang
Loc X. Nguyen
Eui-nam Huh
Dusit Niyato
C. Hong
AI4CE
21
0
0
11 May 2025
Whitened CLIP as a Likelihood Surrogate of Images and Captions
Roy Betser
Meir Yossef Levi
Guy Gilboa
18
0
0
11 May 2025
Efficient Robotic Policy Learning via Latent Space Backward Planning
Dongxiu Liu
Haoyi Niu
Zhihao Wang
Jinliang Zheng
Yinan Zheng
Zhonghong Ou
Jianming Hu
Jianxiong Li
Xianyuan Zhan
13
0
0
11 May 2025
Semantic-Guided Diffusion Model for Single-Step Image Super-Resolution
Zihang Liu
Zhenyu Zhang
Hao Tang
24
0
0
11 May 2025
CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging
Wenju Sun
Qingyong Li
Yangli-ao Geng
Boyang Li
MoMe
21
0
0
11 May 2025