ResearchTrend.AI
Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 8,339 papers shown
Title
Biomed-DPT: Dual Modality Prompt Tuning for Biomedical Vision-Language Models
Wei Peng
Kang Liu
Jianchen Hu
Meng Zhang
VLM
LM&MA
45
0
0
08 May 2025
ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis
Onkar Susladkar
Gayatri S Deshmukh
Yalcin Tur
Ulas Bagci
MedIm
51
0
0
08 May 2025
Learning to Drive Anywhere with Model-Based Reannotation
Noriaki Hirose
Lydia Ignatova
Kyle Stachowicz
Catherine Glossop
Sergey Levine
Dhruv Shah
19
0
0
08 May 2025
UncertainSAM: Fast and Efficient Uncertainty Quantification of the Segment Anything Model
T. Kaiser
Thomas Norrenbrock
Bodo Rosenhahn
42
0
0
08 May 2025
Looking Beyond Language Priors: Enhancing Visual Comprehension and Attention in Multimodal Models
Aarti Ghatkesar
Uddeshya Upadhyay
Ganesh Venkatesh
VLM
31
0
0
08 May 2025
EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation
Biao Yi
Xavier Hu
Y. Chen
Shengyu Zhang
Hongxia Yang
Fan Wu
Fei Wu
LLMAG
74
0
0
08 May 2025
SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models
Shun Taguchi
Hideki Deguchi
Takumi Hamazaki
Hiroyuki Sakai
ReLM
LRM
40
0
0
08 May 2025
X-Driver: Explainable Autonomous Driving with Vision-Language Models
Wei Liu
J. A. Zhang
Binxiong Zheng
Yufeng Hu
Yingzhan Lin
Zengfeng Zeng
VLM
LRM
56
0
0
08 May 2025
Does CLIP perceive art the same way we do?
Andrea Asperti
Leonardo Dessì
Maria Chiara Tonetti
Nico Wu
46
0
0
08 May 2025
Generating Physically Stable and Buildable LEGO Designs from Text
Ava Pun
Kangle Deng
Ruixuan Liu
Deva Ramanan
Changliu Liu
Jun-Yan Zhu
56
0
0
08 May 2025
VR-RAG: Open-vocabulary Species Recognition with RAG-Assisted Large Multi-Modal Models
F. Khan
Jun Chen
Youssef Mohamed
Chun-Mei Feng
Mohamed Elhoseiny
VLM
16
0
0
08 May 2025
PADriver: Towards Personalized Autonomous Driving
Genghua Kou
Fan Jia
Weixin Mao
Y. Liu
Yucheng Zhao
Ziheng Zhang
Osamu Yoshie
Tiancai Wang
Y. Li
X. Zhang
44
0
0
08 May 2025
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
Jiahao Li
Weijian Ma
Xueyang Li
Yunzhong Lou
G. Zhou
Xiangdong Zhou
32
0
0
07 May 2025
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via $α$-$β$-Divergence
Guanghui Wang
Zhiyong Yang
Z. Wang
Shi Wang
Qianqian Xu
Q. Huang
37
0
0
07 May 2025
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Xianhang Li
Y. Liu
Haoqin Tu
Hongru Zhu
Cihang Xie
VLM
52
0
0
07 May 2025
TetWeave: Isosurface Extraction using On-The-Fly Delaunay Tetrahedral Grids for Gradient-Based Mesh Optimization
Alexandre Binninger
Ruben Wiersma
Philipp Herholz
O. Sorkine-Hornung
42
0
0
07 May 2025
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
Divyansh Srivastava
Xiang Zhang
He Wen
Chenru Wen
Zhuowen Tu
DiffM
26
0
0
07 May 2025
AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding
Feng Xiao
Hongbin Xu
Guocan Zhao
Wenxiong Kang
41
0
0
07 May 2025
WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction
Richard Liu
Daniel Fu
Noah Tan
Itai Lang
Rana Hanocka
3DH
43
0
0
07 May 2025
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng
Haoyu Zhang
Meng Liu
Weili Guan
Liqiang Nie
36
0
0
07 May 2025
DMRL: Data- and Model-aware Reward Learning for Data Extraction
Zhiqiang Wang
Ruoxi Cheng
21
0
0
07 May 2025
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation
Yiming Qin
Zhu Xu
Yang Liu
14
0
0
07 May 2025
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang
Bin Chen
Yulin Li
Bin Kang
Y. Chen
Zhuotao Tian
VLM
38
0
0
07 May 2025
Componential Prompt-Knowledge Alignment for Domain Incremental Learning
Kunlun Xu
Xu Zou
Gang Hua
Jiahuan Zhou
CLL
76
0
0
07 May 2025
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffM
VGen
50
0
0
07 May 2025
Multi-turn Consistent Image Editing
Zijun Zhou
Yingying Deng
Xiangyu He
Weiming Dong
Fan Tang
46
0
0
07 May 2025
MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation
Zilong Chen
Yikai Wang
Wenqiang Sun
Feng Wang
Yiwen Chen
Huaping Liu
27
0
0
07 May 2025
Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision
Linhan Cao
Wei Sun
Kaiwei Zhang
Yicong Peng
Guangtao Zhai
Xiongkuo Min
47
0
0
06 May 2025
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph
Sameer Malik
Moyuru Yamada
Ayush Singh
Dishank Aggarwal
56
0
0
06 May 2025
Task Reconstruction and Extrapolation for $π_0$ using Text Latent
Quanyi Li
28
0
0
06 May 2025
DiffVQA: Video Quality Assessment Using Diffusion Feature Extractor
Wei-Ting Chen
Yu-Jiet Vong
Yi-Tsung Lee
Sy-Yen Kuo
Qiang Gao
Sizhuo Ma
Jian Wang
76
0
0
06 May 2025
ChannelExplorer: Exploring Class Separability Through Activation Channel Visualization
Md Rahat-uz-Zaman
Bei Wang
Paul Rosen
17
0
0
06 May 2025
Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability
L. Wang
Senmao Li
Fei Yang
Jianye Wang
Ziheng Zhang
Y. Liu
Y. Wang
Jian Yang
DiffM
52
0
0
06 May 2025
EOPose : Exemplar-based object reposing using Generalized Pose Correspondences
Sarthak Mehrotra
Rishabh Jain
Mayur Hemani
Balaji Krishnamurthy
Mausoom Sarkar
41
0
0
06 May 2025
FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios
Shiyi Zhang
Junhao Zhuang
Zhaoyang Zhang
Ying Shan
Yansong Tang
VGen
70
0
0
06 May 2025
Distribution-Conditional Generation: From Class Distribution to Creative Generation
Fu Feng
Yucheng Xie
Xu Yang
Jing Wang
Xin Geng
DiffM
29
0
0
06 May 2025
Safer Prompts: Reducing IP Risk in Visual Generative AI
Lena Reissinger
Yuanyuan Li
Anna-Carolina Haensch
Neeraj Sarna
23
0
0
06 May 2025
Interpretable Zero-shot Learning with Infinite Class Concepts
Zihan Ye
Shreyank N Gowda
Shiming Chen
Yaochu Jin
Kaizhu Huang
Xiaobo Jin
VLM
33
0
0
06 May 2025
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
Davide Talon
Federico Girella
Ziyue Liu
Marco Cristani
Yiming Wang
VLM
44
0
0
06 May 2025
1$^{st}$ Place Solution of WWW 2025 EReL@MIR Workshop Multimodal CTR Prediction Challenge
Junwei Xu
Zehao Zhao
Xiaoyu Hu
Zhenjie Song
25
0
0
06 May 2025
ALMA: Aggregated Lipschitz Maximization Attack on Auto-encoders
Chethan Krishnamurthy Ramanaik
Arjun Roy
Eirini Ntoutsi
AAML
20
0
0
06 May 2025
Enhancing Target-unspecific Tasks through a Features Matrix
Fangming Cui
Yonggang Zhang
Xuan Wang
Xinmei Tian
Jun Yu
AAML
33
0
0
06 May 2025
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation
Gabriele Rosi
Fabio Cermelli
VLM
32
0
0
06 May 2025
RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning
Liam Boyle
Nicolas Baumann
Paviththiren Sivasothilingam
Michele Magno
Luca Benini
LM&Ro
LRM
37
0
0
06 May 2025
Panoramic Out-of-Distribution Segmentation
Mengfei Duan
Kailun Yang
Y. Zhang
Yihong Cao
Fei Teng
Kai Luo
Jiaming Zhang
Zhiyong Li
Shutao Li
50
0
0
06 May 2025
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models
Abram Schonfeldt
Benjamin Maylor
Xiaofang Chen
Ronald Clark
Aiden Doherty
62
0
0
06 May 2025
CXR-AD: Component X-ray Image Dataset for Industrial Anomaly Detection
Haoyu Bai
Jie Wang
Gaomin Li
X. Li
Xiaohu Zhang
Xia Yang
25
0
0
06 May 2025
Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning
François Role
Sébastien Meyer
Victor Amblard
VLM
43
0
0
06 May 2025
Artificial Behavior Intelligence: Technology, Challenges, and Future Directions
Kanghyun Jo
Jehwan Choi
Kwanho Kim
Seongmin Kim
Duy-Linh Nguyen
Xuan-Thuy Vo
Adri Priadana
Tien-Dat Tran
AI4CE
32
0
0
06 May 2025
A Vision-Language Model for Focal Liver Lesion Classification
Song Jian
Hu Yuchang
Wang Hui
Chen Yen-Wei
VLM
MedIm
36
0
0
06 May 2025