ResearchTrend.AI
Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 8,339 papers shown
Visual Instruction Tuning with Chain of Region-of-Interest
Yixin Chen
Shuai Zhang
Boran Han
Bernie Wang
21
0
0
11 May 2025
Whitened CLIP as a Likelihood Surrogate of Images and Captions
Roy Betser
Meir Yossef Levi
Guy Gilboa
21
0
0
11 May 2025
Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining
Lu Dong
H. Zhang
Hongjie Zhang
Y. Huang
Z. Ling
Yu Qiao
Limin Wang
Y. Wang
AI4TS
21
0
0
10 May 2025
OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
Wei Yang
Jingjing Fu
R. Wang
Jinyu Wang
Lei Song
Jiang Bian
9
0
0
10 May 2025
ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images
Xianghao Kong
Qiaosong Qi
Yuanbin Wang
Anyi Rao
Biaolong Chen
Aixi Zhang
Si Liu
Hao Jiang
DiffM
VGen
20
0
0
10 May 2025
Causal Prompt Calibration Guided Segment Anything Model for Open-Vocabulary Multi-Entity Segmentation
Jingyao Wang
Jianqi Zhang
Wenwen Qiang
Changwen Zheng
VLM
23
0
0
10 May 2025
Jailbreaking the Text-to-Video Generative Models
Jiayang Liu
Siyuan Liang
Shiqian Zhao
Rongcheng Tu
Wenbo Zhou
Xiaochun Cao
D. Tao
Siew Kei Lam
EGVM
VGen
39
0
0
10 May 2025
Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws
Xiyuan Wei
Ming Lin
Fanjiang Ye
Fengguang Song
Liangliang Cao
My T. Thai
Tianbao Yang
LLMSV
24
0
0
10 May 2025
METOR: A Unified Framework for Mutual Enhancement of Objects and Relationships in Open-vocabulary Video Visual Relationship Detection
Yongqi Wang
Xinxiao Wu
Shuo Yang
ObjD
19
0
0
10 May 2025
A Short Overview of Multi-Modal Wi-Fi Sensing
Zijian Zhao
24
0
0
10 May 2025
Exploring Multimodal Foundation AI and Expert-in-the-Loop for Sustainable Management of Wild Salmon Fisheries in Indigenous Rivers
Chi Xu
Yili Jin
Sami Ma
Rongsheng Qian
Hao Fang
...
Xue Liu
Edith Ngai
William I. Atlas
Katrina M. Connors
Mark A. Spoljaric
16
0
0
10 May 2025
Adversarial Coevolutionary Illumination with Generational Adversarial MAP-Elites
Timothée Anne
Noah Syrkis
Meriem Elhosni
Florian Turati
Franck Legendre
Alain Jaquier
Sebastian Risi
11
0
0
10 May 2025
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
Yicheng Song
Tiancheng Lin
Die Peng
Su Yang
Yi Xu
MedIm
21
0
0
10 May 2025
Why Are You Wrong? Counterfactual Explanations for Language Grounding with 3D Objects
Tobias Preintner
Weixuan Yuan
Qi Huang
Adrian König
Thomas Bäck
E. Raponi
N. V. Stein
19
0
0
09 May 2025
LLM-Land: Large Language Models for Context-Aware Drone Landing
Siwei Cai
Yuwei Wu
Lifeng Zhou
14
0
0
09 May 2025
CGTrack: Cascade Gating Network with Hierarchical Feature Aggregation for UAV Tracking
Weihong Li
Xiaoqiong Liu
Heng Fan
L. Zhang
16
0
0
09 May 2025
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding
Shuai Wang
Ivona Najdenkoska
Hongyi Zhu
S. Rudinac
Monika Kackovic
N. Wijnberg
M. Worring
30
0
0
09 May 2025
Towards Better Cephalometric Landmark Detection with Diffusion Data Generation
Dongqian Guo
Wencheng Han
Pang Lyu
Yuxi Zhou
Jianbing Shen
MedIm
24
0
0
09 May 2025
Wasserstein Distances Made Explainable: Insights into Dataset Shifts and Transport Phenomena
Philip Naumann
Jacob R. Kauffmann
G. Montavon
11
0
0
09 May 2025
Semantic-Space-Intervened Diffusive Alignment for Visual Classification
Zixuan Li
Lei Meng
Guoqing Chao
Wei Wu
Xiaoshuo Yan
Yimeng Yang
Zhuang Qi
X. Meng
DiffM
29
0
0
09 May 2025
Engineering Risk-Aware, Security-by-Design Frameworks for Assurance of Large-Scale Autonomous AI Models
Krti Tallam
11
0
0
09 May 2025
Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA
Karthik Reddy Kanjula
Surya Guthikonda
Nahid Alam
Shayekh Bin Islam
19
0
0
09 May 2025
Towards a Unified Representation Evaluation Framework Beyond Downstream Tasks
Christos Plachouras
Julien Guinot
George Fazekas
Elio Quinton
Emmanouil Benetos
Johan Pauwels
38
1
0
09 May 2025
3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks
V. Bhat
Yu-Hsiang Lan
P. Krishnamurthy
Ramesh Karri
Farshad Khorrami
43
0
0
09 May 2025
Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition
Congqi Cao
Peiheng Han
Y. Zhang
Yating Yu
Qinyi Lv
Lingtong Min
Yanning Zhang
VLM
30
0
0
09 May 2025
Photovoltaic Defect Image Generator with Boundary Alignment Smoothing Constraint for Domain Shift Mitigation
Dongying Li
Binyi Su
Hua Zhang
Yong Li
Haiyong Chen
44
0
0
09 May 2025
Register and CLS tokens yield a decoupling of local and global features in large ViTs
Alexander Lappe
M. Giese
19
0
0
09 May 2025
Describe Anything in Medical Images
Xi Xiao
Yunbei Zhang
Thanh-Huy Nguyen
Ba Thinh Lam
Janet Wang
...
Xingjian Li
X. U. Wang
Hao Xu
Tianming Liu
Min Xu
MedIm
VLM
35
0
0
09 May 2025
Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization
Xi Yang
Songsong Duan
Nannan Wang
Xinbo Gao
WSOL
68
0
0
08 May 2025
Position: Epistemic Artificial Intelligence is Essential for Machine Learning Models to Know When They Do Not Know
Shireen Kudukkil Manchingal
Fabio Cuzzolin
42
0
0
08 May 2025
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
42
0
0
08 May 2025
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding
Henry Zheng
Hao Shi
Qihang Peng
Yong Xien Chng
Rui Huang
Yepeng Weng
Zhongchao Shi
Gao Huang
59
1
0
08 May 2025
Generating Physically Stable and Buildable LEGO Designs from Text
Ava Pun
Kangle Deng
Ruixuan Liu
Deva Ramanan
Changliu Liu
Jun-Yan Zhu
56
0
0
08 May 2025
OWT: A Foundational Organ-Wise Tokenization Framework for Medical Imaging
Sifan Song
Siyeop Yoon
Pengfei Jin
Sekeun Kim
Matthew Tivnan
...
Zhiliang Lyu
Dufan Wu
Ning Guo
Xiang Li
Quanzheng Li
OOD
ViT
54
0
0
08 May 2025
EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation
Biao Yi
Xavier Hu
Y. Chen
Shengyu Zhang
Hongxia Yang
Fan Wu
Fei Wu
LLMAG
74
0
0
08 May 2025
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization
Sooyoung Park
Arda Senocak
Joon Son Chung
VLM
43
0
0
08 May 2025
Does CLIP perceive art the same way we do?
Andrea Asperti
Leonardo Dessì
Maria Chiara Tonetti
Nico Wu
46
0
0
08 May 2025
SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation
Yonwoo Choi
3DGS
VGen
60
0
0
08 May 2025
ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment
Wanjiang Weng
Xiaofeng Tan
Hongsong Wang
Pan Zhou
VGen
44
0
0
08 May 2025
GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing
Tong Wang
Ting Liu
Xiaochao Qu
Chengjing Wu
Luoqi Liu
Xiaolin Hu
DiffM
51
0
0
08 May 2025
Visual Affordances: Enabling Robots to Understand Object Functionality
Tommaso Apicella
Alessio Xompero
Andrea Cavallaro
39
0
0
08 May 2025
Looking Beyond Language Priors: Enhancing Visual Comprehension and Attention in Multimodal Models
Aarti Ghatkesar
Uddeshya Upadhyay
Ganesh Venkatesh
VLM
31
0
0
08 May 2025
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models
Aishwarya Venkataramanan
P. Bodesheim
Joachim Denzler
BDL
VLM
62
0
0
08 May 2025
Concept-Based Unsupervised Domain Adaptation
Xinyue Xu
Y. Hu
Hui Tang
Yi Qin
Lu Mi
Hao Wang
Xiaomeng Li
43
0
0
08 May 2025
PADriver: Towards Personalized Autonomous Driving
Genghua Kou
Fan Jia
Weixin Mao
Y. Liu
Yucheng Zhao
Ziheng Zhang
Osamu Yoshie
Tiancai Wang
Y. Li
X. Zhang
44
0
0
08 May 2025
Learning to Drive Anywhere with Model-Based Reannotation
Noriaki Hirose
Lydia Ignatova
Kyle Stachowicz
Catherine Glossop
Sergey Levine
Dhruv Shah
19
0
0
08 May 2025
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang
Sarah Monazam Erfani
Yige Li
Xingjun Ma
James Bailey
AAML
34
0
0
08 May 2025
Split Matching for Inductive Zero-shot Semantic Segmentation
Jialei Chen
Xu Zheng
Dongyue Li
Chong Yi
Seigo Ito
D. Paudel
Luc Van Gool
Hiroshi Murase
Daisuke Deguchi
VLM
50
0
0
08 May 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLM
VLM
64
0
0
08 May 2025
ULFine: Unbiased Lightweight Fine-tuning for Foundation-Model-Assisted Long-Tailed Semi-Supervised Learning
Enhao Zhang
Chaohua Li
Chuanxing Geng
Songcan Chen
52
0
0
08 May 2025