Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 8,342 papers shown
Title
Topology-Aware CLIP Few-Shot Learning
Dazhi Huang
VLM
26
0
0
03 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
52
0
0
03 May 2025
PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications
Trisanth Srinivasan
Santosh Patapati
27
0
0
03 May 2025
ABE: A Unified Framework for Robust and Faithful Attribution-Based Explainability
Zhiyu Zhu
Jiayu Zhang
Zhibo Jin
Fang Chen
Jianlong Zhou
FAtt
14
0
0
03 May 2025
Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification
Sicong Li
Qianqian Xu
Zhiyong Yang
Zitai Wang
L. Zhang
Xiaochun Cao
Q. Huang
51
0
0
03 May 2025
Mitigating Group-Level Fairness Disparities in Federated Visual Language Models
Chaomeng Chen
Zitong Yu
J. Dong
Sen Su
L. Shen
Shutao Xia
Xiaochun Cao
FedML
VLM
70
0
0
03 May 2025
PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach
Nitin Rai
Arnold W. Schumann
Nathan Boyd
MedIm
37
0
0
03 May 2025
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
45
0
0
03 May 2025
Vision and Intention Boost Large Language Model in Long-Term Action Anticipation
Congqi Cao
Lanshu Hu
Yating Yu
Y. Zhang
VLM
61
0
0
03 May 2025
Transferable Adversarial Attacks on Black-Box Vision-Language Models
Kai Hu
Weichen Yu
L. Zhang
Alexander Robey
Andy Zou
Chengming Xu
Haoqi Hu
Matt Fredrikson
AAML
VLM
49
0
0
02 May 2025
Scalability Matters: Overcoming Challenges in InstructGLM with Similarity-Degree-Based Sampling
Hyun Lee
Chris Yi
Maminur Islam
B.D.S. Aritra
22
0
0
02 May 2025
Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain
Gaozheng Pei
Ke Ma
Yingfei Sun
Qianqian Xu
Q. Huang
DiffM
36
0
0
02 May 2025
Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs
Hari Chandana Kuchibhotla
Sai Srinivas Kancheti
Abbavaram Gowtham Reddy
Vineeth N. Balasubramanian
39
0
0
02 May 2025
PREMISE: Matching-based Prediction for Accurate Review Recommendation
Wei Han
Hui Chen
Soujanya Poria
24
0
0
02 May 2025
VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models
Mohammadreza Teymoorianfard
Shiqing Ma
Amir Houmansadr
WIGM
53
0
0
02 May 2025
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Jen-Hao Cheng
Vivian Wang
Huayu Wang
Huapeng Zhou
Yi-Hao Peng
...
Wenhao Chai
Yi-Ling Chen
Vibhav Vineet
Qin Cai
Jenq-Neng Hwang
AI4TS
41
0
0
02 May 2025
VSC: Visual Search Compositional Text-to-Image Diffusion Model
Do Huu Dat
Nam Hyeonu
Po Yuan Mao
Tae-Hyun Oh
DiffM
CoGe
55
0
0
02 May 2025
When Dynamic Data Selection Meets Data Augmentation
S. M. I. Simon X. Yang
Peng Ye
F. Shen
Dongzhan Zhou
22
0
0
02 May 2025
ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow
Changhe Chen
Quantao Yang
Xiaohao Xu
Nima Fazeli
Olov Andersson
22
0
0
02 May 2025
Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation
Daniele Molino
Francesco Di Feola
Linlin Shen
Paolo Soda
V. Guarrasi
MedIm
LM&MA
57
0
0
02 May 2025
On the effectiveness of Large Language Models in the mechanical design domain
Daniele Grandi
Fabian Riquelme
17
0
0
02 May 2025
Grounding Task Assistance with Multimodal Cues from a Single Demonstration
Gabriel Sarch
Balasaravanan Thoravi Kumaravel
Sahithya Ravi
Vibhav Vineet
A. D. Wilson
58
0
0
02 May 2025
Empowering Agentic Video Analytics Systems with Video Language Models
Yuxuan Yan
Shiqi Jiang
Ting Cao
Y. Yang
Qianqian Yang
Yuanchao Shu
Y. Yang
Lili Qiu
VLM
67
0
0
01 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
57
1
0
01 May 2025
InstructAttribute: Fine-grained Object Attributes editing with Instruction
Xingxi Yin
Jingfeng Zhang
Zhi Li
Y. Li
Y. Zhang
DiffM
88
0
0
01 May 2025
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
Yiming Du
Wenyu Huang
Danna Zheng
Zhaowei Wang
Sébastien Montella
Mirella Lapata
Kam-Fai Wong
Jeff Z. Pan
KELM
MU
71
1
0
01 May 2025
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
Albert Ge
Tzu-Heng Huang
John Cooper
Avi Trost
Ziyi Chu
Satya Sai Srinath Namburi GNVV
Ziyang Cai
Kendall Park
Nicholas Roberts
Frederic Sala
53
0
0
01 May 2025
AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care
Md Asaduzzaman Jabin
Hanqi Jiang
Y. Li
Patrick Kaggwa
Eugene Douglass
Juliet N. Sekandi
Tianming Liu
LM&MA
69
0
0
01 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
H. Li
LRM
57
0
0
01 May 2025
LightEMMA: Lightweight End-to-End Multimodal Model for Autonomous Driving
Zhijie Qiao
Haowei Li
Zhong Cao
Henry X. Liu
VLM
78
2
0
01 May 2025
A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI
Lik Hang Kenny Wong
Xueyang Kang
Kaixin Bai
Jianwei Zhang
45
0
0
01 May 2025
Multi-Modal Language Models as Text-to-Image Model Evaluators
Jiahui Chen
Candace Ross
Reyhane Askari Hemmat
Koustuv Sinha
Melissa Hall
M. Drozdzal
Adriana Romero-Soriano
EGVM
60
0
0
01 May 2025
Cues3D: Unleashing the Power of Sole NeRF for Consistent and Unique Instances in Open-Vocabulary 3D Panoptic Segmentation
Feng Xue
Wenzhuang Xu
Guofeng Zhong
Anlong Minga
N. Sebe
65
0
0
01 May 2025
Controllable Weather Synthesis and Removal with Video Diffusion Models
Chih-Hao Lin
Z. Wang
Ruofan Liang
Yuxuan Zhang
Sanja Fidler
Shenlong Wang
Zan Gojcic
DiffM
VGen
42
0
0
01 May 2025
T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation
Xuyang Guo
Jiayan Huo
Zhenmei Shi
Zhao-quan Song
Jiahao Zhang
Jiale Zhao
EGVM
VGen
PINN
75
1
0
01 May 2025
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
Kwon Byung-Ki
Qi Dai
Lee Hyoseok
Chong Luo
Tae-Hyun Oh
59
0
0
01 May 2025
Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization
Kuan Zhang
Chengliang Chai
Jingzhe Xu
Chi Zhang
Ye Yuan
Guoren Wang
Lei Cao
NoLa
51
0
0
01 May 2025
Direct Motion Models for Assessing Generated Videos
Kelsey R. Allen
Carl Doersch
Guangyao Zhou
Mohammed Suhail
Danny Driess
...
Thomas Kipf
Mehdi S. M. Sajjadi
Kevin P. Murphy
João Carreira
Sjoerd van Steenkiste
EGVM
DiffM
VGen
74
0
0
30 Apr 2025
Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders
Andrei-Alexandru Manea
Jindřich Libovický
VLM
52
0
0
30 Apr 2025
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
Qihao Liu
Ju He
Qihang Yu
Liang-Chieh Chen
Alan Yuille
DiffM
VGen
75
0
0
30 Apr 2025
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
Minh-Hao Van
Xintao Wu
VLM
81
0
0
30 Apr 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Weicai Yan
Wang Lin
Zirun Guo
Ye Wang
Fangming Feng
Xiaoda Yang
Z. Wang
Tao Jin
DiffM
97
2
0
30 Apr 2025
LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning
Yiyang Shao
Xiaoyu Huang
Bike Zhang
Qiayuan Liao
Yuman Gao
Yufeng Chi
Zhongyu Li
Sophia Shao
K. Sreenath
LM&Ro
62
0
0
30 Apr 2025
Vision-Language Model-Based Semantic-Guided Imaging Biomarker for Early Lung Cancer Detection
Luoting Zhuang
Seyed Mohammad Hossein Tabatabaei
Ramin Salehi-Rad
Linh M. Tran
Denise R. Aberle
Ashley E. Prosper
William Hsu
31
0
0
30 Apr 2025
AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images
Yunhao Li
Sijing Wu
Wei Sun
Zhichao Zhang
Yucheng Zhu
Zicheng Zhang
Huiyu Duan
Xiongkuo Min
Guangtao Zhai
EGVM
78
0
0
30 Apr 2025
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIP
VLM
76
0
0
30 Apr 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS
VLM
37
0
0
30 Apr 2025
XeMap: Contextual Referring in Large-Scale Remote Sensing Environments
Y. Li
Lu Si
Y. T. Hou
Chengaung Liu
B. Li
Hongjian Fang
J. Zhang
71
0
0
30 Apr 2025
An Evaluation of a Visual Question Answering Strategy for Zero-shot Facial Expression Recognition in Still Images
Modesto Castrillón-Santana
Oliverio J. Santana
David Freire-Obregón
Daniel Hernández-Sosa
J. Lorenzo-Navarro
52
0
0
30 Apr 2025
Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders
Xuwei Yang
Fatemeh Tavakoli
D. B. Emerson
Anastasis Kratsios
FedML
62
0
0
30 Apr 2025
Previous
1
2
3
4
5
6
...
165
166
167
Next