Self-supervised learning of visual features through embedding images into text topic spaces

24 May 2017

Papers citing "Self-supervised learning of visual features through embedding images into text topic spaces"

Title | Authors | Tags | Counts | Date
A Comprehensive Survey of Foundation Models in Medicine | Wasif Khan, Seowung Leem, Kyle B. See, Joshua K. Wong, Shaoting Zhang, R. Fang | AI4CE LM&MA VLM | 105 18 0 | 17 Jan 2025
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | Chu Myaet Thwal, Ye Lin Tun, Minh N. H. Nguyen, Eui-nam Huh, Choong Seon Hong | VLM | 74 0 0 | 05 Dec 2024
DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders | Sizai Hou, Songze Li, Duanyi Yao | AAML | 72 0 0 | 25 Nov 2024
A Multi-Modal Deep Learning Based Approach for House Price Prediction | Md Hasebul Hasan, Md Abid Jahan, Mohammed Eunus Ali, Yuan-Fang Li, Timos Sellis | | 18 0 0 | 09 Sep 2024
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study | Chenguang Wang, Ruoxi Jia, Xin Liu, Dawn Song | VLM | 29 7 0 | 15 Mar 2024
Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models | Fei Pan, Sangryul Jeon, Brian Wang, Frank Mckenna, Stella X. Yu | | 44 2 0 | 19 Dec 2023
Text-Only Training for Visual Storytelling | Yuechen Wang, Wen-gang Zhou, Zhenbo Lu, Houqiang Li | DiffM | 28 2 0 | 17 Aug 2023
SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation | Yash J. Patel, Yusheng Xie, Yi Zhu, Srikar Appalaraju, R. Manmatha | | 35 4 0 | 07 Feb 2023
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR | Jiatong Shi, Chan-Jan Hsu, Ho-Lam Chung, Dongji Gao, Leibny Paola García-Perera, Shinji Watanabe, Ann Lee, Hung-yi Lee | | 32 12 0 | 06 Nov 2022
FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition | Szu-Jui Chen, Jiamin Xie, John H. L. Hansen | | 40 8 0 | 30 Jun 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision | Henghui Zhao, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Richard P. Wildes, Allan D. Jepson | | 34 45 0 | 04 May 2022
Multi-View Transformer for 3D Visual Grounding | Shijia Huang, Yilun Chen, Jiaya Jia, Liwei Wang | | 31 113 0 | 05 Apr 2022
Extract Free Dense Labels from CLIP | Chong Zhou, Chen Change Loy, Bo Dai | VLM CLIP | 54 455 0 | 02 Dec 2021
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm | Yangguang Li, Feng Liang, Lichen Zhao, Yufeng Cui, Wanli Ouyang, Jing Shao, F. Yu, Junjie Yan | VLM CLIP | 50 446 0 | 11 Oct 2021
Learning to Prompt for Vision-Language Models | Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu | VPVLM CLIP VLM | 348 2,279 0 | 02 Sep 2021
Exploring Visual Engagement Signals for Representation Learning | Menglin Jia, Zuxuan Wu, A. Reiter, Claire Cardie, Serge Belongie, Ser-Nam Lim | | 21 13 0 | 15 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision | Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya A. Ramesh, Gabriel Goh, ..., Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever | CLIP VLM | 183 27,846 0 | 26 Feb 2021
Hard Negative Mixing for Contrastive Learning | Yannis Kalantidis, Mert Bulent Sariyildiz, Noé Pion, Philippe Weinzaepfel, Diane Larlus | SSL | 53 628 0 | 02 Oct 2020
Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning | Subhojeet Pramanik, Shashank Mujumdar, Hima Patel | | 19 31 0 | 30 Sep 2020
A Web Page Classifier Library Based on Random Image Content Analysis Using Deep Learning | L. E. Leal, Kaj-Mikael Björk, A. Lendasse, Anton Akusok | | 9 9 0 | 18 Dec 2019
Multi-task Self-Supervised Learning for Human Activity Detection | Aaqib Saeed, T. Ozcelebi, J. Lukkien | SSL | 23 269 0 | 27 Jul 2019
Scene Text Visual Question Answering | Ali Furkan Biten, Rubèn Pérez Tito, Andrés Mafla, Lluís Gómez, Marçal Rusiñol, Ernest Valveny, C. V. Jawahar, Dimosthenis Karatzas | | 36 343 0 | 31 May 2019
Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text | Ruotong Wang, R. Hwa, Adriana Kovashka | | 24 54 0 | 21 Jul 2018
Learning Type-Aware Embeddings for Fashion Compatibility | Mariya I. Vasileva, Bryan A. Plummer, Krishna Dusad, Shreya Rajpal, Ranjitha Kumar, David A. Forsyth | | 34 225 0 | 25 Mar 2018
Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video | Aljosa Osep, P. Voigtlaender, Jonathon Luiten, Stefan Breuers, Bastian Leibe | ObjD | 45 11 0 | 23 Dec 2017
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics | Yunchao Gong, Qifa Ke, Michael Isard, Svetlana Lazebnik | 3DV | 76 584 0 | 18 Dec 2012