ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2008.01392
  4. Cited By
Learning Visual Representations with Caption Annotations

Learning Visual Representations with Caption Annotations

4 August 2020
Mert Bulent Sariyildiz
J. Perez
Diane Larlus
    VLM
    SSL
ArXivPDFHTML

Papers citing "Learning Visual Representations with Caption Annotations"

38 / 38 papers shown
Title
Perception Encoder: The best visual embeddings are not at the output of the network
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
74
0
0
05 Dec 2024
A new approach for encoding code and assisting code understanding
A new approach for encoding code and assisting code understanding
Mengdan Fan
Changde Du
Haiyan Zhao
Zhi Jin
41
0
0
01 Aug 2024
POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
Antonín Vobecký
Oriane Siméoni
David Hurych
Spyros Gidaris
Andrei Bursuc
Patrick Pérez
Josef Sivic
29
33
0
17 Jan 2024
From Text to Pixels: A Context-Aware Semantic Synergy Solution for
  Infrared and Visible Image Fusion
From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion
Xingyuan Li
Yang Zou
Jinyuan Liu
Zhiying Jiang
Long Ma
Xin-Yue Fan
Risheng Liu
27
4
0
31 Dec 2023
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for
  Remote Sensing
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
Zhecheng Wang
R. Prabha
Tianyuan Huang
Jiajun Wu
Ram Rajagopal
29
53
0
20 Dec 2023
Zero-shot Building Attribute Extraction from Large-Scale Vision and
  Language Models
Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models
Fei Pan
Sangryul Jeon
Brian Wang
Frank Mckenna
Stella X. Yu
36
2
0
19 Dec 2023
RECLIP: Resource-efficient CLIP by Training with Small Images
RECLIP: Resource-efficient CLIP by Training with Small Images
Runze Li
Dahun Kim
B. Bhanu
Weicheng Kuo
VLM
CLIP
22
12
0
12 Apr 2023
CUDA: Convolution-based Unlearnable Datasets
CUDA: Convolution-based Unlearnable Datasets
Vinu Sankar Sadasivan
Mahdi Soltanolkotabi
S. Feizi
MU
29
23
0
07 Mar 2023
Actional Atomic-Concept Learning for Demystifying Vision-Language
  Navigation
Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation
Bingqian Lin
Yi Zhu
Xiaodan Liang
Liang Lin
Jian-zhuo Liu
CoGe
LM&Ro
29
3
0
13 Feb 2023
Advancing Radiograph Representation Learning with Masked Record Modeling
Advancing Radiograph Representation Learning with Masked Record Modeling
Hong-Yu Zhou
Chenyu Lian
Lian-cheng Wang
Yizhou Yu
MedIm
20
55
0
30 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
RILS: Masked Visual Reconstruction in Language Semantic Space
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
35
11
0
17 Jan 2023
EXIF as Language: Learning Cross-Modal Associations Between Images and
  Camera Metadata
EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata
Chenhao Zheng
Ayush Shrivastava
Andrew Owens
VLM
28
11
0
11 Jan 2023
The Role of Local Alignment and Uniformity in Image-Text Contrastive
  Learning on Medical Images
The Role of Local Alignment and Uniformity in Image-Text Contrastive Learning on Medical Images
Philip Muller
Georgios Kaissis
Daniel Rueckert
MedIm
11
7
0
14 Nov 2022
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
Zifeng Wang
Zhenbang Wu
Dinesh Agarwal
Jimeng Sun
CLIP
VLM
MedIm
29
394
0
18 Oct 2022
e-CLIP: Large-Scale Vision-Language Representation Learning in
  E-commerce
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
Wonyoung Shin
Jonghun Park
Taekang Woo
Yongwoo Cho
Kwangjin Oh
Hwanjun Song
VLM
14
16
0
01 Jul 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
17
123
0
15 Jun 2022
INDIGO: Intrinsic Multimodality for Domain Generalization
INDIGO: Intrinsic Multimodality for Domain Generalization
Puneet Mangla
Shivam Chandhok
Milan Aggarwal
V. Balasubramanian
Balaji Krishnamurthy
VLM
33
2
0
13 Jun 2022
Data Determines Distributional Robustness in Contrastive Language Image
  Pre-training (CLIP)
Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)
Alex Fang
Gabriel Ilharco
Mitchell Wortsman
Yu Wan
Vaishaal Shankar
Achal Dave
Ludwig Schmidt
VLM
OOD
20
138
0
03 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
12
16
0
02 May 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
67
6,622
0
13 Apr 2022
SLIP: Self-supervision meets Language-Image Pre-training
SLIP: Self-supervision meets Language-Image Pre-training
Norman Mu
Alexander Kirillov
David A. Wagner
Saining Xie
VLM
CLIP
32
475
0
23 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
32
551
0
16 Dec 2021
CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions
CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions
Rameen Abdal
Peihao Zhu
John C. Femiani
Niloy J. Mitra
Peter Wonka
CLIP
28
103
0
09 Dec 2021
Semantic Segmentation In-the-Wild Without Seeing Any Segmentation
  Examples
Semantic Segmentation In-the-Wild Without Seeing Any Segmentation Examples
Nir Zabari
Yedid Hoshen
VLM
20
26
0
06 Dec 2021
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does
  Matter
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter
Bang-ju Yang
Tong Zhang
Yuexian Zou
CLIP
22
20
0
30 Nov 2021
CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation
CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation
Aditya Sanghi
Hang Chu
Joseph G. Lambourne
Ye Wang
Chin-Yi Cheng
Marco Fumero
Kamal Rahimi Malekshan
CLIP
33
289
0
06 Oct 2021
Dense Contrastive Visual-Linguistic Pretraining
Dense Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Gao
Zuohui Fu
Gerard de Melo
Yunpeng Chen
Sen Su
VLM
SSL
52
10
0
24 Sep 2021
Robust fine-tuning of zero-shot models
Robust fine-tuning of zero-shot models
Mitchell Wortsman
Gabriel Ilharco
Jong Wook Kim
Mike Li
Simon Kornblith
...
Raphael Gontijo-Lopes
Hannaneh Hajishirzi
Ali Farhadi
Hongseok Namkoong
Ludwig Schmidt
VLM
21
687
0
04 Sep 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
188
405
0
13 Jul 2021
Data-Efficient Language-Supervised Zero-Shot Learning with
  Self-Distillation
Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation
Rui Cheng
Bichen Wu
Peizhao Zhang
Peter Vajda
Joseph E. Gonzalez
CLIP
VLM
8
31
0
18 Apr 2021
Exploring Visual Engagement Signals for Representation Learning
Exploring Visual Engagement Signals for Representation Learning
Menglin Jia
Zuxuan Wu
A. Reiter
Claire Cardie
Serge J. Belongie
Ser-Nam Lim
19
60
0
15 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language
  Representation Learning
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
32
271
0
07 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
86
27,569
0
26 Feb 2021
Concept Generalization in Visual Representation Learning
Concept Generalization in Visual Representation Learning
Mert Bulent Sariyildiz
Yannis Kalantidis
Diane Larlus
Alahari Karteek
SSL
13
50
0
10 Dec 2020
VirTex: Learning Visual Representations from Textual Annotations
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
14
432
0
11 Jun 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
926
0
24 Sep 2019
Self-Supervised Feature Learning by Learning to Spot Artifacts
Self-Supervised Feature Learning by Learning to Spot Artifacts
Simon Jenni
Paolo Favaro
SSL
137
127
0
13 Jun 2018
1