ResearchTrend.AI
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks


15 January 2022
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Jianwei Yang
Xiyang Dai
Bin Xiao
Haoxuan You
Shih-Fu Chang
Lu Yuan
    CLIP
    VLM
arXiv: 2201.05729

Papers citing "CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks"

30 / 30 papers shown
Crossmodal Knowledge Distillation with WordNet-Relaxed Text Embeddings for Robust Image Classification
Chenqi Guo
Mengshuo Rong
Qianli Feng
Rongfan Feng
Yinglong Ma
VLM
52
0
0
31 Mar 2025
MedCoT: Medical Chain of Thought via Hierarchical Expert
Jiaxiang Liu
Yuan Wang
Jiawei Du
Joey Tianyi Zhou
Zuozhu Liu
LRM
70
9
0
18 Dec 2024
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest
Xupeng Chen
Zhixin Lai
Kangrui Ruan
Shichu Chen
Jiaxiang Liu
Zuozhu Liu
33
1
0
27 Oct 2024
Cognition Transferring and Decoupling for Text-supervised Egocentric Semantic Segmentation
Zhaofeng Shi
Heqian Qiu
Lanxiao Wang
Fanman Meng
Q. Wu
Hongliang Li
21
2
0
02 Oct 2024
Cascade Prompt Learning for Vision-Language Model Adaptation
Ge Wu
Xin Zhang
Zheng Li
Zhaowei Chen
Jiajun Liang
Jian Yang
Xiang Li
VLM
19
6
0
26 Sep 2024
Revisiting Prompt Pretraining of Vision-Language Models
Zhenyuan Chen
Lingfeng Yang
Shuo Chen
Zhaowei Chen
Jiajun Liang
Xiang Li
MLLM
VPVLM
VLM
33
1
0
10 Sep 2024
LLAVADI: What Matters For Multimodal Large Language Models Distillation
Shilin Xu
Xiangtai Li
Haobo Yuan
Lu Qi
Yunhai Tong
Ming-Hsuan Yang
34
3
0
28 Jul 2024
CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers
Lakshmi Nair
VLM
21
0
0
09 Apr 2024
Mind the Modality Gap: Towards a Remote Sensing Vision-Language Model via Cross-modal Alignment
Angelos Zavras
Dimitrios Michail
Begum Demir
Ioannis Papoutsis
VLM
17
11
0
15 Feb 2024
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Raviteja Vemulapalli
Oncel Tuzel
CLIP
VLM
11
43
0
28 Nov 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Zhecan Wang
Long Chen
Haoxuan You
Keyang Xu
Yicheng He
Wenhao Li
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
12
3
0
23 Oct 2023
Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition
Zixiao Wang
Hongtao Xie
Yuxin Wang
Jianjun Xu
Boqiang Zhang
Yongdong Zhang
26
15
0
08 Oct 2023
TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
Kan Wu
Houwen Peng
Zhenghong Zhou
Bin Xiao
Mengchen Liu
...
Xi Chen
Xinggang Wang
Hongyang Chao
Han Hu
VLM
OODD
15
51
0
21 Sep 2023
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
Weihan Wang
Z. Yang
Bin Xu
Juanzi Li
Yankui Sun
VLM
12
8
0
31 Aug 2023
Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features
Alberto Baldrati
Marco Bertini
Tiberio Uricchio
A. Bimbo
CLIP
CoGe
9
28
0
22 Aug 2023
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
Xuanlin Li
Yunhao Fang
Minghua Liu
Z. Ling
Z. Tu
Haoran Su
VLM
28
21
0
06 Jul 2023
Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training
Alyssa Huang
Peihan Liu
Ryumei Nakada
Linjun Zhang
Wanrong Zhang
VLM
26
5
0
13 Jun 2023
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
Ximeng Sun
Pengchuan Zhang
Peizhao Zhang
Hardik Shah
Kate Saenko
Xide Xia
VLM
8
19
0
31 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
33
46
0
21 Mar 2023
Architext: Language-Driven Generative Architecture Design
Theodoros Galanos
Antonios Liapis
Georgios N. Yannakakis
VLM
AI4CE
26
6
0
13 Mar 2023
Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data
Ryumei Nakada
Halil Ibrahim Gulluk
Zhun Deng
Wenlong Ji
James Y. Zou
Linjun Zhang
SSL
VLM
37
25
0
13 Feb 2023
Attentive Mask CLIP
Yifan Yang
Weiquan Huang
Yixuan Wei
Houwen Peng
Xinyang Jiang
...
Fangyun Wei
Yin Wang
Han Hu
Lili Qiu
Yuqing Yang
CLIP
VLM
32
26
0
16 Dec 2022
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
Manli Shu
Weili Nie
De-An Huang
Zhiding Yu
Tom Goldstein
Anima Anandkumar
Chaowei Xiao
VLM
VPVLM
172
278
0
15 Sep 2022
Open-Vocabulary Universal Image Segmentation with MaskCLIP
Zheng Ding
Jieke Wang
Z. Tu
CLIP
ISeg
VLM
30
85
0
18 Aug 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
518
0
13 Jun 2022
Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting
G. Han
Long Chen
Jiawei Ma
Shiyuan Huang
Ramalingam Chellappa
Shih-Fu Chang
VLM
13
20
0
16 Apr 2022
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Changdae Oh
Junhyuk So
Hoyoon Byun
Yongtaek Lim
Minchul Shin
Jong-June Jeon
Kyungwoo Song
21
24
0
08 Mar 2022
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
185
403
0
13 Jul 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
922
0
24 Sep 2019