ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.06386
  4. Cited By
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge
  Distillation

Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

12 March 2022
Wenliang Dai
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
Pascale Fung
    VLM
ArXivPDFHTML

Papers citing "Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation"

20 / 20 papers shown
Title
Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes
Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes
Rahul Garg
Trilok Padhi
Hemang Jain
Ugur Kursuncu
Ponnurangam Kumaraguru
72
2
0
19 Nov 2024
BlabberSeg: Real-Time Embedded Open-Vocabulary Aerial Segmentation
BlabberSeg: Real-Time Embedded Open-Vocabulary Aerial Segmentation
Haechan Mark Bong
Ricardo de Azambuja
Giovanni Beltrame
VLM
26
0
0
16 Oct 2024
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD
  Generalization
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
Yuhang Zang
Hanlin Goh
Josh Susskind
Chen Huang
VLM
24
12
0
29 Jan 2024
Improving Zero-shot Visual Question Answering via Large Language Models
  with Reasoning Question Prompts
Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts
Yunshi Lan
Xiang Li
Xin Liu
Yang Li
Wei Qin
Weining Qian
LRM
ReLM
23
23
0
15 Nov 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
22
1
0
15 Oct 2023
Beyond Segmentation: Road Network Generation with Multi-Modal LLMs
Beyond Segmentation: Road Network Generation with Multi-Modal LLMs
Sumedh Rasal
Sanjay K. Boddhu
19
5
0
15 Oct 2023
VideoAdviser: Video Knowledge Distillation for Multimodal Transfer
  Learning
VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning
Yanan Wang
Donghuo Zeng
Shinya Wada
Satoshi Kurihara
27
6
0
27 Sep 2023
Tackling VQA with Pretrained Foundation Models without Further Training
Tackling VQA with Pretrained Foundation Models without Further Training
Alvin De Jun Tan
Bingquan Shen
MLLM
13
1
0
27 Sep 2023
Modularized Zero-shot VQA with Pre-trained Models
Modularized Zero-shot VQA with Pre-trained Models
Rui Cao
Jing Jiang
LRM
19
2
0
27 May 2023
Actional Atomic-Concept Learning for Demystifying Vision-Language
  Navigation
Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation
Bingqian Lin
Yi Zhu
Xiaodan Liang
Liang Lin
Jian-zhuo Liu
CoGe
LM&Ro
29
3
0
13 Feb 2023
Gradient Knowledge Distillation for Pre-trained Language Models
Gradient Knowledge Distillation for Pre-trained Language Models
Lean Wang
Lei Li
Xu Sun
VLM
8
5
0
02 Nov 2022
Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car
  Commands
Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands
Wenliang Dai
Samuel Cahyawijaya
Tiezheng Yu
Elham J. Barezi
Pascale Fung
11
1
0
06 Jul 2022
Survey of Hallucination in Natural Language Generation
Survey of Hallucination in Natural Language Generation
Ziwei Ji
Nayeon Lee
Rita Frieske
Tiezheng Yu
D. Su
...
Delong Chen
Wenliang Dai
Ho Shu Chan
Andrea Madotto
Pascale Fung
HILM
LRM
25
2,212
0
08 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,010
0
28 Jan 2022
MURAL: Multimodal, Multitask Retrieval Across Languages
MURAL: Multimodal, Multitask Retrieval Across Languages
Aashi Jain
Mandy Guo
Krishna Srinivasan
Ting-Li Chen
Sneha Kudugunta
Chao Jia
Yinfei Yang
Jason Baldridge
VLM
112
52
0
10 Sep 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
185
403
0
13 Jul 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
518
0
04 Feb 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
1