A Survey of Vision-Language Pre-Trained Models

18 February 2022
Yifan Du
Zikang Liu
Junyi Li
Wayne Xin Zhao
    VLM

Papers citing "A Survey of Vision-Language Pre-Trained Models"

50 / 124 papers shown
Backdoor Attack on Unpaired Medical Image-Text Foundation Models: A Pilot Study on MedCLIP
Ruinan Jin
Chun-Yin Huang
Chenyu You
Xiaoxiao Li
AAML
MedIm
25
2
0
01 Jan 2024
Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges
Qingyao Li
Lingyue Fu
Weiming Zhang
Xianyu Chen
Jingwei Yu
Wei Xia
Weinan Zhang
Ruiming Tang
Yong Yu
AI4Ed
ELM
27
17
0
27 Dec 2023
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
Yafei Hu
Quanting Xie
Vidhi Jain
Jonathan M Francis
Jay Patrikar
...
Xiaolong Wang
Sebastian A. Scherer
Z. Kira
Fei Xia
Yonatan Bisk
LM&Ro
AI4CE
30
62
0
14 Dec 2023
An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets
Maya Srikanth
Jeremy Irvin
Brian Wesley Hill
Felipe Godoy
Ishan Sabane
Andrew Y. Ng
26
2
0
02 Dec 2023
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
Haoqin Tu
Chenhang Cui
Zijun Wang
Yiyang Zhou
Bingchen Zhao
Junlin Han
Wangchunshu Zhou
Huaxiu Yao
Cihang Xie
MLLM
45
70
0
27 Nov 2023
Large Pre-trained time series models for cross-domain Time series analysis tasks
Harshavardhan Kamarthi
B. A. Prakash
VLM
AI4TS
15
10
0
19 Nov 2023
PEMS: Pre-trained Epidemic Time-series Models
Harshavardhan Kamarthi
B. A. Prakash
AI4TS
19
2
0
14 Nov 2023
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models
İlker Kesen
Andrea Pedrotti
Mustafa Dogan
Michele Cafagna
Emre Can Acikgoz
...
Iacer Calixto
Anette Frank
Albert Gatt
Aykut Erdem
Erkut Erdem
33
15
0
13 Nov 2023
Prototypical Contrastive Learning-based CLIP Fine-tuning for Object Re-identification
Jiachen Li
Xiaojin Gong
VLM
14
3
0
26 Oct 2023
3M-TRANSFORMER: A Multi-Stage Multi-Stream Multimodal Transformer for Embodied Turn-Taking Prediction
Mehdi Fatan
Emanuele Mincato
Dimitra Pintzou
Mariella Dimiccoli
16
1
0
23 Oct 2023
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Ziyi Yin
Muchao Ye
Tianrong Zhang
Tianyu Du
Jinguo Zhu
Han Liu
Jinghui Chen
Ting Wang
Fenglong Ma
AAML
VLM
CoGe
28
36
0
07 Oct 2023
Natural Language based Context Modeling and Reasoning for Ubiquitous Computing with Large Language Models: A Tutorial
Haoyi Xiong
Jiang Bian
Sijia Yang
Xiaofei Zhang
Linghe Kong
Daqing Zhang
LRM
LLMAG
30
5
0
24 Sep 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
21
5
0
23 Sep 2023
TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
Kan Wu
Houwen Peng
Zhenghong Zhou
Bin Xiao
Mengchen Liu
...
Xi Chen
Xinggang Wang
Hongyang Chao
Han Hu
VLM
OODD
21
53
0
21 Sep 2023
SayCanPay: Heuristic Planning with Large Language Models using Learnable Domain Knowledge
Rishi Hazra
Pedro Zuidberg Dos Martires
Luc de Raedt
LM&Ro
LLMAG
13
31
0
24 Aug 2023
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval
Yuan Yuan
Yangfan Zhan
Zhitong Xiong
VLM
23
39
0
24 Aug 2023
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning
Haokun Chen
Yao Zhang
Denis Krompass
Jindong Gu
Volker Tresp
FedML
65
39
0
21 Aug 2023
Generic Attention-model Explainability by Weighted Relevance Accumulation
Yiming Huang
Ao Jia
Xiaodan Zhang
Jiawei Zhang
18
1
0
20 Aug 2023
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang
Jianbo Ma
Santiago Pascual
Richard Cartwright
Weidong (Tom) Cai
VGen
19
37
0
18 Aug 2023
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
Jiazhou Zhou
Xueye Zheng
Yuanhuiyi Lyu
Lin Wang
VLM
17
12
0
06 Aug 2023
Causal reasoning in typical computer vision tasks
Kexuan Zhang
Qiyu Sun
Chaoqiang Zhao
Yang Tang
CML
24
11
0
26 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
18
117
0
25 Jul 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
Weidong Chen
Xiaofen Xing
Peihao Chen
Xiangmin Xu
VLM
23
35
0
20 Jul 2023
Prototypical Contrastive Transfer Learning for Multimodal Language Understanding
Seitaro Otsuki
Shintaro Ishikawa
K. Sugiura
25
1
0
12 Jul 2023
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
Zilun Zhang
Tiancheng Zhao
Yulong Guo
Jianwei Yin
DiffM
VLM
21
52
0
20 Jun 2023
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
F. Liu
Delong Chen
Zhan-Rong Guan
Xiaocong Zhou
Jiale Zhu
Qiaolin Ye
Liyong Fu
Jun Zhou
VLM
66
188
0
19 Jun 2023
CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search
Fahad Shamshad
Muzammal Naseer
Karthik Nandakumar
AAML
PICV
28
27
0
16 Jun 2023
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
Ziqiao Ma
Jiayi Pan
J. Chai
ObjD
VLM
21
8
0
14 Jun 2023
A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation
Jeremy Gwinnup
Kevin Duh
VLM
12
3
0
12 Jun 2023
ProTeCt: Prompt Tuning for Taxonomic Open Set Classification
Tz-Ying Wu
Chih-Hui Ho
Nuno Vasconcelos
VLM
8
5
0
04 Jun 2023
Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
VLM
13
1
0
03 Jun 2023
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models
Shuo Chen
Jindong Gu
Zhen Han
Yunpu Ma
Philip H. S. Torr
Volker Tresp
VPVLM
VLM
27
17
0
03 Jun 2023
An Overview on Generative AI at Scale with Edge-Cloud Computing
Yun Cheng Wang
Jintang Xue
Chengwei Wei
C.-C. Jay Kuo
19
30
0
02 Jun 2023
GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task
Ning Ding
Yehui Tang
Zhongqian Fu
Chaoting Xu
Kai Han
Yunhe Wang
MLLM
VLM
29
2
0
01 Jun 2023
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Ahmed Masry
P. Kavehzadeh
Do Xuan Long
Enamul Hoque
Shafiq R. Joty
LRM
19
100
0
24 May 2023
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
52
691
0
17 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
32
89
0
14 May 2023
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Yanfang Li
Huan Wang
Muxia Sun
LM&MA
AI4TS
AI4CE
19
45
0
10 May 2023
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
Feilong Chen
Minglun Han
Haozhi Zhao
Qingyang Zhang
Jing Shi
Shuang Xu
Bo Xu
MLLM
27
115
0
07 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
19
10
0
03 May 2023
POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models
Korawat Tanwisuth
Shujian Zhang
Huangjie Zheng
Pengcheng He
Mingyuan Zhou
VLM
VPVLM
81
28
0
29 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
39
474
0
03 Apr 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
23
30
0
21 Mar 2023
LIMITR: Leveraging Local Information for Medical Image-Text Representation
Gefen Dawidowicz
Elad Hirsch
A. Tal
23
15
0
21 Mar 2023
Language Model Behavior: A Comprehensive Survey
Tyler A. Chang
Benjamin Bergen
VLM
LRM
LM&MA
27
102
0
20 Mar 2023
Toward Unsupervised Realistic Visual Question Answering
Yuwei Zhang
Chih-Hui Ho
Nuno Vasconcelos
CoGe
14
2
0
09 Mar 2023
Knowledge-Based Counterfactual Queries for Visual Question Answering
Theodoti Stoikou
Maria Lymperaiou
Giorgos Stamou
AAML
13
1
0
05 Mar 2023
Deep Learning for Video-Text Retrieval: a Review
Cunjuan Zhu
Qi Jia
Wei-Neng Chen
Yanming Guo
Yu Liu
19
14
0
24 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Xiao Wang
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
24
199
0
20 Feb 2023
Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data
Ryumei Nakada
Halil Ibrahim Gulluk
Zhun Deng
Wenlong Ji
James Y. Zou
Linjun Zhang
SSL
VLM
37
34
0
13 Feb 2023