Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

Machine Intelligence Research (MIR), 2023

20 February 2023

Yaowei Wang

Yonghong Tian

Wen Gao

AI4CE

VLM

Papers citing "Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey"

27 / 127 papers shown

A Survey on Image-text Multimodal Models

Ruifeng Guo

Jingxuan Wei

Linzhuang Sun

Khai-Nguyen Nguyen

Guiyong Chang

Dawei Liu

Sibo Zhang

Zhengbing Yao

Mingjun Xu

Liping Bu

VLM

328

23 Sep 2023

Bias and Fairness in Chatbots: An Overview

APSIPA Transactions on Signal and Information Processing (TASIP), 2023

322

16 Sep 2023

SSL-Net: A Synergistic Spectral and Learning-based Network for Efficient Bird Sound Classification

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

164

15 Sep 2023

Enhancing Subtask Performance of Multi-modal Large Language Model

31 Aug 2023

GPTEval: A Survey on Assessments of ChatGPT and GPT-4

International Conference on Language Resources and Evaluation (LREC), 2023

187

148

24 Aug 2023

Progressive Feature Mining and External Knowledge-Assisted Text-Pedestrian Image Retrieval

IEEE Transactions on Multimedia (IEEE TMM), 2023

225

23 Aug 2023

TEST: Text Prototype Aligned Embedding to Activate LLM's Ability for Time Series

International Conference on Learning Representations (ICLR), 2023

408

190

16 Aug 2023

CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation

IEEE International Conference on Computer Vision (ICCV), 2023

Xiaodan Liang

139

14 Aug 2023

SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition

IEEE Transactions on Cognitive and Developmental Systems (IEEE TCDS), 2023

376

08 Aug 2023

Improving Zero-Shot Generalization for CLIP with Synthesized Prompts

Jian Liang

255

14 Jul 2023

A Survey on Graph Neural Networks for Time Series: Forecasting, Classification, Imputation, and Anomaly Detection

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Ming Jin

Irwin King

421

350

07 Jul 2023

Review of Large Vision Models and Visual Prompt Engineering

Chong Ma

...

Tianming Liu

317

216

03 Jul 2023

A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation

Jeremy Gwinnup

Kevin Duh

VLM

148

12 Jun 2023

ProTeCt: Prompt Tuning for Taxonomic Open Set Classification

Computer Vision and Pattern Recognition (CVPR), 2023

179

04 Jun 2023

AMatFormer: Efficient Feature Matching via Anchor Matching Transformer

IEEE Transactions on Multimedia (IEEE TMM), 2023

178

30 May 2023

A Comprehensive Survey on Segment Anything Model for Vision and Beyond

408

130

14 May 2023

ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps

Reliability Engineering & System Safety (Reliab. Eng. Syst. Saf.), 2023

386

10 May 2023

Exploring the Landscape of Machine Unlearning: A Comprehensive Survey and Taxonomy

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023

Haoran Xie

516

10 May 2023

Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey

Yichi Zhang

Rushi Jiao

MedIm VLM

294

05 May 2023

Learning CLIP Guided Visual-Text Fusion Transformer for Video-based Pedestrian Attribute Recognition

294

20 Apr 2023

Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks

International Conference on Learning Representations (ICLR), 2023

Jun Zhao

504

04 Apr 2023

Vision-Language Models for Vision Tasks: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

501

1,044

03 Apr 2023

RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning

Chenglong Li

235

26 Mar 2023

AI-Generated Content (AIGC): A Survey

Wensheng Gan

250

188

26 Mar 2023

Large Selective Kernel Network for Remote Sensing Object Detection

IEEE International Conference on Computer Vision (ICCV), 2023

Jian Yang

Xiang Li

ObjD

328

458

16 Mar 2023

BEVBert: Multimodal Map Pre-training for Language-guided Navigation

283

107

08 Dec 2022

See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval

317

150

18 Aug 2022