arXiv:2302.10035 (v3)
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Machine Intelligence Research (MIR), 2023
20 February 2023
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
ArXiv (abs)
PDF
HTML
Github (286★)
Papers citing
"Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey"
50 / 127 papers shown
Towards a Foundation Model for Partial Differential Equations Across Physics Domains
Eduardo Soares
E. V. Brazil
Victor Shirasuna
Breno W. S. R. de Carvalho
Cristiano Malossi
AI4CE
89
0
0
26 Nov 2025
Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models
Kedi Chen
Zhikai Lei
Xu Guo
Xuecheng Wu
Siyuan Zeng
...
J. Zhou
Liang He
Qipeng Guo
Kai Chen
Wei-na Zhang
AIMat
AI4TS
LRM
283
0
0
16 Oct 2025
GeoArena: An Open Platform for Benchmarking Large Vision-language Models on WorldWide Image Geolocalization
Pengyue Jia
Yingyi Zhang
Xiangyu Zhao
Yixuan Li
188
1
0
04 Sep 2025
DOMR: Establishing Cross-View Segmentation via Dense Object Matching
Jitong Liao
Yulu Gao
Shaofei Huang
Jialin Gao
Jie Lei
Ronghua Liang
Si Liu
173
1
0
06 Aug 2025
R2GenKG: Hierarchical Multi-modal Knowledge Graph for LLM-based Radiology Report Generation
Futian Wang
Yuhan Qiao
Xiao Wang
Fuling Wang
Yuxiang Zhang
Dengdi Sun
MedIm
101
1
0
05 Aug 2025
Revisiting Heat Flux Analysis of Tungsten Monoblock Divertor on EAST using Physics-Informed Neural Network
Xiao Wang
Zikang Yan
Hao Si
Zhendong Yang
Qingquan Yang
Dengdi Sun
Wanli Lyu
Jin Tang
98
0
0
05 Aug 2025
HGTS-Former: Hierarchical HyperGraph Transformer for Multivariate Time Series Analysis
Xiao Wang
Hao Si
Fan Zhang
Xiaoya Zhou
Dengdi Sun
Wanli Lyu
Qingquan Yang
Jin Tang
AI4TS
354
1
0
04 Aug 2025
When Person Re-Identification Meets Event Camera: A Benchmark Dataset and An Attribute-guided Re-Identification Framework
Xiao Wang
Qian Zhu
Shujuan Wu
Bo Jiang
Shiliang Zhang
173
0
0
18 Jul 2025
DCR: Quantifying Data Contamination in LLMs Evaluation
Cheng Xu
Nan Yan
Shuhao Guan
Changhong Jin
Yuke Mei
Yibing Guo
Mohand-Tahar Kechadi
161
1
0
15 Jul 2025
ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model
Jialong Zuo
Yongtai Deng
Mengdan Tan
Rui Jin
Dongyue Wu
Nong Sang
Liang Pan
Changxin Gao
207
0
0
11 Jun 2025
Latent Structured Hopfield Network for Semantic Association and Retrieval
Chong Li
Xiangyang Xue
Jianfeng Feng
Taiping Zeng
BDL
CLL
127
0
0
02 Jun 2025
Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach
Shiao Wang
Xiao Wang
Liye Jin
Bo Jiang
Lin Zhu
Lan Chen
Yonghong Tian
Bin Luo
242
1
0
19 May 2025
Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
396
1
0
12 May 2025
Multi-agent Embodied AI: Advances and Future Directions
Zhaohan Feng
Ruiqi Xue
Lei Yuan
Yang Yu
Ning Ding
M. Liu
Bingzhao Gao
Jian Sun
Xinhu Zheng
Gang Wang
AI4CE
492
24
0
08 May 2025
A Visual RAG Pipeline for Few-Shot Fine-Grained Product Classification
Bianca Lamm
J. Keuper
VLM
AI4TS
319
1
0
16 Apr 2025
R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
Computer Vision and Pattern Recognition (CVPR), 2025
Lijun Sheng
Jian Liang
Liang Luo
Ran He
AAML
VLM
356
12
0
15 Apr 2025
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
Computer Vision and Pattern Recognition (CVPR), 2025
Yanbo Wang
Jiyang Guan
Jian Liang
Ran He
345
3
0
14 Apr 2025
Frontier AI's Impact on the Cybersecurity Landscape
Wenbo Guo
Tianneng Shi
Yu Yang
Andy Zhang
Patrick Gage Kelley
Kurt Thomas
Dawn Song
487
12
0
07 Apr 2025
MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Pingyu Wu
Daiheng Gao
Jing Tang
Huimin Chen
Wenbo Zhou
Weinan Zhang
Nenghai Yu
199
1
0
17 Mar 2025
SAM2 for Image and Video Segmentation: A Comprehensive Survey
Zhang Jiaxing
Tang Hao
VLM
318
13
0
17 Mar 2025
Code-Driven Inductive Synthesis: Enhancing Reasoning Abilities of Large Language Models with Sequences
Kedi Chen
Zhikai Lei
Fan Zhang
Yinqi Zhang
Qin Chen
Jie Zhou
Liang He
Qipeng Guo
Kai Chen
Wei-na Zhang
ELM
LRM
227
4
0
17 Mar 2025
MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment
Hao Zhou
Xiaobao Guo
Yuzhe Zhu
A. Kong
DiffM
418
1
0
13 Mar 2025
Multi-Modal Foundation Models for Computational Pathology: A Survey
Dong Li
Guihong Wan
Xintao Wu
Xinyu Wu
Xiaohui Chen
Yi He
Christine G. Lian
Peter K. Sorger
Yevgeniy R. Semenov
Chen Zhao
MedIm
412
5
0
12 Mar 2025
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
Computer Vision and Pattern Recognition (CVPR), 2025
Guanyao Wu
Haoyu Liu
Hongming Fu
Yichuan Peng
Jinyuan Liu
Xin-Yue Fan
Risheng Liu
370
23
0
03 Mar 2025
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
Computer Vision and Pattern Recognition (CVPR), 2025
Peijie Wang
Zhong-Zhi Li
Fei Yin
Xin Yang
Dekang Ran
Cheng-Lin Liu
LRM
558
26
0
28 Feb 2025
Mixtraining: A Better Trade-Off Between Compute and Performance
Zexin Li
Jiancheng Zhang
Yufei Li
Yinglun Zhu
Cong Liu
227
1
0
26 Feb 2025
XiHeFusion: Harnessing Large Language Models for Science Communication in Nuclear Fusion
Xinyu Wang
Qingquan Yang
Fuling Wang
Qiang Chen
Wentao Wu
...
Wanli Lv
Meiwen Chen
Zehua Chen
Guosheng Xu
Jin Tang
AI4CE
241
1
0
08 Feb 2025
Large Multimodal Models for Low-Resource Languages: A Survey
Marian Lupascu
Ana-Cristina Rogoz
Mihai-Sorin Stupariu
Radu Tudor Ionescu
374
3
0
08 Feb 2025
Mask-informed Deep Contrastive Incomplete Multi-view Clustering
Zhenglai Li
Yuqi Shi
Xiao He
Chang-Fu Tang
327
0
0
04 Feb 2025
Towards Visual Grounding: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
927
27
0
28 Dec 2024
Prompt as Free Lunch: Enhancing Diversity in Source-Free Cross-domain Few-shot Learning through Semantic-Guided Prompting
Linhai Zhuo
Zheng Wang
Yuqian Fu
Tianwen Qian
VLM
365
6
0
01 Dec 2024
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Computer Vision and Pattern Recognition (CVPR), 2025
Di Zhang
Jingdi Lei
Junxian Li
Xunzhi Wang
Yong Liu
...
Steve Yang
Jianbo Wu
Peng Ye
Wanli Ouyang
Dongzhan Zhou
OffRL
LRM
547
27
0
27 Nov 2024
On the ERM Principle in Meta-Learning
Yannay Alon
Steve Hanneke
Shay Moran
Uri Shalit
CLL
LRM
257
2
0
26 Nov 2024
Optimized Vessel Segmentation: A Structure-Agnostic Approach with Small Vessel Enhancement and Morphological Correction
Dongning Song
Weijian Huang
Jiarun Liu
Md Jahidul Islam
Hao Yang
Shanshan Wang
287
1
0
22 Nov 2024
MMGenBench: Fully Automatically Evaluating LMMs from the Text-to-Image Generation Perspective
Hailang Huang
Yong Wang
Zixuan Huang
Huaqiu Li
Tongwen Huang
Xiangxiang Chu
Richong Zhang
MLLM
LM&MA
EGVM
289
1
0
21 Nov 2024
Multimodal large language model for wheat breeding: a new exploration of smart breeding
ISPRS Journal of Photogrammetry and Remote Sensing (ISPRS J. Photogramm. Remote Sens.), 2024
Guofeng Yang
Yu Li
Yong He
Zhenjiang Zhou
Lingzhen Ye
Hui Fang
Yiqi Luo
Xuping Feng
210
5
0
20 Nov 2024
Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation
Neural Information Processing Systems (NeurIPS), 2024
Yu-Liang Zhan
Zhong-Yi Lu
Hao Sun
Ze-Feng Gao
238
2
0
10 Nov 2024
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging
Li Shen
Anke Tang
Enneng Yang
G. Guo
Yong Luo
Lefei Zhang
Xiaochun Cao
Di Lin
Dacheng Tao
MoMe
195
16
0
29 Oct 2024
MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report
Samrajya Thapa
Koushik Howlader
Subhankar Bhattacharjee
Wei Le
MedIm
315
4
0
21 Oct 2024
SNN-PAR: Energy Efficient Pedestrian Attribute Recognition via Spiking Neural Networks
Haiyang Wang
Qian Zhu
Mowen She
Yabo Li
Haoyu Song
Minghe Xu
Xiao Wang
ViT
170
1
0
10 Oct 2024
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection
Yibo Yan
Shen Wang
Jiahao Huo
Hang Li
Yangqiu Song
...
Kun Wang
Hui Xiong
Philip S. Yu
Xuming Hu
Qingsong Wen
LRM
236
33
0
06 Oct 2024
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Z. Zhang
Siru Ouyang
Hongming Zhang
Meng Jiang
Dong Yu
VLM
333
11
0
02 Oct 2024
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset
Computer Vision and Pattern Recognition (CVPR), 2025
Xiao Wang
Fuling Wang
Yuehang Li
Qingchuan Ma
Shiao Wang
Bo Jiang
Chuanfu Li
Jin Tang
310
15
0
01 Oct 2024
Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing
De Computis (DC), 2024
Huthaifa I. Ashqar
Ahmed Jaber
Taqwa I. Alhadidi
Mohammed Elhenawy
317
19
0
26 Sep 2024
Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey
Yichi Zhang
Zhenrong Shen
VLM
313
21
0
23 Aug 2024
Segment Anything for Videos: A Systematic Survey
Chunhui Zhang
Yawen Cui
Weilin Lin
Guanjie Huang
Yan Rong
Li Liu
Shiguang Shan
VLM
234
11
0
31 Jul 2024
A Unified Graph Transformer for Overcoming Isolations in Multi-modal Recommendation
ACM Conference on Recommender Systems (RecSys), 2024
Zixuan Yi
I. Ounis
188
14
0
29 Jul 2024
SAM-MIL: A Spatial Contextual Aware Multiple Instance Learning Approach for Whole Slide Image Classification
Heng Fang
Shengyue Huang
Wenhao Tang
Luwen Huangfu
Bo Liu
VLM
171
10
0
25 Jul 2024
SFPrompt: Communication-Efficient Split Federated Fine-Tuning for Large Pre-Trained Models over Resource-Limited Devices
Linxiao Cao
Yifei Zhu
Wei Gong
FedML
182
5
0
24 Jul 2024
GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM
Keshav Bimbraw
Ye Wang
Jing Liu
T. Koike-Akino
VLM
MedIm
LM&MA
196
4
0
15 Jul 2024