2302.10035
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Machine Intelligence Research (MIR), 2023
20 February 2023
Xiao Wang
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
Github (286★)
Papers citing
"Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey"
50 / 127 papers shown
An Empirical Study of Mamba-based Pedestrian Attribute Recognition
Xiao Wang
Weizhe Kong
Jiandong Jin
Shiao Wang
Ruichong Gao
Qingchuan Ma
Chenglong Li
Jin Tang
Mamba
263
10
0
15 Jul 2024
JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets
Zhihua Jin
Shiyi Liu
Haotian Li
Xun Zhao
Huamin Qu
250
6
0
03 Jul 2024
Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition
Lan Chen
Dong Li
Xiao Wang
Pengpeng Shao
Wei Zhang
Yaowei Wang
Yonghong Tian
Jin Tang
252
3
0
27 Jun 2024
InFiConD: Interactive No-code Fine-tuning with Concept-based Knowledge Distillation
Jinbin Huang
Wenbin He
Liang Gou
Liu Ren
Chris Bryan
358
0
0
25 Jun 2024
GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Byungsoo Jeon
Yingcheng Wang
Shiyi Cao
Sunghyun Kim
Sunghyun Park
...
Xupeng Miao
Mohammad Alizadeh
G. R. Ganger
Tianqi Chen
Zhihao Jia
GNN
AI4CE
212
16
0
24 Jun 2024
Unlocking the Future: Exploring Look-Ahead Planning Mechanistic Interpretability in Large Language Models
Tianyi Men
Pengfei Cao
Zhuoran Jin
Yubo Chen
Kang Liu
Jun Zhao
LLMAG
AIFin
236
15
0
23 Jun 2024
Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning
International Conference on Artificial Neural Networks (ICANN), 2024
Xiaowen Sun
Xufeng Zhao
Jae Hee Lee
Wenhao Lu
Matthias Kerzel
Stefan Wermter
LM&Ro
221
4
0
14 Jun 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu
Yuhan Dai
Yongdong Luo
Lei Li
Shuhuai Ren
...
Xiawu Zheng
Enhong Chen
Caifeng Shan
Xing Sun
VLM
MLLM
623
848
0
31 May 2024
The Evolution of Multimodal Model Architectures
S. Wadekar
Abhishek Chaurasia
Vasu Sharma
Eugenio Culurciello
321
27
0
28 May 2024
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
Yake Wei
Di Hu
298
60
0
28 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
885
166
0
23 May 2024
Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition
Xiao Wang
Qian Zhu
Jiandong Jin
Jun Zhu
Futian Wang
Bowei Jiang
Yaowei Wang
Yonghong Tian
ViT
281
9
0
27 Apr 2024
Pre-training on High Definition X-ray Images: An Experimental Study
Xiao Wang
Yuehang Li
Wentao Wu
Jiandong Jin
Yao Rong
Bowei Jiang
Chuanfu Li
Jin Tang
MedIm
ViT
LM&MA
269
6
0
27 Apr 2024
Beyond Pixel-Wise Supervision for Medical Image Segmentation: From Traditional Models to Foundation Models
Yuyan Shi
Jialu Ma
Jin Yang
Shasha Wang
Yichi Zhang
MedIm
VLM
278
4
0
20 Apr 2024
State Space Model for New-Generation Network Alternative to Transformers: A Survey
Xiao Wang
Shiao Wang
Yuhe Ding
Yuehang Li
Wentao Wu
...
Bowei Jiang
Chenglong Li
Yaowei Wang
Yonghong Tian
Jin Tang
Mamba
405
80
0
15 Apr 2024
Foundation Model for Advancing Healthcare: Challenges, Opportunities, and Future Directions
IEEE Reviews in Biomedical Engineering (RBME), 2024
Yuting He
Fuxiang Huang
Xinrui Jiang
Yuxiang Nie
Minghao Wang
Jiguang Wang
Hao Chen
LM&MA
AI4CE
363
91
0
04 Apr 2024
Continual Learning for Smart City: A Survey
Li Yang
Zhipeng Luo
Shi-sheng Zhang
Fei Teng
Tian-Jie Li
HAI
266
17
0
01 Apr 2024
Heterogeneous Contrastive Learning for Foundation Models and Beyond
Lecheng Zheng
Baoyu Jing
Zihao Li
Hanghang Tong
Jingrui He
VLM
237
36
0
30 Mar 2024
Generative Multi-modal Models are Good Class-Incremental Learners
Xusheng Cao
Haori Lu
Linlan Huang
Xialei Liu
Ming-Ming Cheng
CLL
311
26
0
27 Mar 2024
LSKNet: A Foundation Lightweight Backbone for Remote Sensing
International Journal of Computer Vision (IJCV), 2024
Yuxuan Li
Xiang Li
Yimian Dai
Qibin Hou
Tianpeng Liu
Yongxiang Liu
Ming-Ming Cheng
Jian Yang
346
103
0
18 Mar 2024
Continual Forgetting for Pre-trained Vision Models
Computer Vision and Pattern Recognition (CVPR), 2024
Hongbo Zhao
Bolin Ni
Haochen Wang
Junsong Fan
Fei Zhu
Yuxi Wang
Yuntao Chen
Gaofeng Meng
Zhaoxiang Zhang
MU
VLM
325
19
0
18 Mar 2024
Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
Kangning Yin
Shihao Zou
Yuxuan Ge
Zheng Tian
220
14
0
01 Mar 2024
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
Raghav Kapoor
Y. Butala
M. Russak
Jing Yu Koh
Kiran Kamble
Waseem Alshikh
Ruslan Salakhutdinov
LLMAG
491
105
0
27 Feb 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Vasu Sharma
VLM
542
67
0
20 Feb 2024
Mapping the Ethics of Generative AI: A Comprehensive Scoping Review
Thilo Hagendorff
253
82
0
13 Feb 2024
The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models
M. Pternea
Prerna Singh
Abir Chakraborty
Y. Oruganti
M. Milletarí
Sayli Bapat
Kebei Jiang
OffRL
248
24
0
02 Feb 2024
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
Anke Tang
Li Shen
Yong Luo
Nan Yin
Lefei Zhang
Dacheng Tao
MoMe
304
82
0
01 Feb 2024
SimAda: A Simple Unified Framework for Adapting Segment Anything Model in Underperformed Scenes
Yiran Song
Qianyu Zhou
Xuequan Lu
Zhiwen Shao
Lizhuang Ma
256
7
0
31 Jan 2024
Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions
Yichi Zhang
Zhenrong Shen
Rushi Jiao
VLM
MedIm
305
257
0
07 Jan 2024
Training and Serving System of Foundation Models: A Comprehensive Survey
Jiahang Zhou
Yanyu Chen
Zicong Hong
Wuhui Chen
Yue Yu
Tao Zhang
Hui Wang
Chuan-fu Zhang
Zibin Zheng
ALM
221
14
0
05 Jan 2024
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
Computer Vision and Pattern Recognition (CVPR), 2024
Yiran Song
Qianyu Zhou
Hefei Ling
Deng-Ping Fan
Xuequan Lu
Lizhuang Ma
VLM
503
20
0
04 Jan 2024
Self-supervised Pretraining for Decision Foundation Model: Formulation, Pipeline and Challenges
Xiaoqian Liu
Jianbin Jiao
Junge Zhang
OffRL
LRM
329
2
0
29 Dec 2023
LLM4EDA: Emerging Progress in Large Language Models for Electronic Design Automation
Ruizhe Zhong
Xingbo Du
Shixiong Kai
Zhentao Tang
Siyuan Xu
Hui-Ling Zhen
Jianye Hao
Qiang Xu
Mingxuan Yuan
Junchi Yan
197
65
0
28 Dec 2023
Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition
Xiao Wang
Yao Rong
Shiao Wang
Yuan Chen
Zhe Wu
Bowei Jiang
Yonghong Tian
Jin Tang
ViT
376
6
0
18 Dec 2023
Pedestrian Attribute Recognition via CLIP based Prompt Vision-Language Fusion
Xiao Wang
Jiandong Jin
Chenglong Li
Jin Tang
Cheng Zhang
Wei Wang
VLM
205
34
0
17 Dec 2023
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base
Zhiwei Zha
Jiaan Wang
Zhixu Li
Xiangru Zhu
Wei Song
Yanghua Xiao
VLM
245
0
0
16 Dec 2023
Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception
AAAI Conference on Artificial Intelligence (AAAI), 2024
Xiao Wang
Wentao Wu
Chenglong Li
Zhicheng Zhao
Zhe Chen
Yukai Shi
Jin Tang
220
7
0
15 Dec 2023
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
European Conference on Computer Vision (ECCV), 2024
Zexiang Liu
Yangguang Li
Youtian Lin
Xin Yu
Sida Peng
Yan-Pei Cao
Xiaojuan Qi
Xiaoshui Huang
Ding Liang
Wanli Ouyang
219
49
0
14 Dec 2023
SequencePAR: Understanding Pedestrian Attributes via A Sequence Generation Paradigm
Pattern Recognition (Pattern Recogn.), 2023
Jiandong Jin
Xiao Wang
Yin Lin
Chenglong Li
Lili Huang
Aihua Zheng
Jin Tang
AI4TS
185
11
0
04 Dec 2023
Semantic-Aware Frame-Event Fusion based Pattern Recognition via Large Vision-Language Models
Dong Li
Jiandong Jin
Yuhao Zhang
Yanlin Zhong
Yaoyang Wu
Lan Chen
Xiao Wang
Bin Luo
243
8
0
30 Nov 2023
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner
Lizhao Liu
Xinyu Sun
Tianhang Xiang
Zhuangwei Zhuang
Liuren Yin
Mingkui Tan
VLM
175
4
0
29 Nov 2023
Multimodal Large Language Models: A Survey
BigData Congress [Services Society] (BSS), 2023
Jiayang Wu
Wensheng Gan
Zefeng Chen
Shicheng Wan
Philip S. Yu
231
303
0
22 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
319
17
0
14 Nov 2023
Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model
Yongqiang Zhao
Zhenyu Li
Zhi Jin
Feng Zhang
Haiyan Zhao
Chengfeng Dou
Zhengwei Tao
Xinhai Xu
Donghong Liu
193
6
0
31 Oct 2023
VcT: Visual change Transformer for Remote Sensing Image Change Detection
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2023
Bo Jiang
Zitian Wang
Xixi Wang
Ziyan Zhang
Lan Chen
Xiao Wang
Bin Luo
ViT
206
76
0
17 Oct 2023
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Computer Vision and Pattern Recognition (CVPR), 2024
Honghui Yang
Sha Zhang
Di Huang
Xiaoyang Wu
Haoyi Zhu
...
Hengshuang Zhao
Qibo Qiu
Binbin Lin
Xiaofei He
Wanli Ouyang
SSL
295
71
0
12 Oct 2023
Argumentative Stance Prediction: An Exploratory Study on Multimodality and Few-Shot Learning
Workshop on Argument Mining (ArgMining), 2023
Arushi Sharma
Abhibha Gupta
Maneesh Bilalpur
182
7
0
11 Oct 2023
SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training
International Conference on Learning Representations (ICLR), 2024
Kazem Meidani
Parshin Shojaee
Chandan K. Reddy
A. Farimani
359
33
0
03 Oct 2023
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
IEEE International Conference on Robotics and Automation (ICRA), 2024
Long Chen
Oleg Sinavski
Jan Hünermann
Alice Karnsund
Andrew James Willmott
Danny Birch
Daniel Maund
Jamie Shotton
MLLM
388
293
0
03 Oct 2023
Natural Language based Context Modeling and Reasoning for Ubiquitous Computing with Large Language Models: A Tutorial
Haoyi Xiong
Jiang Bian
Sijia Yang
Xiaofei Zhang
Linghe Kong
Daqing Zhang
LRM
LLMAG
275
8
0
24 Sep 2023