Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2401.06395
Cited By
ModaVerse: Efficiently Transforming Modalities with LLMs
12 January 2024
Xinyu Wang
Bohan Zhuang
Qi Wu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ModaVerse: Efficiently Transforming Modalities with LLMs"
15 / 15 papers shown
Title
A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms
Chengkai Huang
Hongtao Huang
Tong Yu
Kaige Xie
Junda Wu
Shuai Zhang
Julian McAuley
Dietmar Jannach
Lina Yao
LRM
AI4CE
22
0
0
23 Apr 2025
Are Large Vision Language Models Good Game Players?
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM
ELM
LRM
94
3
0
04 Mar 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
77
3
0
26 Feb 2025
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Hao Fei
Shengqiong Wu
H. Zhang
Tat-Seng Chua
Shuicheng Yan
59
35
0
31 Dec 2024
AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models
Yutong Zhou
Masahiro Ryo
64
0
0
30 Nov 2024
A Survey of Foundation Models for Music Understanding
Wenjun Li
Ying Cai
Ziyang Wu
Wenyi Zhang
Yifan Chen
...
Junwei Han
Bao Ge
Tianming Liu
Lin Gan
Tuo Zhang
48
2
0
15 Sep 2024
Affective Computing Has Changed: The Foundation Model Disruption
Björn Schuller
Adria Mallol-Ragolta
Alejandro Pena Almansa
Iosif Tsangko
Mostafa M. Amin
A. Semertzidou
Lukas Christ
Shahin Amiriparian
28
0
0
13 Sep 2024
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
Jiaqi Wang
Hanqi Jiang
Yi-Hsueh Liu
Chong Ma
Xu-Yao Zhang
...
Xin Zhang
Wei Zhang
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
AI4TS
42
30
0
02 Aug 2024
The Evolution of Multimodal Model Architectures
S. Wadekar
Abhishek Chaurasia
Aman Chadha
Eugenio Culurciello
41
13
0
28 May 2024
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
157
280
0
14 Oct 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Bill Xu
Enhong Chen
MLLM
LRM
33
551
0
23 Jun 2023
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
Zhengxiong Luo
Dayou Chen
Yingya Zhang
Yan Huang
Liangsheng Wang
Yujun Shen
Deli Zhao
Jinren Zhou
Tien-Ping Tan
DiffM
VGen
132
215
0
15 Mar 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
140
304
0
30 Jan 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
243
556
0
29 May 2022
1