Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.07594
Cited By
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
10 November 2023
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model"
10 / 10 papers shown
Title
Nexus-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
André Freitas
Qifan Wang
Z. Xu
Rongjuncheng Zhang
Yong Dai
AuLLM
61
0
0
26 Feb 2025
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
116
367
0
07 Nov 2023
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Chunyuan Li
Zhe Gan
Zhengyuan Yang
Jianwei Yang
Linjie Li
Lijuan Wang
Jianfeng Gao
MLLM
107
221
0
18 Sep 2023
MindAgent: Emergent Gaming Interaction
Ran Gong
Qiuyuan Huang
Xiaojian Ma
Hoi Vo
Zane Durante
...
Zilong Zheng
Song-Chun Zhu
Demetri Terzopoulos
Fei-Fei Li
Jianfeng Gao
LM&Ro
96
61
0
18 Sep 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
89
51
0
22 May 2023
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
96
81
0
04 May 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
154
576
0
06 Apr 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
161
401
0
10 Sep 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
1