Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2309.05519
Cited By
NExT-GPT: Any-to-Any Multimodal LLM
11 September 2023
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"NExT-GPT: Any-to-Any Multimodal LLM"
36 / 336 papers shown
Title
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
191
576
0
16 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
28
12
0
14 Nov 2023
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin
Ryuichi Takanobu
Caiwan Zhang
Xiaochun Cao
Li-ming Yuan
MLLM
32
217
0
14 Nov 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
28
263
0
14 Nov 2023
Detecting and Correcting Hate Speech in Multimodal Memes with Large Visual Language Model
Minh-Hao Van
Xintao Wu
VLM
MLLM
17
7
0
12 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
66
3
0
10 Nov 2023
AI-native Interconnect Framework for Integration of Large Language Model Technologies in 6G Systems
Sasu Tarkoma
Roberto Morabito
Jaakko Sauvola
8
19
0
10 Nov 2023
Hallucination-minimized Data-to-answer Framework for Financial Decision-makers
Sohini Roychowdhury
Andres Alvarez
Brian Moore
Marko Krema
Maria Paz Gelpi
...
Angel Rodriguez
Jose Ramon Cabrejas
Pablo Martinez Serrano
Punit Agrawal
Arijit Mukherjee
19
8
0
09 Nov 2023
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Jinjin Xu
Liwu Xu
Yuzhe Yang
Xiang Li
Fanyi Wang
Yanchun Xie
Yi-Jie Huang
Yaqian Li
MoE
MLLM
VLM
24
12
0
09 Nov 2023
Chain of Images for Intuitively Reasoning
Fanxu Meng
Haotong Yang
Yiding Wang
Muhan Zhang
LRM
22
6
0
09 Nov 2023
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Zhen Yang
Yingxue Zhang
Fandong Meng
Jie Zhou
VLM
MLLM
37
3
0
08 Nov 2023
An Interdisciplinary Outlook on Large Language Models for Scientific Research
James Boyko
Joseph Cohen
Nathan Fox
Maria Han Veiga
Jennifer I-Hsiu Li
...
Andreas H. Rauch
Kenneth N. Reid
Soumi Tribedi
Anastasia Visheratina
Xin Xie
34
17
0
03 Nov 2023
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Wei-Ge Chen
Irina Spiridonova
Jianwei Yang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
11
21
0
01 Nov 2023
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu
Quan-Sen Sun
Xiaosong Zhang
Yufeng Cui
Fan Zhang
Yue Cao
Xinlong Wang
Jingjing Liu
VLM
21
53
0
31 Oct 2023
Entity Embeddings : Perspectives Towards an Omni-Modality Era for Large Language Models
Eren Unlu
Unver Ciftci
28
0
0
27 Oct 2023
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu
Zeqiang Lai
Zhangwei Gao
Erfei Cui
Ziheng Li
...
Lewei Lu
Qifeng Chen
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
126
30
0
26 Oct 2023
Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder
Huiwon Jang
Jihoon Tack
Daewon Choi
Jongheon Jeong
Jinwoo Shin
11
2
0
25 Oct 2023
A Survey on Video Diffusion Models
Zhen Xing
Qijun Feng
Haoran Chen
Qi Dai
Hang-Rui Hu
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
EGVM
VGen
50
112
0
16 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
31
116
0
16 Oct 2023
Beyond Segmentation: Road Network Generation with Multi-Modal LLMs
Sumedh Rasal
Sanjay K. Boddhu
16
5
0
15 Oct 2023
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
Zeqiang Lai
Xizhou Zhu
Jifeng Dai
Yu Qiao
Wenhai Wang
MLLM
DiffM
35
22
0
11 Oct 2023
Automated Evaluation of Classroom Instructional Support with LLMs and BoWs: Connecting Global Predictions to Specific Feedback
Jacob Whitehill
Jennifer LoCasale-Crouch
20
6
0
02 Oct 2023
Mobile Foundation Model as Firmware
Jinliang Yuan
Chenchen Yang
Dongqi Cai
Shihe Wang
Xin Yuan
...
Di Zhang
Hanzi Mei
Xianqing Jia
Shangguang Wang
Mengwei Xu
24
19
0
28 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Erik Cambria
Björn W. Schuller
LM&MA
AuLLM
27
36
0
24 Aug 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Bill Xu
Enhong Chen
MLLM
LRM
33
551
0
23 Jun 2023
Multi-modal Latent Diffusion
Mustapha Bounoua
Giulio Franzese
Pietro Michiardi
DiffM
14
12
0
07 Jun 2023
LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation
Suhyeon Lee
Won Jun Kim
Jinho Chang
Jong Chul Ye
MedIm
29
46
0
19 May 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
203
883
0
27 Apr 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
140
304
0
30 Jan 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
235
556
0
29 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
X. Wang
ViT
VLM
175
494
0
22 Feb 2022
Creativity and Machine Learning: A Survey
Giorgio Franceschelli
Mirco Musolesi
VLM
AI4CE
19
37
0
06 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
Previous
1
2
3
4
5
6
7