Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.05437
Cited By
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
9 November 2023
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
Tianhe Ren
Xueyan Zou
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents"
33 / 83 papers shown
Title
In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation
Han Xue
Qianru Sun
Li-Na Song
Wenjun Zhang
Zhiwu Huang
MLLM
36
0
0
15 Apr 2024
OmniFusion Technical Report
Elizaveta Goncharova
Anton Razzhigaev
Matvey Mikhalchuk
Maxim Kurkin
Irina Abdullaeva
Matvey Skripkin
Ivan V. Oseledets
Denis Dimitrov
Andrey Kuznetsov
29
4
0
09 Apr 2024
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
Gwanghyun Kim
Hayeon Kim
H. Seo
Dong un Kang
Se Young Chun
25
4
0
06 Apr 2024
Towards Responsible and Reliable Traffic Flow Prediction with Large Language Models
Xusen Guo
Qiming Zhang
Junyue Jiang
Mingxing Peng
Hao
Hao Yang
Meixin Zhu
AI4TS
20
1
0
03 Apr 2024
Empowering Segmentation Ability to Multi-modal Large Language Models
Yuqi Yang
Peng-Tao Jiang
Jing Wang
Hao Zhang
Kai Zhao
Jinwei Chen
Bo-wen Li
LRM
VLM
19
3
0
21 Mar 2024
Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration
Shu Zhao
Xiaohan Zou
Tan Yu
Huijuan Xu
25
1
0
17 Mar 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie
Zhe Gan
J. Fauconnier
Sam Dodge
Bowen Zhang
...
Zirui Wang
Ruoming Pang
Peter Grasch
Alexander Toshev
Yinfei Yang
MLLM
27
171
0
14 Mar 2024
Large Multimodal Agents: A Survey
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro
LLMAG
37
4
0
23 Feb 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Aman Chadha
VLM
41
9
0
20 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM
VLM
43
41
0
19 Feb 2024
LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation
Keyang Xuan
Li Yi
Fan Yang
Ruochen Wu
Yi Ren Fung
Heng Ji
24
11
0
19 Feb 2024
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Ye Wang
Jing Jiang
Min-Bin Lin
LLMAG
LM&Ro
35
47
0
13 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
107
347
0
09 Feb 2024
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
Yang Jin
Zhicheng Sun
Kun Xu
Kun Xu
Liwei Chen
...
Yuliang Liu
Di Zhang
Yang Song
Kun Gai
Yadong Mu
VGen
47
42
0
05 Feb 2024
A Survey on Hallucination in Large Vision-Language Models
Hanchao Liu
Wenyuan Xue
Yifei Chen
Dapeng Chen
Xiutian Zhao
Ke Wang
Liping Hou
Rong-Zhi Li
Wei Peng
LRM
MLLM
14
110
0
01 Feb 2024
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Junyang Wang
Haiyang Xu
Jiabo Ye
Mingshi Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
10
103
0
29 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
37
173
0
24 Jan 2024
Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination
Syeda Nahida Akter
Aman Madaan
Sangwu Lee
Yiming Yang
Eric Nyberg
ReLM
VLM
LRM
31
2
0
16 Jan 2024
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
Yichen Zhu
Minjie Zhu
Ning Liu
Zhicai Ou
Xiaofeng Mou
Jian Tang
63
89
0
04 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM
VLM
21
9
0
03 Jan 2024
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo-Wen Zhang
Xiaolin Wei
Chunhua Shen
MLLM
21
14
0
28 Dec 2023
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
Zhi Gao
Yuntao Du
Xintong Zhang
Xiaojian Ma
Wenjuan Han
Song-Chun Zhu
Qing Li
LLMAG
VLM
23
21
0
18 Dec 2023
GlitchBench: Can large multimodal models detect video game glitches?
Mohammad Reza Taesiri
Tianjun Feng
Anh Nguyen
C. Bezemer
MLLM
VLM
LRM
19
9
0
08 Dec 2023
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang
Hongyang Li
Feng Li
Tianhe Ren
Xueyan Zou
...
Shijia Huang
Jianfeng Gao
Lei Zhang
Chun-yue Li
Jianwei Yang
87
68
0
05 Dec 2023
Mitigating Hallucination in Visual Language Models with Visual Supervision
Zhiyang Chen
Yousong Zhu
Yufei Zhan
Zhaowen Li
Chaoyang Zhao
Jinqiao Wang
Ming Tang
VLM
MLLM
37
27
0
27 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
66
3
0
10 Nov 2023
Towards Robust Multi-Modal Reasoning via Model Selection
Xiangyan Liu
Rongxue Li
Wei Ji
Tao Lin
LLMAG
LRM
19
3
0
12 Oct 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
21
5
0
23 Sep 2023
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Chunyuan Li
Zhe Gan
Zhengyuan Yang
Jianwei Yang
Linjie Li
Lijuan Wang
Jianfeng Gao
MLLM
107
221
0
18 Sep 2023
A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering
Chaoning Zhang
Fachrina Dewi Puspitasari
Sheng Zheng
Chenghao Li
Yu Qiao
...
Caiyan Qin
François Rameau
Lik-Hang Lee
Sung-Ho Bae
Choong Seon Hong
VLM
76
61
0
12 May 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
154
576
0
06 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
208
2,413
0
06 Oct 2022
Previous
1
2