2402.15116
Large Multimodal Agents: A Survey
23 February 2024
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro
LLMAG
Papers citing "Large Multimodal Agents: A Survey"
11 / 11 papers shown
Title
DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving
Xinmeng Hou
Wuqi Wang
Long Yang
Hao Lin
Jinglun Feng
Haigen Min
Xiangmo Zhao
21
0
0
04 May 2025
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Likith Kadiyala
Ramteja Sajja
Y. Sermet
Ibrahim Demir
40
0
0
05 Mar 2025
Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension
Runwei Guan
Ruixiao Zhang
Ningwei Ouyang
Jianan Liu
Ka Lok Man
...
Ming Xu
Jeremy S. Smith
Eng Gee Lim
Yutao Yue
Hui Xiong
38
8
0
21 May 2024
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Zhiyong Wu
Chengcheng Han
Zichen Ding
Zhenmin Weng
Zhoumianze Liu
Shunyu Yao
Tao Yu
Lingpeng Kong
LLMAG
LM&Ro
107
29
0
12 Feb 2024
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Weijiao Zhang
Jindong Han
Zhao Xu
Hang Ni
Hao Liu
Hui Xiong
AI4CE
77
14
0
30 Jan 2024
WebWISE: Web Interface Control and Sequential Exploration with Large Language Models
Heyi Tao
TV Sethuraman
Michal Shlapentokh-Rothman
Derek Hoiem
LLMAG
37
4
0
24 Oct 2023
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Yixiao Zhang
Akira Maezawa
Gus Xia
Kazuhiko Yamamoto
Simon Dixon
39
15
0
19 Oct 2023
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
Jesse Zhang
Jiahui Zhang
Karl Pertsch
Ziyi Liu
Xiang Ren
Minsuk Chang
Shao-Hua Sun
Joseph J. Lim
LLMAG
LM&Ro
57
31
0
16 Oct 2023
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System
Junke Wang
Dongdong Chen
Chong Luo
Xiyang Dai
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
87
54
0
27 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
380
4,010
0
28 Jan 2022