Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.12871
Cited By
An Embodied Generalist Agent in 3D World
18 November 2023
Jiangyong Huang
Silong Yong
Xiaojian Ma
Xiongkun Linghu
Puhao Li
Yan Wang
Qing Li
Song-Chun Zhu
Baoxiong Jia
Siyuan Huang
LM&Ro
Re-assign community
ArXiv
PDF
HTML
Papers citing
"An Embodied Generalist Agent in 3D World"
30 / 30 papers shown
Title
SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models
Shun Taguchi
Hideki Deguchi
Takumi Hamazaki
Hiroyuki Sakai
ReLM
LRM
40
0
0
08 May 2025
RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration
Huajie Tan
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Yaoxu Lyu
Mingyu Cao
Zhongyuan Wang
S. Zhang
LM&Ro
36
0
0
06 May 2025
Task Reconstruction and Extrapolation for
π
0
π_0
π
0
using Text Latent
Quanyi Li
28
0
0
06 May 2025
SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models
Nader Zantout
Haochen Zhang
Pujith Kachana
J. Qiu
Ji Zhang
Wenshan Wang
LM&Ro
LRM
50
0
0
25 Apr 2025
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
Jiahui Zhang
Yurui Chen
Yanpeng Zhou
Yueming Xu
Ze Huang
...
Xinyue Cai
G. Huang
Xingyue Quan
Hang Xu
Li Zhang
LRM
87
0
0
29 Mar 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
W. Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Y. Zhuang
LM&Ro
LRM
65
2
0
27 Mar 2025
RAIDER: Tool-Equipped Large Language Model Agent for Robotic Action Issue Detection, Explanation and Recovery
Silvia Izquierdo-Badiola
Carlos Rizzo
Guillem Alenyà
LLMAG
LM&Ro
79
0
0
22 Mar 2025
A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval
Yu Zhang
Shutong Qiao
Jiaqi Zhang
Tzu-Heng Lin
Chen Gao
Y. Li
LM&Ro
LM&MA
82
0
0
07 Mar 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji
Huajie Tan
Jiayu Shi
Xiaoshuai Hao
Yuan Zhang
...
Huaihai Lyu
Xiaolong Zheng
Jiaming Liu
Zhongyuan Wang
Shanghang Zhang
82
5
0
28 Feb 2025
ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
Yu Liu
Baoxiong Jia
Ruijie Lu
Junfeng Ni
Song-Chun Zhu
Siyuan Huang
3DGS
73
7
0
26 Feb 2025
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework
Zirui Song
Jingpu Yang
Yuan Huang
Jonathan Tonglet
Zeyu Zhang
Tao Cheng
Meng Fang
Iryna Gurevych
X. Chen
LRM
58
1
0
19 Feb 2025
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
Ruijie Lu
Yixin Chen
Junfeng Ni
Baoxiong Jia
Yu Liu
Diwen Wan
Gang Zeng
Siyuan Huang
DiffM
119
4
0
16 Dec 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
83
4
0
25 Nov 2024
BlabberSeg: Real-Time Embedded Open-Vocabulary Aerial Segmentation
Haechan Mark Bong
Ricardo de Azambuja
Giovanni Beltrame
VLM
23
0
0
16 Oct 2024
OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and Open-Vocabulary Semantic Scene Graphs
Venkata Naren Devarakonda
Raktim Gautam Goswami
Ali Umut Kaypak
Naman Patel
Rooholla Khorrambakht
P. Krishnamurthy
Farshad Khorrami
LM&Ro
30
3
0
08 Oct 2024
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
Kaizhi Zheng
Xiaotong Chen
Xuehai He
Jing Gu
Linjie Li
Zhengyuan Yang
Kevin Qinghong Lin
Jianfeng Wang
Lijuan Wang
Xin Eric Wang
KELM
DiffM
35
0
0
03 Oct 2024
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
90
29
0
26 Sep 2024
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing Yang
Xuweiyi Chen
Nikhil Madaan
Madhavan Iyengar
Shengyi Qian
David Fouhey
Joyce Chai
3DV
63
11
0
07 Jun 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
62
38
0
23 May 2024
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Hongyuan Zhu
Jiayuan Fan
Tao Chen
MLLM
24
76
0
30 Nov 2023
GROOT: Learning to Follow Instructions by Watching Gameplay Videos
Shaofei Cai
Bowei Zhang
Zihao Wang
Xiaojian Ma
Anji Liu
Yitao Liang
83
26
0
12 Oct 2023
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Chunyuan Li
Zhe Gan
Zhengyuan Yang
Jianwei Yang
Linjie Li
Lijuan Wang
Jianfeng Gao
MLLM
110
221
0
18 Sep 2023
MindAgent: Emergent Gaming Interaction
Ran Gong
Qiuyuan Huang
Xiaojian Ma
Hoi Vo
Zane Durante
...
Zilong Zheng
Song-Chun Zhu
Demetri Terzopoulos
Fei-Fei Li
Jianfeng Gao
LM&Ro
99
61
0
18 Sep 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
203
883
0
27 Apr 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
157
576
0
06 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings
Arjun Majumdar
Gunjan Aggarwal
Bhavika Devnani
Judy Hoffman
Dhruv Batra
LM&Ro
147
148
0
24 Jun 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
203
1,651
0
15 Oct 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
4,424
0
23 Jan 2020
1