Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.11401
Cited By
Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning
18 March 2024
Rao Fu
Jingyu Liu
Xilun Chen
Yixin Nie
Wenhan Xiong
LM&Ro
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning"
42 / 42 papers shown
Title
SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models
Shun Taguchi
Hideki Deguchi
Takumi Hamazaki
Hiroyuki Sakai
ReLM
LRM
40
0
0
08 May 2025
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Xiaofeng Han
Shunpeng Chen
Zenghuang Fu
Zhe Feng
Lue Fan
...
Li Guo
Weiliang Meng
Xiaopeng Zhang
Rongtao Xu
Shibiao Xu
60
0
0
03 Apr 2025
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Haochen Wang
Yucheng Zhao
Tiancai Wang
Haoqiang Fan
X. Zhang
Zhaoxiang Zhang
59
0
0
02 Apr 2025
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
Jiahui Zhang
Yurui Chen
Yanpeng Zhou
Yueming Xu
Ze Huang
...
Xinyue Cai
G. Huang
Xingyue Quan
Hang Xu
Li Zhang
LRM
87
0
0
29 Mar 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
J. Huang
Baoxiong Jia
Y. Wang
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
75
3
0
28 Mar 2025
Cognitive Science-Inspired Evaluation of Core Capabilities for Object Understanding in AI
Danaja Rutar
Alva Markelius
Konstantinos Voudouris
José Hernández Orallo
Lucy G. Cheke
OCL
ELM
56
0
0
27 Mar 2025
PAVE: Patching and Adapting Video Large Language Models
Zhuoming Liu
Yiquan Li
Khoi Duc Nguyen
Yiwu Zhong
Yin Li
KELM
LRM
79
0
0
25 Mar 2025
GraspCoT: Integrating Physical Property Reasoning for 6-DoF Grasping under Flexible Language Instructions
Xiaomeng Chu
Jiajun Deng
Guoliang You
Wei Liu
X. Li
Jianmin Ji
Y. Zhang
77
0
0
20 Mar 2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger
Nina Wenzel
David Griffiths
Haiming Gang
Justin Lazarow
...
Kai Kang
Marcin Eichner
Y. Yang
Afshin Dehghan
Peter Grasch
72
2
0
17 Mar 2025
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
Jiahe Zhao
Ruibing Hou
Zejie Tian
Hong Chang
Shiguang Shan
36
0
0
17 Mar 2025
SplatTalk: 3D VQA with Gaussian Splatting
Anh Thai
Songyou Peng
Kyle Genova
Leonidas J. Guibas
Thomas Funkhouser
3DGS
75
0
0
08 Mar 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang
Haifeng Huang
Yuzhang Shang
Mubarak Shah
Yan Yan
46
7
0
21 Feb 2025
LLM-EvRep: Learning an LLM-Compatible Event Representation Using a Self-Supervised Framework
Zongyou Yu
Qiang Qu
Qian Zhang
Nan Zhang
Xiaoming Chen
88
2
0
21 Feb 2025
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Zekun Qi
Wenyao Zhang
Yufei Ding
Runpei Dong
Xinqiang Yu
...
Xin Jin
Kaisheng Ma
Zhizheng Zhang
He Wang
Li Yi
LM&Ro
131
3
0
18 Feb 2025
Hypo3D: Exploring Hypothetical Reasoning in 3D
Ye Mao
Weixun Luo
Junpeng Jing
Anlan Qiu
K. Mikolajczyk
60
0
0
02 Feb 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Jiaqi Wang
Hengshuang Zhao
83
6
0
02 Jan 2025
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech
Rui Liu
Shuwei He
Yifan Hu
H. Li
VLM
87
1
0
16 Dec 2024
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
Hongyan Zhi
Peihao Chen
Junyan Li
Shuailei Ma
Xinyu Sun
Tianhang Xiang
Yinjie Lei
Mingkui Tan
Chuang Gan
67
3
0
02 Dec 2024
Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop
Zhaofang Qian
Abolfazl Sharifi
Tucker Carroll
Ser-Nam Lim
VGen
71
0
0
26 Nov 2024
ROOT: VLM based System for Indoor Scene Understanding and Beyond
Yonghui Wang
Shi-Yong Chen
Zhenxing Zhou
Siyi Li
Haoran Li
Wengang Zhou
H. Li
VLM
64
3
0
24 Nov 2024
Learning from Feedback: Semantic Enhancement for Object SLAM Using Foundation Models
Jungseok Hong
Ran Choi
John Leonard
VLM
37
0
0
11 Nov 2024
CaStL: Constraints as Specifications through LLM Translation for Long-Horizon Task and Motion Planning
Weihang Guo
Zachary K. Kingston
Lydia E. Kavraki
37
2
0
29 Oct 2024
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
87
29
0
26 Sep 2024
From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models
Tessa Pulli
Stefan Thalhammer
Simon Schwaiger
Markus Vincze
LM&Ro
35
0
0
09 Sep 2024
E2CL: Exploration-based Error Correction Learning for Embodied Agents
Hanlin Wang
Chak Tou Leong
Jian Wang
Wenjie Li
27
1
0
05 Sep 2024
Slice-100K: A Multimodal Dataset for Extrusion-based 3D Printing
Anushrut Jignasu
Kelly O. Marshall
Ankush Kumar Mishra
Lucas Nerone Rillo
Baskar Ganapathysubramanian
Aditya Balu
Chinmay Hegde
Adarsh Krishnamurthy
27
0
0
04 Jul 2024
Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph
S. Linok
T. Zemskova
Svetlana Ladanova
Roman Titkov
Dmitry A. Yudin
Maxim Monastyrny
Aleksei Valenkov
LM&Ro
43
0
0
11 Jun 2024
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
Daizong Liu
Yang Liu
Wencan Huang
Wei Hu
LM&Ro
29
9
0
09 Jun 2024
LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model
Yixuan Yang
Junru Lu
Zixiang Zhao
Zhen Luo
James J.Q. Yu
Victor Sanchez
Feng Zheng
3DV
33
3
0
06 Jun 2024
Grounded 3D-LLM with Referent Tokens
Yilun Chen
Shuai Yang
Haifeng Huang
Tai Wang
Ruiyuan Lyu
Runsen Xu
Dahua Lin
Jiangmiao Pang
45
22
0
16 May 2024
Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors
Joao Luzio
Alexandre Bernardino
Plinio Moreno
22
0
0
16 Apr 2024
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
154
280
0
14 Oct 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Visual Language Maps for Robot Navigation
Chen Huang
Oier Mees
Andy Zeng
Wolfram Burgard
LM&Ro
145
337
0
11 Oct 2022
ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
Ishika Singh
Valts Blukis
Arsalan Mousavian
Ankit Goyal
Danfei Xu
Jonathan Tremblay
D. Fox
Jesse Thomason
Animesh Garg
LM&Ro
LLMAG
112
616
0
22 Sep 2022
Adapting Pretrained Text-to-Text Models for Long Text Sequences
Wenhan Xiong
Anchit Gupta
Shubham Toshniwal
Yashar Mehdad
Wen-tau Yih
RALM
VLM
49
30
0
21 Sep 2022
Open-vocabulary Queryable Scene Representations for Real World Planning
Boyuan Chen
F. Xia
Brian Ichter
Kanishka Rao
K. Gopalakrishnan
Michael S. Ryoo
Austin Stone
Daniel Kappler
LM&Ro
144
179
0
20 Sep 2022
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
Dhruv Shah
B. Osinski
Brian Ichter
Sergey Levine
LM&Ro
136
430
0
10 Jul 2022
FILM: Following Instructions in Language with Modular Methods
So Yeon Min
Devendra Singh Chaplot
Pradeep Ravikumar
Yonatan Bisk
Ruslan Salakhutdinov
LM&Ro
193
159
0
12 Oct 2021
A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution
Valts Blukis
Chris Paxton
D. Fox
Animesh Garg
Yoav Artzi
LM&Ro
204
133
0
12 Jul 2021
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
C. Qi
Hao Su
Kaichun Mo
Leonidas J. Guibas
3DH
3DPC
3DV
PINN
213
13,886
0
02 Dec 2016
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
Adam Paszke
Abhishek Chaurasia
Sangpil Kim
Eugenio Culurciello
SSeg
204
2,034
0
07 Jun 2016
1