arXiv:2312.16170
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
26 December 2023
Tai Wang
Xiaohan Mao
Chenming Zhu
Runsen Xu
Ruiyuan Lyu
Peisen Li
Xiao Chen
Wenwei Zhang
Kai Chen
Tianfan Xue
Xihui Liu
Cewu Lu
Dahua Lin
Jiangmiao Pang
LM&Ro
Papers citing
"EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI"
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding
Henry Zheng
Hao Shi
Qihang Peng
Yong Xien Chng
Rui Huang
Yepeng Weng
Zhongchao Shi
Gao Huang
56
1
0
08 May 2025
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
Ahmed Abdelreheem
Filippo Aleotti
Jamie Watson
Z. Qureshi
Abdelrahman Eldesokey
Peter Wonka
Gabriel J. Brostow
Sara Vicente
Guillermo Garcia-Hernando
DiffM
50
0
0
08 May 2025
Occupancy World Model for Robots
Zhang Zhang
Qiang Zhang
Wei Cui
Shuai Shi
Yijie Guo
...
Hao-Ran Cheng
Xiaozhu Ju
Zhengping Che
Renjing Xu
Jian-Bo Tang
24
0
0
07 May 2025
V²R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations
Zhiyuan Fan
Yumeng Wang
Sandeep Polisetty
Yi Ren Fung
43
0
0
23 Apr 2025
RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots
Zhang Zhang
Qiang Zhang
Wei Cui
Shuai Shi
Yijie Guo
Gang Han
Wen Zhao
Hengle Ren
Renjing Xu
Jian Tang
35
1
0
20 Apr 2025
Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning
Baining Zhao
Z. Wang
Jianjie Fang
Chen Gao
Fanhang Man
Jinqiang Cui
Xin Wang
Xinlei Chen
Y. Li
Wenwu Zhu
LM&Ro
VLM
LRM
50
1
0
17 Apr 2025
Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models
Nicolas Baumann
Cheng Hu
Paviththiren Sivasothilingam
Haotong Qin
Lei Xie
Michele Magno
Luca Benini
27
1
0
15 Apr 2025
Multi-Object Grounding via Hierarchical Contrastive Siamese Transformers
Chengyi Du
Keyan Jin
22
0
0
14 Apr 2025
Empowering Large Language Models with 3D Situation Awareness
Zhihao Yuan
Yibo Peng
Jinke Ren
Yinghong Liao
Yatong Han
Chun-Mei Feng
Hengshuang Zhao
G. Li
Shuguang Cui
Zhen Li
49
0
0
29 Mar 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
J. Huang
Baoxiong Jia
Y. Wang
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
75
3
0
28 Mar 2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger
Nina Wenzel
David Griffiths
Haiming Gang
Justin Lazarow
...
Kai Kang
Marcin Eichner
Y. Yang
Afshin Dehghan
Peter Grasch
72
2
0
17 Mar 2025
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
Jiahe Zhao
Ruibing Hou
Zejie Tian
Hong Chang
Shiguang Shan
36
0
0
17 Mar 2025
GS-SDF: LiDAR-Augmented Gaussian Splatting and Neural SDF for Geometrically Consistent Rendering and Reconstruction
Jianheng Liu
Yunfei Wan
Bowen Wang
Chunran Zheng
Jiarong Lin
Fu Zhang
3DGS
55
0
0
13 Mar 2025
Embodied Crowd Counting
Runling Long
Yunlong Wang
Jia Wan
Xiang Deng
Xinting Zhu
Weili Guan
Antoni B. Chan
Liqiang Nie
58
0
0
11 Mar 2025
Fake It To Make It: Virtual Multiviews to Enhance Monocular Indoor Semantic Scene Completion
Anith Selvakumar
Manasa Bharadwaj
36
0
0
07 Mar 2025
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
Jingzhou Luo
Y. Liu
Weixing Chen
Zhen Li
Y. Wang
G. Li
Liang Lin
52
2
0
05 Mar 2025
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu
Wentong Li
Song Wang
J. Chen
Jianke Zhu
3DV
LRM
71
3
0
01 Mar 2025
SCA3D: Enhancing Cross-modal 3D Retrieval via 3D Shape and Caption Paired Data Augmentation
Junlong Ren
Hao Wu
Hui Xiong
H. Wang
63
0
0
26 Feb 2025
ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
Qihang Peng
Henry Zheng
Gao Huang
3DPC
77
0
0
26 Feb 2025
SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation
Jianing Li
Ming Lu
Hao Wang
Chenyang Gu
Wenzhao Zheng
Li Du
S. Zhang
86
0
0
28 Jan 2025
Efficient and Trustworthy Block Propagation for Blockchain-enabled Mobile Embodied AI Networks: A Graph Resfusion Approach
Jiawen Kang
Jiana Liao
Runquan Gao
Jinbo Wen
Huawei Huang
Maomao Zhang
Changyan Yi
Tao Zhang
Dusit Niyato
Zibin Zheng
67
0
0
26 Jan 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Jiaqi Wang
Hengshuang Zhao
83
6
0
02 Jan 2025
How Panel Layouts Define Manga: Insights from Visual Ablation Experiments
Siyuan Feng
Teruya Yoshinaga
Katsuhiko Hayashi
Koki Washio
Hidetaka Kamigaito
28
0
0
26 Dec 2024
Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop
Zhaofang Qian
Abolfazl Sharifi
Tucker Carroll
Ser-Nam Lim
VGen
74
0
0
26 Nov 2024
Language Driven Occupancy Prediction
Zhu Yu
Bowen Pang
Lizhe Liu
Runmin Zhang
Qihao Peng
Maochun Luo
Sheng Yang
Mingxia Chen
Si-Yuan Cao
Hui-Liang Shen
81
2
0
25 Nov 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
83
4
0
25 Nov 2024
ROOT: VLM based System for Indoor Scene Understanding and Beyond
Yonghui Wang
Shi-Yong Chen
Zhenxing Zhou
Siyi Li
Haoran Li
Wengang Zhou
H. Li
VLM
67
3
0
24 Nov 2024
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
Runsen Xu
Zhiwei Huang
Tai Wang
Y. Chen
Jiangmiao Pang
Dahua Lin
VGen
34
11
0
17 Oct 2024
DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction
Zhen Yang
Yanpeng Dong
Heng Wang
3DPC
35
2
0
30 Sep 2024
Grounding 3D Scene Affordance From Egocentric Interactions
Cuiyu Liu
Wei Zhai
Yuhang Yang
Hongchen Luo
Sen Liang
Yang Cao
Zheng-Jun Zha
26
1
0
29 Sep 2024
LI-GS: Gaussian Splatting with LiDAR Incorporated for Accurate Large-Scale Reconstruction
Changjian Jiang
Ruilan Gao
Kele Shao
Yue Wang
R. Xiong
Yu Zhang
3DV
31
7
0
19 Sep 2024
Navigation Instruction Generation with BEV Perception and Large Language Models
Sheng Fan
Rui Liu
Wenguan Wang
Yi Yang
40
5
0
21 Jul 2024
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
Y. Liu
Weixing Chen
Yongjie Bai
Xiaodan Liang
Guanbin Li
Wen Gao
Liang Lin
LM&Ro
SyDa
AI4CE
48
27
0
09 Jul 2024
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities
Chenming Zhu
Tai Wang
Wenwei Zhang
Kai Chen
Xihui Liu
ReLM
LRM
45
16
0
01 Jul 2024
Diffusion Models in Low-Level Vision: A Survey
Chunming He
Yuqi Shen
Chengyu Fang
Fengyang Xiao
Longxiang Tang
Yulun Zhang
W. Zuo
Zhenhua Guo
Xiu Li
VLM
DiffM
MedIm
61
25
0
17 Jun 2024
Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center
Zichen Yu
Changyong Shu
Qianpu Sun
Junjie Linghu
Xiaobao Wei
Jiangyong Yu
Zongdai Liu
Dawei Yang
Hui Li
Yan Chen
24
4
0
15 Jun 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
Ruiyuan Lyu
Tai Wang
Jingli Lin
Shuai Yang
Xiaohan Mao
...
Runsen Xu
Haifeng Huang
Chenming Zhu
Dahua Lin
Jiangmiao Pang
3DV
36
9
0
13 Jun 2024
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
Daizong Liu
Yang Liu
Wencan Huang
Wei Hu
LM&Ro
29
9
0
09 Jun 2024
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing Yang
Xuweiyi Chen
Nikhil Madaan
Madhavan Iyengar
Shengyi Qian
David Fouhey
Joyce Chai
3DV
63
11
0
07 Jun 2024
MeshXL: Neural Coordinate Field for Generative 3D Foundation Models
Sijin Chen
Xin Chen
Anqi Pang
Xianfang Zeng
Wei Cheng
...
C. Zhang
Jingyi Yu
Gang Yu
Bin-Bin Fu
Tao Chen
AI4CE
50
35
0
31 May 2024
EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views
Yuhang Yang
Wei Zhai
Chengfeng Wang
Chengjun Yu
Yang Cao
Zheng-jun Zha
33
5
0
22 May 2024
Grounded 3D-LLM with Referent Tokens
Yilun Chen
Shuai Yang
Haifeng Huang
Tai Wang
Ruiyuan Lyu
Runsen Xu
Dahua Lin
Jiangmiao Pang
45
22
0
16 May 2024
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma
Yash Bhalgat
Brandon Smart
Shuai Chen
Xinghui Li
...
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
LRM
29
11
0
16 May 2024
Volumetric Environment Representation for Vision-Language Navigation
Rui Liu
Wenguan Wang
Yi Yang
32
23
0
21 Mar 2024
Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection
Chenming Zhu
Wenwei Zhang
Tai Wang
Xihui Liu
Kai-xiang Chen
3DPC
37
18
0
18 Sep 2023
Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving
Xiaoyu Tian
Tao Jiang
Longfei Yun
Yucheng Mao
Huitong Yang
Yue Wang
Yilun Wang
Hang Zhao
3DPC
3DV
66
200
0
27 Apr 2023
Habitat-Matterport 3D Semantics Dataset
Karmesh Yadav
Ram Ramrakhya
Santhosh Kumar Ramakrishnan
Theo Gervet
John Turner
...
Angel X. Chang
Dhruv Batra
Manolis Savva
Alexander William Clegg
Devendra Singh Chaplot
3DV
MDE
81
81
0
11 Oct 2022
ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes
C. Qi
Xinlei Chen
Or Litany
Leonidas J. Guibas
3DPC
178
239
0
29 Jan 2020
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
166
21,643
0
09 Dec 2016
Indoor Semantic Segmentation using depth information
Camille Couprie
C. Farabet
Laurent Najman
Yann LeCun
SSeg
MDE
65
473
0
16 Jan 2013