2012.02206
Cited By
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Computer Vision and Pattern Recognition (CVPR), 2020
3 December 2020
Dave Zhenyu Chen
A. Gholami
Matthias Nießner
Angel X. Chang
3DPC
Papers citing
"Scan2Cap: Context-aware Dense Captioning in RGB-D Scans"
50 / 72 papers shown
Title
HouseTour: A Virtual Real Estate A(I)gent
Ata Çelen
Marc Pollefeys
Daniel Barath
Iro Armeni
VGen
82
0
0
20 Oct 2025
Where, Not What: Compelling Video LLMs to Learn Geometric Causality for 3D-Grounding
Yutong Zhong
VGen
32
0
0
19 Oct 2025
Reasoning in Space via Grounding in the World
Yiming Chen
Zekun Qi
Wenyao Zhang
Xin Jin
Li Zhang
Peidong Liu
LRM
61
0
0
15 Oct 2025
IL3D: A Large-Scale Indoor Layout Dataset for LLM-Driven 3D Scene Generation
Wenxu Zhou
Kaixuan Nie
Hang Du
Dong Yin
Wei Huang
Siqiang Guo
Xiaobo Zhang
Pengbo Hu
3DV
97
0
0
14 Oct 2025
Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction-Reasoning Synergy
Haijier Chen
Bo Xu
Shoujian Zhang
Haoze Liu
Jiaxuan Lin
Jingrong Wang
LRM
49
0
0
29 Sep 2025
Embodied Arena: A Comprehensive, Unified, and Evolving Evaluation Platform for Embodied AI
Fei Ni
Min Zhang
Pengyi Li
Yifu Yuan
Lingfeng Zhang
...
Yuzheng Zhuang
Yingxue Zhang
Yan Zheng
Hongyao Tang
Jianye Hao
ELM
103
1
0
18 Sep 2025
3D Aware Region Prompted Vision Language Model
A. Cheng
Yang Fu
Yukang Chen
Zhijian Liu
X. Li
...
Jan Kautz
Pavlo Molchanov
Hongxu Yin
Xiaolong Wang
Sifei Liu
76
1
0
16 Sep 2025
OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning
Yuecheng Liu
Dafeng Chi
Shiguang Wu
Zhanguang Zhang
Yuzheng Zhuang
...
Pengwei Xie
David Gamaliel Arcos Bravo
Yingxue Zhang
Jianye Hao
Xingyue Quan
LM&Ro
LRM
90
2
0
11 Sep 2025
Reg3D: Reconstructive Geometry Instruction Tuning for 3D Scene Understanding
Hongpei Zheng
Lintao Xiang
Qijun Yang
Qian Lin
Hujun Yin
0
0
0
03 Sep 2025
Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset
Wentao Mo
Qingchao Chen
Yuxin Peng
Siyuan Huang
Yang Liu
LRM
48
1
0
14 Aug 2025
Spatial-ORMLLM: Improve Spatial Relation Understanding in the Operating Room with Multimodal Large Language Model
Peiqi He
Zhenhao Zhang
Yixiang Zhang
Xiongjun Zhao
Shaoliang Peng
48
1
0
11 Aug 2025
A Study of the Framework and Real-World Applications of Language Embedding for 3D Scene Understanding
Mahmoud Chick Zaouali
Todd Charter
Yehor Karpichev
Brandon Haworth
Homayoun Najjaran
3DGS
145
0
0
07 Aug 2025
NeuroVoxel-LM: Language-Aligned 3D Perception via Dynamic Voxelization and Meta-Embedding
Shiyu Liu
Lianlei Shan
71
0
0
27 Jul 2025
Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting
Xingyu Miao
Haoran Duan
Quanhao Qian
Jiuniu Wang
Yang Long
Ling Shao
Deli Zhao
Ran Xu
Gongjie Zhang
88
2
0
24 Jul 2025
Spatial 3D-LLM: Exploring Spatial Awareness in 3D Vision-Language Models
Xiaoyan Wang
Zeju Li
Yifan Xu
Jiaxing Qi
Zhifei Yang
Ruifei Ma
Xiangde Liu
Chao Zhang
77
1
0
22 Jul 2025
ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way
Rajarshi Roy
Devleena Das
A. Banerjee
Arjya Bhattacharjee
Kousik Dasgupta
Subarna Tripathi
VLM
133
0
0
11 Jul 2025
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
Ziyu Zhu
Xilin Wang
Yixuan Li
Zhuofan Zhang
Xiaojian Ma
...
Wei Liang
Qian Yu
Zhidong Deng
Siyuan Huang
Qing Li
LM&Ro
153
8
0
05 Jul 2025
GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding
Zijun Lin
Shuting He
Cheston Tan
Bihan Wen
AI4TS
145
0
0
26 Jun 2025
LEO-VL: Efficient Scene Representation for Scalable 3D Vision-Language Learning
J. Huang
Xiaojian Ma
Xiongkun Linghu
Yue Fan
Junchao He
...
Qing Li
Song-Chun Zhu
Yixin Chen
Baoxiong Jia
Siyuan Huang
187
2
0
11 Jun 2025
Pts3D-LLM: Studying the Impact of Token Structure for 3D Scene Understanding With Large Language Models
Hugues Thomas
Chen Chen
Jian Zhang
121
0
0
06 Jun 2025
From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes
Tianxu Wang
Zhuofan Zhang
Ziyu Zhu
Yue Fan
Jing Xiong
Pengxiang Li
Xiaojian Ma
Qing Li
161
0
0
05 Jun 2025
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs
Haoyuan Li
Yanpeng Zhou
Yufei Gao
Tao Tang
J. N. Han
Yujie Yuan
Dave Zhenyu Chen
Jiawang Bian
Hang Xu
Xiaodan Liang
233
3
0
05 Jun 2025
Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs
Fangrui Zhu
Hanhui Wang
Yiming Xie
Jing Gu
Tianye Ding
Jianwei Yang
Huaizu Jiang
3DV
LRM
221
0
0
04 Jun 2025
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
Duo Zheng
Shijia Huang
Yanyang Li
Liwei Wang
214
6
0
30 May 2025
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding
International Conference on Learning Representations (ICLR), 2025
Henry Zheng
Hao Shi
Qihang Peng
Yong Xien Chng
Rui Huang
Yepeng Weng
Peng Wang
Gao Huang
210
5
0
08 May 2025
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
Ahmed Abdelreheem
Filippo Aleotti
Jamie Watson
Z. Qureshi
Abdelrahman Eldesokey
Peter Wonka
Gabriel J. Brostow
Sara Vicente
Guillermo Garcia-Hernando
DiffM
299
1
0
08 May 2025
Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks
Mohammad Saleh
Azadeh Tabatabaei
325
3
0
14 Apr 2025
3D CoCa: Contrastive Learners are 3D Captioners
Ting Huang
Zhenru Zhang
Longji Xu
Hao Tang
180
5
0
13 Apr 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
Computer Vision and Pattern Recognition (CVPR), 2025
J. Huang
Baoxiong Jia
Longji Xu
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
273
14
0
28 Mar 2025
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
Jiahe Zhao
Ruibing Hou
Zejie Tian
Hong Chang
Shiguang Shan
193
2
0
17 Mar 2025
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2025
Hanxun Yu
Wentong Li
Song Wang
Jintai Chen
Jianke Zhu
3DV
LRM
236
15
0
01 Mar 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang
Haifeng Huang
Yuzhang Shang
Mubarak Shah
Yan Yan
198
15
0
21 Feb 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Yuan Liu
Hengshuang Zhao
474
37
0
02 Jan 2025
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models
International Conference on Learning Representations (ICLR), 2024
Yue Zhang
Zhiyang Xu
Ying Shen
Parisa Kordjamshidi
Lifu Huang
199
15
0
04 Oct 2024
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
403
84
0
26 Sep 2024
OE3DIS: Open-Ended 3D Point Cloud Instance Segmentation
P. Nguyen
Minh Luu
Anh Tran
Cuong Pham
Khoi Duc Minh Nguyen
3DPC
159
1
0
21 Aug 2024
3D Question Answering for City Scene Understanding
Yixiang Chen
Yaoxian Song
Xiang Liu
Xiaofei Yang
Qiang-qiang Wang
Tiefeng Li
Yang Yang
Xiaowen Chu
91
5
0
24 Jul 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
Ruiyuan Lyu
Tai Wang
Jingli Lin
Shuai Yang
Xiaohan Mao
...
Runsen Xu
Haifeng Huang
Chenming Zhu
Dahua Lin
Jiangmiao Pang
3DV
195
29
0
13 Jun 2024
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Computer Vision and Pattern Recognition (CVPR), 2024
Jianing Yang
Xuweiyi Chen
Nikhil Madaan
Madhavan Iyengar
Shengyi Qian
David Fouhey
Joyce Chai
3DV
318
24
0
07 Jun 2024
Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description
Mahmoud Ahmed
Jian Ding
Eslam Mohamed Bakr
Mohamed Elhoseiny
171
3
0
29 May 2024
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma
Brandon Smart
Shuai Chen
Xinghui Li
...
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
LRM
202
26
0
16 May 2024
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Bu Jin
Yupeng Zheng
Pengfei Li
Weize Li
Yuhang Zheng
...
Kun Zhan
Fu Liu
Xiaoxiao Long
Yilun Chen
Hao Zhao
3DV
177
35
0
28 Mar 2024
PointCloud-Text Matching: Benchmark Datasets and a Baseline
Yanglin Feng
Yang Qin
Dezhong Peng
Erik Cambria
Xi Peng
Peng Hu
189
2
0
28 Mar 2024
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
Senqiao Yang
Jiaming Liu
Ray Zhang
Mingjie Pan
Zoey Guo
Xiaoqi Li
Zehui Chen
Shiyang Feng
Yandong Guo
Shanghang Zhang
3DV
231
93
0
21 Dec 2023
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Enna Sachdeva
Nakul Agarwal
Suhas Chundi
Sean Roelofs
Jiachen Li
Mykel Kochenderfer
Chiho Choi
Behzad Dariush
142
68
0
12 Sep 2023
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
European Conference on Computer Vision (ECCV), 2023
Ozan Unal
Daniel Gehrig
Suman Saha
Luc Van Gool
161
24
0
08 Sep 2023
FArMARe: a Furniture-Aware Multi-task methodology for Recommending Apartments based on the user interests
Ali Abdari
Alex Falcon
Giuseppe Serra
102
5
0
06 Sep 2023
Towards Real Time Egocentric Segment Captioning for The Blind and Visually Impaired in RGB-D Theatre Images
Khadidja Delloul
S. Larabi
138
2
0
26 Aug 2023
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes
Zehan Wang
Haifeng Huang
Yang Zhao
Ziang Zhang
Zhou Zhao
183
90
0
17 Aug 2023
3D Concept Learning and Reasoning from Multi-View Images
Computer Vision and Pattern Recognition (CVPR), 2023
Yining Hong
Chun-Tse Lin
Yilun Du
Zhenfang Chen
J. Tenenbaum
Chuang Gan
3DV
154
65
0
20 Mar 2023