Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2112.10482
Cited By
ScanQA: 3D Question Answering for Spatial Scene Understanding
20 December 2021
Daich Azuma
Taiki Miyanishi
Shuhei Kurita
M. Kawanabe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ScanQA: 3D Question Answering for Spatial Scene Understanding"
26 / 26 papers shown
Title
RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration
Huajie Tan
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Yaoxu Lyu
Mingyu Cao
Zhongyuan Wang
S. Zhang
LM&Ro
36
0
0
06 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
57
1
0
01 May 2025
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
Jiahui Zhang
Yurui Chen
Yanpeng Zhou
Yueming Xu
Ze Huang
...
Xinyue Cai
G. Huang
Xingyue Quan
Hang Xu
Li Zhang
LRM
87
0
0
29 Mar 2025
Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning
Yanjun Chen
Yirong Sun
Xinghao Chen
Jian Wang
Xiaoyu Shen
W. Li
Wei Zhang
3DV
LRM
49
1
0
08 Mar 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji
Huajie Tan
Jiayu Shi
Xiaoshuai Hao
Yuan Zhang
...
Huaihai Lyu
Xiaolong Zheng
Jiaming Liu
Zhongyuan Wang
Shanghang Zhang
80
5
0
28 Feb 2025
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
80
4
0
25 Nov 2024
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators
Rasoul Shafipour
David Harrison
Maxwell Horton
Jeffrey Marker
Houman Bedayat
Sachin Mehta
Mohammad Rastegari
Mahyar Najibi
Saman Naderiparizi
MQ
43
3
0
14 Oct 2024
OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and Open-Vocabulary Semantic Scene Graphs
Venkata Naren Devarakonda
Raktim Gautam Goswami
Ali Umut Kaypak
Naman Patel
Rooholla Khorrambakht
P. Krishnamurthy
Farshad Khorrami
LM&Ro
27
3
0
08 Oct 2024
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
84
29
0
26 Sep 2024
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Zuyan Liu
Yuhao Dong
Ziwei Liu
Winston Hu
Jiwen Lu
Yongming Rao
ObjD
61
54
0
19 Sep 2024
QueryCAD: Grounded Question Answering for CAD Models
Claudius Kienle
Benjamin Alt
Darko Katic
Rainer Jäkel
Jan Peters
16
1
0
13 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-xiong Wang
67
15
0
05 Sep 2024
Answerability Fields: Answerable Location Estimation via Diffusion Models
Daich Azuma
Taiki Miyanishi
Shuhei Kurita
Koya Sakamoto
M. Kawanabe
DiffM
18
0
0
26 Jul 2024
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing Yang
Xuweiyi Chen
Nikhil Madaan
Madhavan Iyengar
Shengyi Qian
David Fouhey
Joyce Chai
3DV
61
11
0
07 Jun 2024
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Kuan-Chih Huang
Xiangtai Li
Lu Qi
Shuicheng Yan
Ming-Hsuan Yang
LRM
46
9
0
27 May 2024
Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
Wenshan Wu
Shaoguang Mao
Yadong Zhang
Yan Xia
Li Dong
Lei Cui
Furu Wei
LRM
34
18
0
04 Apr 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
55
4
0
08 Feb 2024
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
Mingsheng Li
Xin Chen
C. Zhang
Sijin Chen
Hongyuan Zhu
Fukun Yin
Gang Yu
Tao Chen
12
23
0
17 Dec 2023
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Hongyuan Zhu
Jiayuan Fan
Tao Chen
MLLM
21
76
0
30 Nov 2023
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
13
5
0
13 Oct 2023
Foundation Model based Open Vocabulary Task Planning and Executive System for General Purpose Service Robots
Yoshiki Obinata
Naoaki Kanazawa
Kento Kawaharazuka
Iori Yanokura
Soon-Hyeob Kim
K. Okada
Masayuki Inaba
LM&Ro
6
7
0
07 Aug 2023
ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes
Ahmed Abdelreheem
Kyle Olszewski
Hsin-Ying Lee
Peter Wonka
Panos Achlioptas
3DPC
17
28
0
12 Dec 2022
Novel 3D Scene Understanding Applications From Recurrence in a Single Image
Shimian Zhang
Skanda Bharadwaj
Keaton Kraiger
Yashasvi Asthana
Hong Zhang
R. Collins
Yanxi Liu
28
1
0
14 Oct 2022
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
Zhihao Yuan
Xu Yan
Yinghong Liao
Ruimao Zhang
Sheng Wang
Zhen Li
Shuguang Cui
59
128
0
01 Mar 2021
Speaker-Follower Models for Vision-and-Language Navigation
Daniel Fried
Ronghang Hu
Volkan Cirik
Anna Rohrbach
Jacob Andreas
Louis-Philippe Morency
Taylor Berg-Kirkpatrick
Kate Saenko
Dan Klein
Trevor Darrell
LM&Ro
LRM
237
444
0
07 Jun 2018
Effective Approaches to Attention-based Neural Machine Translation
Thang Luong
Hieu H. Pham
Christopher D. Manning
211
7,687
0
17 Aug 2015
1