2406.05132
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Computer Vision and Pattern Recognition (CVPR), 2024
7 June 2024
Jianing Yang
Xuweiyi Chen
Nikhil Madaan
Madhavan Iyengar
Shengyi Qian
David Fouhey
Joyce Chai
3DV
Papers citing
"3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination"
50 / 81 papers shown
Title
Scenes as Tokens: Multi-Scale Normal Distributions Transform Tokenizer for General 3D Vision-Language Understanding
Yutao Tang
Cheng Zhao
Gaurav Mittal
Rohith Kukkala
Rama Chellappa
Cheng-Fang Peng
Mei Chen
VLM
124
0
0
26 Nov 2025
Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
Italian National Conference on Sensors (INS), 2025
Vinit Mehta
Charu Sharma
Karthick Thiyagarajan
LM&Ro
356
1
0
14 Nov 2025
MCP4IFC: IFC-Based Building Design Using Large Language Models
Bharathi Kannan Nithyanantham
Tobias Sesterhenn
Ashwin Nedungadi
Sergio Peral Garijo
Janis Zenkner
Christian Bartelt
Stefan Lüdtke
AI4CE
112
0
0
29 Oct 2025
Pursuing Minimal Sufficiency in Spatial Reasoning
Yejie Guo
Yunzhong Hou
Wufei Ma
Meng Tang
Ming-Hsuan Yang
LRM
80
0
0
19 Oct 2025
SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes
Xiongkun Linghu
Jiangyong Huang
Ziyu Zhu
Baoxiong Jia
Siyuan Huang
LRM
141
1
0
19 Oct 2025
Situat3DChange: Situated 3D Change Understanding Dataset for Multimodal Large Language Model
R. Liu
Junwei Zheng
Yufan Chen
Zirui Wang
Kunyu Peng
Kailun Yang
Jiaming Zhang
Marc Pollefeys
Rainer Stiefelhagen
112
0
0
13 Oct 2025
Automated Repeatable Adversary Threat Emulation with Effects Language (EL)
Suresh Damodaran
Paul D. Rowe
AAML
128
8
0
07 Oct 2025
LLM-RG: Referential Grounding in Outdoor Scenarios using Large Language Models
Pranav Saxena
A. Bhattacharya
Ji Zhang
Wenshan Wang
151
1
0
29 Sep 2025
HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models
Trishna Chakraborty
Udita Ghosh
Xiaopan Zhang
Fahim Faisal Niloy
Yue Dong
Jiachen Li
Amit K. Roy-Chowdhury
Chengyu Song
LLMAG
HILM
LRM
218
3
0
18 Jun 2025
LEO-VL: Efficient Scene Representation for Scalable 3D Vision-Language Learning
J. Huang
Xiaojian Ma
Xiongkun Linghu
Yue Fan
Junchao He
...
Qing Li
Song-Chun Zhu
Yixin Chen
Baoxiong Jia
Siyuan Huang
266
2
0
11 Jun 2025
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
Ahmed Abdelreheem
Filippo Aleotti
Jamie Watson
Z. Qureshi
Abdelrahman Eldesokey
Peter Wonka
Gabriel J. Brostow
Sara Vicente
Guillermo Garcia-Hernando
DiffM
431
1
0
08 May 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
Computer Vision and Pattern Recognition (CVPR), 2025
J. Huang
Baoxiong Jia
Longji Xu
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
349
17
0
28 Mar 2025
Decorum: A Language-Based Approach For Style-Conditioned Synthesis of Indoor 3D Scenes
Kelly O. Marshall
Omid Poursaeed
Sergiu Oprea
Amit Kumar
Anushrut Jignasu
Chinmay Hegde
Yilei Li
Rakesh Ranjan
3DV
308
0
0
23 Mar 2025
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
Jiahe Zhao
Ruibing Hou
Zejie Tian
Hong Chang
Shiguang Shan
340
0
0
17 Mar 2025
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
International Conference on Learning Representations (ICLR), 2024
Zheyuan Zhang
Fengyuan Hu
Jayjun Lee
Freda Shi
Parisa Kordjamshidi
Joyce Chai
Ziqiao Ma
398
37
0
22 Oct 2024
Affordance-Guided Reinforcement Learning via Visual Prompting
Olivia Y. Lee
Annie Xie
Kuan Fang
Karl Pertsch
Chelsea Finn
OffRL
LM&Ro
540
25
0
14 Jul 2024
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
Weitai Kang
Mengxue Qu
Jyoti Kini
Yunchao Wei
Mubarak Shah
Yan Yan
LM&Ro
3DPC
222
16
0
28 May 2024
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang
Ziqiao Ma
Xiaofeng Gao
Suhaila Shakiah
Qiaozi Gao
Joyce Chai
MLLM
VLM
347
74
0
26 Feb 2024
3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding
Zeju Li
Chao Zhang
Xiaoyan Wang
Ruilong Ren
Yifan Xu
Ruifei Ma
Xiangde Liu
MLLM
221
41
0
06 Jan 2024
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Tai Wang
Xiaohan Mao
Chenming Zhu
Runsen Xu
Ruiyuan Lyu
...
Tianfan Xue
Xihui Liu
Cewu Lu
Dahua Lin
Jiangmiao Pang
LM&Ro
231
125
0
26 Dec 2023
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
Mingsheng Li
Xin Chen
C. Zhang
Sijin Chen
Erik Cambria
Fukun Yin
Gang Yu
Tao Chen
284
35
0
17 Dec 2023
Pixel Aligned Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Jiarui Xu
Xingyi Zhou
Shen Yan
Xiuye Gu
Anurag Arnab
Chen Sun
Xiaolong Wang
Cordelia Schmid
MLLM
VLM
255
17
0
14 Dec 2023
Holodeck: Language Guided Generation of 3D Embodied AI Environments
Computer Vision and Pattern Recognition (CVPR), 2023
Yue Yang
Fan-Yun Sun
Luca Weihs
Eli VanderBilt
Alvaro Herrasti
...
Lingjie Liu
Chris Callison-Burch
Mark Yatskar
Aniruddha Kembhavi
Christopher Clark
LM&Ro
403
174
0
14 Dec 2023
Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers
Neural Information Processing Systems (NeurIPS), 2023
Haifeng Huang
Zehan Wang
Rongjie Huang
Luping Liu
Xize Cheng
Yang Zhao
Tao Jin
Zhou Zhao
307
12
0
13 Dec 2023
ControlRoom3D: Room Generation using Semantic Proxy Rooms
Computer Vision and Pattern Recognition (CVPR), 2023
Jonas Schult
Sam S. Tsai
Lukas Höllein
Bichen Wu
Jialiang Wang
...
Zijian He
Peizhao Zhang
Bastian Leibe
Peter Vajda
Ji Hou
250
57
0
08 Dec 2023
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Computer Vision and Pattern Recognition (CVPR), 2023
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Erik Cambria
Jiayuan Fan
Tao Chen
MLLM
285
166
0
30 Nov 2023
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Computer Vision and Pattern Recognition (CVPR), 2023
Qidong Huang
Xiao-wen Dong
Pan Zhang
Sijin Yu
Conghui He
Yuan Liu
Dahua Lin
Weiming Zhang
Neng H. Yu
MLLM
427
351
0
29 Nov 2023
An Embodied Generalist Agent in 3D World
Jiangyong Huang
Silong Yong
Xiaojian Ma
Xiongkun Linghu
Puhao Li
Yan Wang
Qing Li
Song-Chun Zhu
Baoxiong Jia
Siyuan Huang
LM&Ro
275
288
0
18 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM
HILM
394
1,832
0
09 Nov 2023
GLaMM: Pixel Grounding Large Multimodal Model
Computer Vision and Pattern Recognition (CVPR), 2023
H. Rasheed
Muhammad Maaz
Sahal Shaji Mullappilly
Abdelrahman M. Shaker
Salman Khan
Hisham Cholakkal
Rao M. Anwer
Eric Xing
Ming-Hsuan Yang
Fahad S. Khan
MLLM
VLM
413
384
0
06 Nov 2023
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Jianwei Yang
Hao Zhang
Feng Li
Xueyan Zou
Chun-yue Li
Jianfeng Gao
MLLM
VLM
400
268
0
17 Oct 2023
Ferret: Refer and Ground Anything Anywhere at Any Granularity
International Conference on Learning Representations (ICLR), 2023
Haoxuan You
Haotian Zhang
Zhe Gan
Xianzhi Du
Bowen Zhang
Zirui Wang
Liangliang Cao
Shih-Fu Chang
Yinfei Yang
ObjD
MLLM
VLM
399
450
0
11 Oct 2023
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
International Conference on Learning Representations (ICLR), 2023
Yiyang Zhou
Chenhang Cui
Jaehong Yoon
Linjun Zhang
Zhun Deng
Chelsea Finn
Mohit Bansal
Huaxiu Yao
MLLM
312
262
0
01 Oct 2023
Aligning Large Multimodal Models with Factually Augmented RLHF
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhiqing Sun
Sheng Shen
Shengcao Cao
Haotian Liu
Chunyuan Li
...
Liangyan Gui
Yu-Xiong Wang
Yiming Yang
Kurt Keutzer
Trevor Darrell
VLM
273
580
0
25 Sep 2023
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent
IEEE International Conference on Robotics and Automation (ICRA), 2023
Jianing Yang
Xuweiyi Chen
Shengyi Qian
Nikhil Madaan
Madhavan Iyengar
David Fouhey
Joyce Chai
LM&Ro
LLMAG
350
145
0
21 Sep 2023
Dense Object Grounding in 3D Scenes
ACM Multimedia (ACM MM), 2023
Wencan Huang
Daizong Liu
Wei Hu
222
24
0
05 Sep 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
489
1,536
0
24 Aug 2023
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes
Zehan Wang
Haifeng Huang
Yang Zhao
Ziang Zhang
Zhou Zhao
259
106
0
17 Aug 2023
Detecting and Preventing Hallucinations in Large Vision Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2023
Anisha Gunjal
Jihan Yin
Erhan Bas
MLLM
VLM
277
246
0
11 Aug 2023
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyu Zhu
Xiaojian Ma
Yixin Chen
Zhidong Deng
Siyuan Huang
Qing Li
LM&Ro
208
206
0
08 Aug 2023
LISA: Reasoning Segmentation via Large Language Model
Computer Vision and Pattern Recognition (CVPR), 2023
Xin Lai
Zhuotao Tian
Yukang Chen
Yanwei Li
Yuhui Yuan
Shu Liu
Jiaya Jia
LM&Ro
VLM
MLLM
LRM
453
707
0
01 Aug 2023
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Conference on Robot Learning (CoRL), 2023
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Xi Chen
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro
LRM
553
2,105
0
28 Jul 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
International Conference on Learning Representations (ICLR), 2023
Tri Dao
LRM
401
2,032
0
17 Jul 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Conference on Robot Learning (CoRL), 2023
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
383
732
0
12 Jul 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
International Conference on Learning Representations (ICLR), 2023
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
369
1,010
0
26 Jun 2023
Evaluating Object Hallucination in Large Vision-Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
679
1,226
0
17 May 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
International Conference on Learning Representations (ICLR), 2023
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM
MLLM
444
2,662
0
20 Apr 2023
Visual Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
1.1K
7,256
0
17 Apr 2023
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
Neural Information Processing Systems (NeurIPS), 2023
Wanrong Zhu
Jack Hessel
Anas Awadalla
S. Gadre
Jesse Dodge
Alex Fang
Youngjae Yu
Ludwig Schmidt
William Yang Wang
Yejin Choi
VLM
457
217
0
14 Apr 2023
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
IEEE International Conference on Computer Vision (ICCV), 2023
Lukas Höllein
Ang Cao
Andrew Owens
Justin Johnson
Matthias Nießner
DiffM
458
232
0
21 Mar 2023