3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Computer Vision and Pattern Recognition (CVPR), 2024
7 June 2024
Jianing Yang
Xuweiyi Chen
Nikhil Madaan
Madhavan Iyengar
Shengyi Qian
David Fouhey
Joyce Chai
    3DV
ArXiv (abs) | PDF | HTML | HuggingFace (31 upvotes)

Papers citing "3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination"

50 / 81 papers shown
Title
Scenes as Tokens: Multi-Scale Normal Distributions Transform Tokenizer for General 3D Vision-Language Understanding
Yutao Tang
Cheng Zhao
Gaurav Mittal
Rohith Kukkala
Rama Chellappa
Cheng-Fang Peng
Mei Chen
VLM
124
0
0
26 Nov 2025
Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
Italian National Conference on Sensors (INS), 2025
Vinit Mehta
Charu Sharma
Karthick Thiyagarajan
LM&Ro
356
1
0
14 Nov 2025
MCP4IFC: IFC-Based Building Design Using Large Language Models
Bharathi Kannan Nithyanantham
Tobias Sesterhenn
Ashwin Nedungadi
Sergio Peral Garijo
Janis Zenkner
Christian Bartelt
Stefan Lüdtke
AI4CE
112
0
0
29 Oct 2025
Pursuing Minimal Sufficiency in Spatial Reasoning
Yejie Guo
Yunzhong Hou
Wufei Ma
Meng Tang
Ming-Hsuan Yang
LRM
80
0
0
19 Oct 2025
SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes
Xiongkun Linghu
Jiangyong Huang
Ziyu Zhu
Baoxiong Jia
Siyuan Huang
LRM
141
1
0
19 Oct 2025
Situat3DChange: Situated 3D Change Understanding Dataset for Multimodal Large Language Model
R. Liu
Junwei Zheng
Yufan Chen
Zirui Wang
Kunyu Peng
Kailun Yang
Jiaming Zhang
Marc Pollefeys
Rainer Stiefelhagen
112
0
0
13 Oct 2025
Automated Repeatable Adversary Threat Emulation with Effects Language (EL)
Suresh Damodaran
Paul D. Rowe
AAML
128
8
0
07 Oct 2025
LLM-RG: Referential Grounding in Outdoor Scenarios using Large Language Models
Pranav Saxena
A. Bhattacharya
Ji Zhang
Wenshan Wang
151
1
0
29 Sep 2025
HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models
Trishna Chakraborty
Udita Ghosh
Xiaopan Zhang
Fahim Faisal Niloy
Yue Dong
Jiachen Li
Amit K. Roy-Chowdhury
Chengyu Song
LLMAG HILM LRM
218
3
0
18 Jun 2025
LEO-VL: Efficient Scene Representation for Scalable 3D Vision-Language Learning
J. Huang
Xiaojian Ma
Xiongkun Linghu
Yue Fan
Junchao He
...
Qing Li
Song-Chun Zhu
Yixin Chen
Baoxiong Jia
Siyuan Huang
266
2
0
11 Jun 2025
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
Ahmed Abdelreheem
Filippo Aleotti
Jamie Watson
Z. Qureshi
Abdelrahman Eldesokey
Peter Wonka
Gabriel J. Brostow
Sara Vicente
Guillermo Garcia-Hernando
DiffM
431
1
0
08 May 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
Computer Vision and Pattern Recognition (CVPR), 2025
J. Huang
Baoxiong Jia
Longji Xu
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
349
17
0
28 Mar 2025
Decorum: A Language-Based Approach For Style-Conditioned Synthesis of Indoor 3D Scenes
Kelly O. Marshall
Omid Poursaeed
Sergiu Oprea
Amit Kumar
Anushrut Jignasu
Chinmay Hegde
Yilei Li
Rakesh Ranjan
3DV
308
0
0
23 Mar 2025
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
Jiahe Zhao
Ruibing Hou
Zejie Tian
Hong Chang
Shiguang Shan
340
0
0
17 Mar 2025
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
International Conference on Learning Representations (ICLR), 2024
Zheyuan Zhang
Fengyuan Hu
Jayjun Lee
Freda Shi
Parisa Kordjamshidi
Joyce Chai
Ziqiao Ma
398
37
0
22 Oct 2024
Affordance-Guided Reinforcement Learning via Visual Prompting
Olivia Y. Lee
Annie Xie
Kuan Fang
Karl Pertsch
Chelsea Finn
OffRL LM&Ro
540
25
0
14 Jul 2024
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
Weitai Kang
Mengxue Qu
Jyoti Kini
Yunchao Wei
Mubarak Shah
Yan Yan
LM&Ro 3DPC
222
16
0
28 May 2024
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang
Ziqiao Ma
Xiaofeng Gao
Suhaila Shakiah
Qiaozi Gao
Joyce Chai
MLLM VLM
347
74
0
26 Feb 2024
3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding
Zeju Li
Chao Zhang
Xiaoyan Wang
Ruilong Ren
Yifan Xu
Ruifei Ma
Xiangde Liu
MLLM
221
41
0
06 Jan 2024
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Tai Wang
Xiaohan Mao
Chenming Zhu
Runsen Xu
Ruiyuan Lyu
...
Tianfan Xue
Xihui Liu
Cewu Lu
Dahua Lin
Jiangmiao Pang
LM&Ro
231
125
0
26 Dec 2023
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
Mingsheng Li
Xin Chen
C. Zhang
Sijin Chen
Erik Cambria
Fukun Yin
Gang Yu
Tao Chen
284
35
0
17 Dec 2023
Pixel Aligned Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Jiarui Xu
Xingyi Zhou
Shen Yan
Xiuye Gu
Anurag Arnab
Chen Sun
Xiaolong Wang
Cordelia Schmid
MLLMVLM
255
17
0
14 Dec 2023
Holodeck: Language Guided Generation of 3D Embodied AI Environments
Computer Vision and Pattern Recognition (CVPR), 2023
Yue Yang
Fan-Yun Sun
Luca Weihs
Eli VanderBilt
Alvaro Herrasti
...
Lingjie Liu
Chris Callison-Burch
Mark Yatskar
Aniruddha Kembhavi
Christopher Clark
LM&Ro
403
174
0
14 Dec 2023
Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers
Neural Information Processing Systems (NeurIPS), 2023
Haifeng Huang
Zehan Wang
Rongjie Huang
Luping Liu
Xize Cheng
Yang Zhao
Tao Jin
Zhou Zhao
307
12
0
13 Dec 2023
ControlRoom3D: Room Generation using Semantic Proxy Rooms
Computer Vision and Pattern Recognition (CVPR), 2023
Jonas Schult
Sam S. Tsai
Lukas Höllein
Bichen Wu
Jialiang Wang
...
Zijian He
Peizhao Zhang
Bastian Leibe
Peter Vajda
Ji Hou
250
57
0
08 Dec 2023
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Computer Vision and Pattern Recognition (CVPR), 2023
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Erik Cambria
Jiayuan Fan
Tao Chen
MLLM
285
166
0
30 Nov 2023
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Computer Vision and Pattern Recognition (CVPR), 2023
Qidong Huang
Xiao-wen Dong
Pan Zhang
Sijin Yu
Conghui He
Yuan Liu
Dahua Lin
Weiming Zhang
Neng H. Yu
MLLM
427
351
0
29 Nov 2023
An Embodied Generalist Agent in 3D World
Jiangyong Huang
Silong Yong
Xiaojian Ma
Xiongkun Linghu
Puhao Li
Yan Wang
Qing Li
Song-Chun Zhu
Baoxiong Jia
Siyuan Huang
LM&Ro
275
288
0
18 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM HILM
394
1,832
0
09 Nov 2023
GLaMM: Pixel Grounding Large Multimodal Model
Computer Vision and Pattern Recognition (CVPR), 2023
H. Rasheed
Muhammad Maaz
Sahal Shaji Mullappilly
Abdelrahman M. Shaker
Salman Khan
Hisham Cholakkal
Rao M. Anwer
Eric Xing
Ming-Hsuan Yang
Fahad S. Khan
MLLM VLM
413
384
0
06 Nov 2023
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Jianwei Yang
Hao Zhang
Feng Li
Xueyan Zou
Chun-yue Li
Jianfeng Gao
MLLM VLM
400
268
0
17 Oct 2023
Ferret: Refer and Ground Anything Anywhere at Any Granularity
International Conference on Learning Representations (ICLR), 2023
Haoxuan You
Haotian Zhang
Zhe Gan
Xianzhi Du
Bowen Zhang
Zirui Wang
Liangliang Cao
Shih-Fu Chang
Yinfei Yang
ObjD MLLM VLM
399
450
0
11 Oct 2023
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
International Conference on Learning Representations (ICLR), 2023
Yiyang Zhou
Chenhang Cui
Jaehong Yoon
Linjun Zhang
Zhun Deng
Chelsea Finn
Mohit Bansal
Huaxiu Yao
MLLM
312
262
0
01 Oct 2023
Aligning Large Multimodal Models with Factually Augmented RLHF
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhiqing Sun
Sheng Shen
Shengcao Cao
Haotian Liu
Chunyuan Li
...
Liangyan Gui
Yu-Xiong Wang
Yiming Yang
Kurt Keutzer
Trevor Darrell
VLM
273
580
0
25 Sep 2023
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent
IEEE International Conference on Robotics and Automation (ICRA), 2023
Jianing Yang
Xuweiyi Chen
Shengyi Qian
Nikhil Madaan
Madhavan Iyengar
David Fouhey
Joyce Chai
LM&Ro LLMAG
350
145
0
21 Sep 2023
Dense Object Grounding in 3D Scenes
ACM Multimedia (ACM MM), 2023
Wencan Huang
Daizong Liu
Wei Hu
222
24
0
05 Sep 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM VLM ObjD
489
1,536
0
24 Aug 2023
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes
Zehan Wang
Haifeng Huang
Yang Zhao
Ziang Zhang
Zhou Zhao
259
106
0
17 Aug 2023
Detecting and Preventing Hallucinations in Large Vision Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2023
Anisha Gunjal
Jihan Yin
Erhan Bas
MLLM VLM
277
246
0
11 Aug 2023
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyu Zhu
Xiaojian Ma
Yixin Chen
Zhidong Deng
Siyuan Huang
Qing Li
LM&Ro
208
206
0
08 Aug 2023
LISA: Reasoning Segmentation via Large Language Model
Computer Vision and Pattern Recognition (CVPR), 2023
Xin Lai
Zhuotao Tian
Yukang Chen
Yanwei Li
Yuhui Yuan
Shu Liu
Jiaya Jia
LM&Ro VLM MLLM LRM
453
707
0
01 Aug 2023
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Conference on Robot Learning (CoRL), 2023
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Xi Chen
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro LRM
553
2,105
0
28 Jul 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
International Conference on Learning Representations (ICLR), 2023
Tri Dao
LRM
401
2,032
0
17 Jul 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Conference on Robot Learning (CoRL), 2023
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
383
732
0
12 Jul 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
International Conference on Learning Representations (ICLR), 2023
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM ObjD VLM
369
1,010
0
26 Jun 2023
Evaluating Object Hallucination in Large Vision-Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM LRM
679
1,226
0
17 May 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
International Conference on Learning Representations (ICLR), 2023
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM MLLM
444
2,662
0
20 Apr 2023
Visual Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa VLM MLLM
1.1K
7,256
0
17 Apr 2023
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
Neural Information Processing Systems (NeurIPS), 2023
Wanrong Zhu
Jack Hessel
Anas Awadalla
S. Gadre
Jesse Dodge
Alex Fang
Youngjae Yu
Ludwig Schmidt
William Yang Wang
Yejin Choi
VLM
457
217
0
14 Apr 2023
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
IEEE International Conference on Computer Vision (ICCV), 2023
Lukas Höllein
Ang Cao
Andrew Owens
Justin Johnson
Matthias Nießner
DiffM
458
232
0
21 Mar 2023