InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring

IEEE International Conference on Computer Vision (ICCV), 2021
1 March 2021
Zhihao Yuan
Xu Yan
Yinghong Liao
Ruimao Zhang
Sheng Wang
Zhen Li
Shuguang Cui

Papers citing "InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring"

50 / 91 papers shown
Unified Representation Space for 3D Visual Grounding
Yinuo Zheng
Lipeng Gu
Honghua Chen
Liangliang Nan
Mingqiang Wei
259
0
0
17 Jun 2025
FreeQ-Graph: Free-form Querying with Semantic Consistent Scene Graph for 3D Scene Understanding
Chenlu Zhan
Yufei Zhang
Gaoang Wang
Hongwei Wang
3DV
316
3
0
16 Jun 2025
Grounding Beyond Detection: Enhancing Contextual Understanding in Embodied 3D Grounding
Yani Zhang
Dongming Wu
Hao Shi
Yingfei Liu
Tiancai Wang
Haoqiang Fan
Xingping Dong
ObjD
548
3
0
05 Jun 2025
Zero-Shot 3D Visual Grounding from Vision-Language Models
Rong Li
Shijie Li
Lingdong Kong
Xulei Yang
Junwei Liang
VGen
342
3
0
28 May 2025
LSVG: Language-Guided Scene Graphs with 2D-Assisted Multi-Modal Encoding for 3D Visual Grounding
Feng Xiao
Hongbin Xu
Guocan Zhao
Wenxiong Kang
646
0
0
07 May 2025
3DWG: 3D Weakly Supervised Visual Grounding via Category and Instance-Level Alignment
IEEE International Conference on Robotics and Automation (ICRA), 2025
Xianrui Li
Jing Liu
Nuowei Han
Liang Heng
Yike Guo
Hao Dong
Yang Liu
280
2
0
03 May 2025
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Sergio Arnaud
Paul Mcvay
Ada Martin
Arjun Majumdar
Krishna Murthy Jatavallabhula
...
Nicolas Ballas
Mido Assran
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
3DPC
313
19
0
19 Apr 2025
ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning
Computer Vision and Pattern Recognition (CVPR), 2025
Zhenyang Liu
Yikai Wang
Sixiao Zheng
Tongying Pan
Longfei Liang
Yanwei Fu
Xiangyang Xue
LRM
222
19
0
30 Mar 2025
Empowering Large Language Models with 3D Situation Awareness
Computer Vision and Pattern Recognition (CVPR), 2025
Zhihao Yuan
Yibo Peng
Jinke Ren
Yinghong Liao
Yatong Han
Chun-Mei Feng
Hengshuang Zhao
G. Li
Shuguang Cui
Ge Wang
441
5
0
29 Mar 2025
Vehicle-Scene Interaction: A Text-Driven 3D Lidar Place Recognition Method for Autonomous Driving
Tianyi Shang
Zhenyu Li
Pengjie Xu
ZhaoJun Deng
387
0
0
23 Mar 2025
AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring
AAAI Conference on Artificial Intelligence (AAAI), 2025
Xinyi Wang
Na Zhao
Zhiyuan Han
Dan Guo
Xun Yang
309
10
0
17 Jan 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Yuan Liu
Hengshuang Zhao
863
76
0
02 Jan 2025
LidaRefer: Context-aware Outdoor 3D Visual Grounding for Autonomous Driving
Yeong-Seung Baek
Heung-Seon Oh
377
0
0
07 Nov 2024
Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention
Neural Information Processing Systems (NeurIPS), 2024
Haomeng Zhang
Chiao-An Yang
Raymond A. Yeh
364
7
0
29 Oct 2024
Joint Top-Down and Bottom-Up Frameworks for 3D Visual Grounding
International Conference on Pattern Recognition (ICPR), 2024
Yang Liu
Daizong Liu
Wei Hu
3DPC
440
9
0
21 Oct 2024
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
Conference on Robot Learning (CoRL), 2024
Runsen Xu
Zhiwei Huang
Tai Wang
Yuxiao Chen
Jiangmiao Pang
Dahua Lin
VGen
301
48
0
17 Oct 2024
LESS: Label-Efficient and Single-Stage Referring 3D Segmentation
Neural Information Processing Systems (NeurIPS), 2024
Xuexun Liu
Xiaoxu Xu
Jinlong Li
Qiudan Zhang
Xu Wang
Andrii Zadaianchuk
Lin Ma
479
4
0
17 Oct 2024
Grounding 3D Scene Affordance From Egocentric Interactions
Cuiyu Liu
Wei Zhai
Yuhang Yang
Hongchen Luo
Sen Liang
Yang Cao
Zheng-Jun Zha
427
10
0
29 Sep 2024
Bayesian Self-Training for Semi-Supervised 3D Segmentation
European Conference on Computer Vision (ECCV), 2024
Ozan Unal
Daniel Gehrig
Luc Van Gool
3DPC
3DV
260
1
0
12 Sep 2024
R2G: Reasoning to Ground in 3D Scenes
Pattern Recognition (Pattern Recogn.), 2024
Yixuan Li
Zan Wang
Wei Liang
418
4
0
24 Aug 2024
3D-GRES: Generalized 3D Referring Expression Segmentation
Changli Wu
Yihang Liu
Jiayi Ji
Yiwei Ma
Haowei Wang
Gen Luo
Henghui Ding
Xiaoshuai Sun
Rongrong Ji
336
19
0
30 Jul 2024
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation
Shuting He
Henghui Ding
316
26
0
25 Jul 2024
Multi-branch Collaborative Learning Network for 3D Visual Grounding
Zhipeng Qian
Yiwei Ma
Zhekai Lin
Jinfa Huang
Xiawu Zheng
Xiaoshuai Sun
Rongrong Ji
3DV
370
25
0
07 Jul 2024
Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding
Yue Xu
Kaizhi Yang
Jiebo Luo
Xuejin Chen
3DPC
359
2
0
13 Jun 2024
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Daizong Liu
Yang Liu
Wencan Huang
Wei Hu
LM&Ro
434
36
0
09 Jun 2024
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Kuan-Chih Huang
Xiangtai Li
Lu Qi
Shuicheng Yan
Ming-Hsuan Yang
LRM
476
25
0
27 May 2024
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma
Brandon Smart
Shuai Chen
Xinghui Li
...
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
LRM
462
37
0
16 May 2024
Generating Human Motion in 3D Scenes from Text Descriptions
Computer Vision and Pattern Recognition (CVPR), 2024
Zhi Cen
Huaijin Pi
Sida Peng
Zehong Shen
Minghui Yang
Shuai Zhu
Hujun Bao
Xiaowei Zhou
302
53
0
13 May 2024
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners
Chun Feng
Joy Hsu
Weiyu Liu
Jiajun Wu
PINN
LRM
335
10
0
30 Apr 2024
Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization
Yongdong Luo
Haojia Lin
Xiawu Zheng
Yigeng Jiang
Jiayi Ji
Jie Hu
Guannan Jiang
Songan Zhang
Rongrong Ji
289
0
0
17 Apr 2024
PointCloud-Text Matching: Benchmark Datasets and a Baseline
Yanglin Feng
Yang Qin
Dezhong Peng
Erik Cambria
Xi Peng
Peng Hu
390
0
0
28 Mar 2024
Data-Efficient 3D Visual Grounding via Order-Aware Referring
Tung-Yu Wu
Sheng-Yu Huang
Yu-Chiang Frank Wang
794
4
0
25 Mar 2024
SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention
Feng Xiao
Hongbin Xu
Qiuxia Wu
Wenxiong Kang
311
4
0
13 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
254
19
0
12 Mar 2024
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding
Chun-Peng Chang
Shaoxiang Wang
A. Pagani
Didier Stricker
423
30
0
05 Mar 2024
Adversarial Testing for Visual Grounding via Image-Aware Property Reduction
Zhiyuan Chang
Mingyang Li
Peng Li
Cheng Li
Boyu Wu
Fanjiang Xu
Qing Wang
AAML
286
1
0
02 Mar 2024
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
Mingsheng Li
Xin Chen
C. Zhang
Sijin Chen
Erik Cambria
Fukun Yin
Gang Yu
Tao Chen
384
37
0
17 Dec 2023
Weakly-Supervised 3D Visual Grounding based on Visual Language Alignment
IEEE Transactions on Multimedia (IEEE TMM), 2023
Xiaoxu Xu
Yitian Yuan
Qiudan Zhang
Wen-Bin Wu
Zequn Jie
Lin Ma
Xu Wang
634
5
0
15 Dec 2023
Mono3DVG: 3D Visual Grounding in Monocular Images
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yangfan Zhan
Yuan Yuan
Zhitong Xiong
MDE
298
38
0
13 Dec 2023
Uni3DL: Unified Model for 3D and Language Understanding
Xiang Li
Jian Ding
Zhaoyang Chen
Mohamed Elhoseiny
393
9
0
05 Dec 2023
Text2Loc: 3D Point Cloud Localization from Natural Language
Computer Vision and Pattern Recognition (CVPR), 2023
Yan Xia
Letian Shi
Zifeng Ding
João F. Henriques
Zorah Lähner
434
64
0
27 Nov 2023
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2023
Zhihao Yuan
Jinke Ren
Chun-Mei Feng
Hengshuang Zhao
Shuguang Cui
Zhen Li
399
76
0
26 Nov 2023
CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data
Neural Information Processing Systems (NeurIPS), 2023
Taiki Miyanishi
Fumiya Kitamori
Shuhei Kurita
Jungdae Lee
M. Kawanabe
Nakamasa Inoue
AI4TS
3DPC
292
20
0
28 Oct 2023
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation
Yinjie Lei
Zixuan Wang
Feng Chen
Guoqing Wang
Peng Wang
Yang Yang
326
17
0
24 Oct 2023
CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding
International Conference on Learning Representations (ICLR), 2023
Eslam Mohamed Bakr
Mohamed Ayman
Mahmoud Ahmed
Habib Slim
Mohamed Elhoseiny
LRM
460
16
0
10 Oct 2023
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent
IEEE International Conference on Robotics and Automation (ICRA), 2023
Jianing Yang
Xuweiyi Chen
Shengyi Qian
Nikhil Madaan
Madhavan Iyengar
David Fouhey
Joyce Chai
LM&Ro
LLMAG
405
162
0
21 Sep 2023
Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection
Chenming Zhu
Wenwei Zhang
Tai Wang
Xihui Liu
Kai-xiang Chen
3DPC
286
28
0
18 Sep 2023
Multi3DRefer: Grounding Text Description to Multiple 3D Objects
IEEE International Conference on Computer Vision (ICCV), 2023
Yiming Zhang
ZeMing Gong
Angel X. Chang
559
155
0
11 Sep 2023
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
European Conference on Computer Vision (ECCV), 2023
Ozan Unal
Daniel Gehrig
Suman Saha
Luc Van Gool
364
35
0
08 Sep 2023
Dense Object Grounding in 3D Scenes
ACM Multimedia (ACM MM), 2023
Wencan Huang
Daizong Liu
Wei Hu
287
26
0
05 Sep 2023