Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds

International Joint Conference on Artificial Intelligence (IJCAI), 2022
22 April 2022
Heng Wang
Chaoyi Zhang
Jianhui Yu
Weidong (Tom) Cai
    3DPC

Papers citing "Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds"

MoE3D: Mixture of Experts meets Multi-Modal 3D Understanding
Yu Li
Yuenan Hou
Yingmei Wei
X. Zhu
Y. Ma
Wenqi Shao
Yanming Guo
MoE
36
0
0
27 Nov 2025
Scenes as Tokens: Multi-Scale Normal Distributions Transform Tokenizer for General 3D Vision-Language Understanding
Yutao Tang
Cheng Zhao
Gaurav Mittal
Rohith Kukkala
Rama Chellappa
Cheng-Fang Peng
Mei Chen
VLM
112
0
0
26 Nov 2025
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
Ting Huang
Zeyu Zhang
Hao Tang
LRM
94
9
0
31 Jul 2025
ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way
Rajarshi Roy
Devleena Das
A. Banerjee
Arjya Bhattacharjee
Kousik Dasgupta
Subarna Tripathi
VLM
212
0
0
11 Jul 2025
DC-Scene: Data-Centric Learning for 3D Scene Understanding
Ting Huang
Zeyu Zhang
Ruicheng Zhang
Yang Zhao
199
6
0
21 May 2025
3D CoCa: Contrastive Learners are 3D Captioners
Ting Huang
Zhenru Zhang
Longji Xu
Hao Tang
252
6
0
13 Apr 2025
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
Chandan Yeshwanth
Dávid Rozenberszki
Angela Dai
260
3
0
21 Mar 2025
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
597
114
0
26 Sep 2024
OE3DIS: Open-Ended 3D Point Cloud Instance Segmentation
P. Nguyen
Minh Luu
Anh Tran
Cuong Pham
Khoi Duc Minh Nguyen
3DPC
250
1
0
21 Aug 2024
See It All: Contextualized Late Aggregation for 3D Dense Captioning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Minjung Kim
Hyung Suk Lim
Seung Hwan Kim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
182
4
0
14 Aug 2024
Bi-directional Contextual Attention for 3D Dense Captioning
European Conference on Computer Vision (ECCV), 2024
Minjung Kim
Hyung Suk Lim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
189
5
0
13 Aug 2024
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities
Chenming Zhu
Tai Wang
Wenwei Zhang
Kai Chen
Xihui Liu
ReLM
LRM
249
49
0
01 Jul 2024
Grounded 3D-LLM with Referent Tokens
Yilun Chen
Shuai Yang
Haifeng Huang
Tai Wang
Ruiyuan Lyu
Runsen Xu
Dahua Lin
Jiangmiao Pang
304
75
0
16 May 2024
"Where am I?" Scene Retrieval with Language
Jiaqi Chen
Dániel Baráth
Iro Armeni
Marc Pollefeys
Hermann Blum
LM&Ro
193
12
0
22 Apr 2024
Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization
Yongdong Luo
Haojia Lin
Xiawu Zheng
Yigeng Jiang
Jiayi Ji
Jie Hu
Guannan Jiang
Songan Zhang
Rongrong Ji
168
0
0
17 Apr 2024
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Agneet Chatterjee
Tejas Gokhale
Chitta Baral
Yezhou Yang
VLM
136
4
0
12 Apr 2024
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Bu Jin
Yupeng Zheng
Pengfei Li
Weize Li
Yuhang Zheng
...
Kun Zhan
Fu Liu
Xiaoxiao Long
Yilun Chen
Hao Zhao
3DV
244
37
0
28 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
204
16
0
12 Mar 2024
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Computer Vision and Pattern Recognition (CVPR), 2023
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Erik Cambria
Jiayuan Fan
Tao Chen
MLLM
277
165
0
30 Nov 2023
CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data
Neural Information Processing Systems (NeurIPS), 2023
Taiki Miyanishi
Fumiya Kitamori
Shuhei Kurita
Jungdae Lee
M. Kawanabe
Nakamasa Inoue
AI4TS
3DPC
178
14
0
28 Oct 2023
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation
Yinjie Lei
Zixuan Wang
Feng Chen
Guoqing Wang
Peng Wang
Yang Yang
235
17
0
24 Oct 2023
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Sijin Chen
Erik Cambria
Mingsheng Li
Xin Chen
Peng Guo
Yinjie Lei
Gang Yu
Taihao Li
Tao Chen
268
39
0
06 Sep 2023
Dense Object Grounding in 3D Scenes
ACM Multimedia (ACM MM), 2023
Wencan Huang
Daizong Liu
Wei Hu
218
24
0
05 Sep 2023
Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans
International Conference on 3D Vision (3DV), 2023
Taiki Miyanishi
Daich Azuma
Shuhei Kurita
M. Kawanabe
218
10
0
23 May 2023
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zhang Tao
Su He
D. Tao
Bin Chen
Zhi Wang
Shutao Xia
VLM
153
38
0
18 May 2023
CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes
Maria Parelli
Alexandros Delitzas
Nikolas Hars
G. Vlassis
Sotiris Anagnostidis
Gregor Bachmann
Thomas Hofmann
CLIP
183
71
0
12 Apr 2023
End-to-End 3D Dense Captioning with Vote2Cap-DETR
Computer Vision and Pattern Recognition (CVPR), 2023
Sijin Chen
Erik Cambria
Xin Chen
Yinjie Lei
Tao Chen
YU Gang
ViT
190
82
0
06 Jan 2023
ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Ahmed Abdelreheem
Kyle Olszewski
Hsin-Ying Lee
Peter Wonka
Panos Achlioptas
3DPC
231
32
0
12 Dec 2022
Language-Assisted 3D Feature Learning for Semantic Scene Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2022
Junbo Zhang
Guo Fan
Guanghan Wang
Zhèngyuān Sū
Kaisheng Ma
L. Yi
3DPC
231
8
0
25 Nov 2022
Novel 3D Scene Understanding Applications From Recurrence in a Single Image
Shimian Zhang
Skanda Bharadwaj
Keaton Kraiger
Yashasvi Asthana
Kuanqi Cai
R. Collins
Yanxi Liu
297
2
0
14 Oct 2022
Contextual Modeling for 3D Dense Captioning on Point Clouds
Yufeng Zhong
Longdao Xu
Jiebo Luo
Lin Ma
154
17
0
08 Oct 2022
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2022
Yanmin Wu
Xinhua Cheng
Renrui Zhang
Zesen Cheng
Jian Zhang
258
106
0
29 Sep 2022