D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding

2 December 2021
Dave Zhenyu Chen
Qirui Wu
Matthias Nießner
Angel X. Chang
arXiv: 2112.01551

Papers citing "D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding"

27 / 27 papers shown
AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding
Feng Xiao
Hongbin Xu
Guocan Zhao
Wenxiong Kang
37
0
0
07 May 2025
Multi-Object Grounding via Hierarchical Contrastive Siamese Transformers
Chengyi Du
Keyan Jin
19
0
0
14 Apr 2025
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
Chandan Yeshwanth
Dávid Rozenberszki
Angela Dai
71
0
0
21 Mar 2025
Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention
Haomeng Zhang
Chiao-An Yang
Raymond A. Yeh
29
1
0
29 Oct 2024
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
Runsen Xu
Zhiwei Huang
Tai Wang
Y. Chen
Jiangmiao Pang
Dahua Lin
VGen
34
0
0
17 Oct 2024
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
87
29
0
26 Sep 2024
See It All: Contextualized Late Aggregation for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Seung Hwan Kim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
39
0
0
14 Aug 2024
Bi-directional Contextual Attention for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
29
0
0
13 Aug 2024
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation
Shuting He
Henghui Ding
44
10
0
25 Jul 2024
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
Daizong Liu
Yang Liu
Wencan Huang
Wei Hu
LM&Ro
26
9
0
09 Jun 2024
Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection
Yang Cao
Yihan Zeng
Hang Xu
Dan Xu
3DPC
ObjD
28
6
0
02 Jun 2024
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma
Yash Bhalgat
Brandon Smart
Shuai Chen
Xinghui Li
...
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
LRM
29
11
0
16 May 2024
Transcrib3D: 3D Referring Expression Resolution through Large Language Models
Jiading Fang
Xiangshan Tan
Shengjie Lin
Igor Vasiljevic
Vitor Campagnolo Guizilini
Hongyuan Mei
Rares Ambrus
Gregory Shakhnarovich
Matthew R. Walter
LM&Ro
30
4
0
30 Apr 2024
SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention
Feng Xiao
Hongbin Xu
Qiuxia Wu
Wenxiong Kang
22
2
0
13 Mar 2024
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
Mingsheng Li
Xin Chen
C. Zhang
Sijin Chen
Hongyuan Zhu
Fukun Yin
Gang Yu
Tao Chen
17
23
0
17 Dec 2023
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment
Xiaoxu Xu
Yitian Yuan
Qiudan Zhang
Wen-Bin Wu
Zequn Jie
Lin Ma
Xu Wang
47
4
0
15 Dec 2023
Mono3DVG: 3D Visual Grounding in Monocular Images
Yangfan Zhan
Yuan Yuan
Zhitong Xiong
MDE
23
9
0
13 Dec 2023
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
Dave Zhenyu Chen
Haoxuan Li
Hsin-Ying Lee
Sergey Tulyakov
Matthias Nießner
DiffM
14
28
0
28 Nov 2023
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation
Yinjie Lei
Zixuan Wang
Feng Chen
Guoqing Wang
Peng Wang
Yang Yang
27
8
0
24 Oct 2023
Multi3DRefer: Grounding Text Description to Multiple 3D Objects
Yiming Zhang
ZeMing Gong
Angel X. Chang
42
63
0
11 Sep 2023
Text2Tex: Text-driven Texture Synthesis via Diffusion Models
Dave Zhenyu Chen
Yawar Siddiqui
Hsin-Ying Lee
Sergey Tulyakov
Matthias Nießner
DiffM
20
185
0
20 Mar 2023
Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline
Lichen Zhao
Daigang Cai
Jing Zhang
Lu Sheng
Dong Xu
Ruizhi Zheng
Yinjie Zhao
Lipeng Wang
Xibo Fan
6
23
0
24 Sep 2022
Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments
Rui Song
Dai Liu
Da Chen
Andreas Festag
Carsten Trinitis
Martin Schulz
Alois C. Knoll
DD
FedML
6
59
0
24 Aug 2022
Hierarchical Aggregation for 3D Instance Segmentation
Shaoyu Chen
Jiemin Fang
Qian Zhang
Wenyu Liu
Xinggang Wang
3DPC
47
160
0
05 Aug 2021
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
Zhihao Yuan
Xu Yan
Yinghong Liao
Ruimao Zhang
Sheng Wang
Zhen Li
Shuguang Cui
59
128
0
01 Mar 2021
Speaker-Follower Models for Vision-and-Language Navigation
Daniel Fried
Ronghang Hu
Volkan Cirik
Anna Rohrbach
Jacob Andreas
Louis-Philippe Morency
Taylor Berg-Kirkpatrick
Kate Saenko
Dan Klein
Trevor Darrell
LM&Ro
LRM
237
444
0
07 Jun 2018
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Jiasen Lu
Caiming Xiong
Devi Parikh
R. Socher
83
443
0
06 Dec 2016