Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models

23 July 2022
Huy Ha
Shuran Song
    LM&Ro
    VLM
arXiv: 2207.11514

Papers citing "Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models"

35 / 85 papers shown
Title
Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation
Kashu Yamazaki
Taisei Hanyu
Khoa T. Vo
Thang M. Pham
Minh-Triet Tran
Gianfranco Doretto
Anh Nguyen
Ngan Le
16
25
0
05 Oct 2023
Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding
Christina Kassab
Matías Mattamala
Lintong Zhang
Maurice F. Fallon
20
18
0
26 Sep 2023
Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving
Mahyar Najibi
Jingwei Ji
Yin Zhou
C. Qi
Xinchen Yan
Scott Ettinger
Drago Anguelov
14
27
0
25 Sep 2023
3D Indoor Instance Segmentation in an Open-World
Mohamed El Amine Boudjoghra
Salwa K. Al Khatib
Jean Lahoud
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
Fahad Khan
3DV
ISeg
9
6
0
25 Sep 2023
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent
Jianing Yang
Xuweiyi Chen
Shengyi Qian
Nikhil Madaan
Madhavan Iyengar
David Fouhey
Joyce Chai
LM&Ro
LLMAG
22
84
0
21 Sep 2023
Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping
Adam Rashid
Satvik Sharma
C. Kim
J. Kerr
L. Chen
Angjoo Kanazawa
Ken Goldberg
42
83
0
14 Sep 2023
Gesture-Informed Robot Assistance via Foundation Models
Li-Heng Lin
Yuchen Cui
Yilun Hao
Fei Xia
Dorsa Sadigh
LM&Ro
SLR
11
19
0
06 Sep 2023
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
Ziyu Zhu
Xiaojian Ma
Yixin Chen
Zhidong Deng
Siyuan Huang
Qing Li
LM&Ro
23
100
0
08 Aug 2023
OpenMask3D: Open-Vocabulary 3D Instance Segmentation
Ayca Takmaz
Elisabetta Fedele
R. Sumner
Marc Pollefeys
F. Tombari
Francis Engelmann
ISeg
VLM
9
163
0
23 Jun 2023
CLIPGraphs: Multimodal Graph Networks to Infer Object-Room Affinities
A. Agrawal
Raghav Arora
Ahana Datta
Snehasis Banerjee
Brojeshwar Bhowmick
Krishna Murthy Jatavallabhula
Mohan Sridharan
Madhava Krishna
15
2
0
02 Jun 2023
GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds
Zihui Zhang
Bo Yang
Bing Wang
Bo Li
3DPC
15
40
0
25 May 2023
OVO: Open-Vocabulary Occupancy
Zhiyu Tan
Zichao Dong
Cheng-Jun Zhang
Weikun Zhang
Hang Ji
Hao Li
VLM
11
14
0
25 May 2023
Weakly Supervised 3D Open-vocabulary Segmentation
Kunhao Liu
Fangneng Zhan
Jiahui Zhang
Muyu Xu
Yingchen Yu
Abdulmotaleb El Saddik
Christian Theobalt
Eric P. Xing
Shijian Lu
16
66
0
23 May 2023
OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding
Minghua Liu
Ruoxi Shi
Kaiming Kuang
Yinhao Zhu
Xuanlin Li
Shizhong Han
H. Cai
Fatih Porikli
Hao Su
3DPC
22
115
0
18 May 2023
Foundations of Spatial Perception for Robotics: Hierarchical Representations and Real-time Systems
Nathan Hughes
Yun Chang
Siyi Hu
Rajat Talak
Rumaisa Abdulhai
Jared Strader
Luca Carlone
13
45
0
11 May 2023
Segment Anything in 3D with Radiance Fields
Jiazhong Cen
Jiemin Fang
Zanwei Zhou
Chen Yang
Lingxi Xie
Xiaopeng Zhang
Wei-Ming Shen
Qi Tian
27
43
0
24 Apr 2023
Grounding Classical Task Planners via Vision-Language Models
Xiaohan Zhang
Yan Ding
S. Amiri
Hao Yang
Andy Kaminski
Chad Esselink
Shiqi Zhang
8
15
0
17 Apr 2023
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement
Xiang-yu Zhu
Renrui Zhang
Bowei He
A-Long Zhou
Dong Wang
Bingyan Zhao
Peng Gao
VLM
27
76
0
03 Apr 2023
FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models
Jianglong Ye
Naiyan Wang
X. Wang
DiffM
35
41
0
22 Mar 2023
CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data
Yi Zeng
Chenhan Jiang
Jiageng Mao
Jianhua Han
Chao Ye
Qingqiu Huang
Dit-Yan Yeung
Zhen Yang
Xiaodan Liang
Hang Xu
3DPC
VLM
CLIP
12
68
0
22 Mar 2023
Neural Implicit Vision-Language Feature Fields
Kenneth Blomqvist
Francesco Milano
Jen Jen Chung
Lionel Ott
Roland Siegwart
VLM
12
12
0
20 Mar 2023
LERF: Language Embedded Radiance Fields
J. Kerr
C. Kim
Ken Goldberg
Angjoo Kanazawa
Matthew Tancik
13
348
0
16 Mar 2023
CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP
Junbo Zhang
Runpei Dong
Kaisheng Ma
CLIP
VLM
11
77
0
08 Mar 2023
Semantic Mechanical Search with Large Vision and Language Models
Satvik Sharma
Huang Huang
K. Shivakumar
A. Imran
Ryan Hoque
Brian Ichter
Ken Goldberg
LM&Ro
VLM
6
5
0
24 Feb 2023
ConceptFusion: Open-set Multimodal 3D Mapping
Krishna Murthy Jatavallabhula
Ali Kuwajerwala
Qiao Gu
Mohd. Omama
Tao Chen
...
Celso Miguel de Melo
Madhava Krishna
Liam Paull
Florian Shkurti
Antonio Torralba
11
230
0
14 Feb 2023
Task Bias in Vision-Language Models
Sachit Menon
I. Chandratreya
Carl Vondrick
VLM
SSL
12
6
0
08 Dec 2022
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Ronghang Hu
Xinlei Chen
Matthias Nießner
Angel X. Chang
17
52
0
01 Dec 2022
OpenScene: 3D Scene Understanding with Open Vocabularies
Songyou Peng
Kyle Genova
Chiyu Max Jiang
Andrea Tagliasacchi
Marc Pollefeys
Thomas Funkhouser
3DPC
VLM
15
341
0
28 Nov 2022
CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory
Nur Muhammad (Mahi) Shafiullah
Chris Paxton
Lerrel Pinto
Soumith Chintala
Arthur Szlam
VLM
LM&Ro
CLIP
90
155
0
11 Oct 2022
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Yanmin Wu
Xinhua Cheng
Renrui Zhang
Zesen Cheng
Jian Zhang
48
62
0
29 Sep 2022
Decomposing NeRF for Editing via Feature Field Distillation
Sosuke Kobayashi
Eiichi Matsumoto
Vincent Sitzmann
165
326
0
31 May 2022
Rethinking Attention-Model Explainability through Faithfulness Violation Test
Y. Liu
Haoliang Li
Yangyang Guo
Chen Kong
Jing Li
Shiqi Wang
FAtt
116
41
0
28 Jan 2022
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
149
360
0
17 Sep 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
185
403
0
13 Jul 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021