PointCLIP: Point Cloud Understanding by CLIP

Computer Vision and Pattern Recognition (CVPR), 2022

4 December 2021

Ziyu Guo

Yu Qiao


Papers citing "PointCLIP: Point Cloud Understanding by CLIP"

50 / 223 papers shown

ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding

176

03 Dec 2025

Multimodal Robust Prompt Distillation for 3D Point Cloud Models

273

26 Nov 2025

LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight

148

25 Nov 2025

CrossJEPA: Cross-Modal Joint-Embedding Predictive Architecture for Efficient 3D Representation Learning from 2D Images

218

23 Nov 2025

Improving Multimodal Distillation for 3D Semantic Segmentation under Domain Shift

109

21 Nov 2025

3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale

101

17 Nov 2025

Point Cloud Quantization through Multimodal Prompting for 3D Understanding

472

15 Nov 2025

A Systematic Study of Model Extraction Attacks on Graph Foundation Models

...

142

14 Nov 2025

PointCubeNet: 3D Part-level Reasoning with 3x3x3 Point Cloud Blocks

Da-Yeong Kim

Yeong-Jun Cho

3DPC 3DV

193

10 Nov 2025

CSGaze: Context-aware Social Gaze Prediction

Surbhi Madan

Shreya Ghosh

Ramanathan Subramanian

Abhinav Dhall

Tom Gedeon

159

08 Nov 2025

Open-World 3D Scene Graph Generation for Retrieval-Augmented Reasoning

299

08 Nov 2025

How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?

250

07 Nov 2025

BlendCLIP: Bridging Synthetic and Real Domains for Zero-Shot 3D Object Classification with Multimodal Pretraining

159

21 Oct 2025

Towards 3D Objectness Learning in an Open World

192

20 Oct 2025

Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation

218

10 Oct 2025

PIT-QMM: A Large Multimodal Model For No-Reference Point Cloud Quality Assessment

IEEE International Conference on Image Processing (ICIP), 2025

Shashank Gupta

Gregoire Phillips

Alan Bovik

101

09 Oct 2025

MetaFind: Scene-Aware 3D Asset Retrieval for Coherent Metaverse Scene Generation

139

05 Oct 2025

SkyLink: Unifying Street-Satellite Geo-Localization via UAV-Mediated 3D Scene Alignment

Hongyang Zhang

Yinhao Liu

Zhenyu Kuang

191

29 Sep 2025

GenCAD-3D: CAD Program Generation using Multimodal Latent Space Alignment and Synthetic Dataset Balancing

178

17 Sep 2025

OpenUrban3D: Annotation-Free Open-Vocabulary Semantic Segmentation of Large-Scale Urban Point Clouds

228

13 Sep 2025

O³Afford: One-Shot 3D Object-to-Object Affordance Grounding for Generalizable Robotic Manipulation

Tongxuan Tian

Xuhui Kang

Yen-Ling Kuo

137

07 Sep 2025

PointAD+: Learning Hierarchical Representations for Zero-shot 3D Anomaly Detection

265

03 Sep 2025

OpenM3D: Open Vocabulary Multi-view Indoor 3D Object Detection without Human Annotations

124

27 Aug 2025

TinyGiantVLM: A Lightweight Vision-Language Architecture for Spatial Reasoning under Resource Constraints

25 Aug 2025

Masked Clustering Prediction for Unsupervised Point Cloud Pre-training

205

12 Aug 2025

Propagating Sparse Depth via Depth Foundation Model for Out-of-Distribution Depth Completion

IEEE Transactions on Image Processing (IEEE TIP), 2025

134

07 Aug 2025

Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval

169

29 Jul 2025

SmartCLIP: Modular Vision-language Alignment with Identification Guarantees

Computer Vision and Pattern Recognition (CVPR), 2025

254

29 Jul 2025

BANG: Dividing 3D Assets via Generative Exploded Dynamics

ACM Transactions on Graphics (TOG), 2025

228

29 Jul 2025

Multi-modal Multi-task Pre-training for Improved Point Cloud Understanding

204

23 Jul 2025

Principled Multimodal Representation Learning

257

23 Jul 2025

TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP

295

20 Jul 2025

Stereo-based 3D Anomaly Object Detection for Autonomous Driving: A New Dataset and Baseline

212

12 Jul 2025

PointVDP: Learning View-Dependent Projection by Fireworks Rays for 3D Point Cloud Segmentation

252

09 Jul 2025

Zero-Shot Skeleton-Based Action Recognition With Prototype-Guided Feature Alignment

IEEE Transactions on Image Processing (IEEE TIP), 2025

267

01 Jul 2025

MR-COSMO: Visual-Text Memory Recall and Direct CrOSs-MOdal Alignment Method for Query-Driven 3D Segmentation

254

26 Jun 2025

TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Scale-Oriented Contrast

291

16 Jun 2025

EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning

226

14 Jun 2025

AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making

329

14 Jun 2025

3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

222

11 Jun 2025

Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs

398

05 Jun 2025

FOLIAGE: Towards Physical Intelligence World Models Via Unbounded Surface Evolution

Xiaoyi Liu

Hao Tang

AI4CE

278

29 May 2025

HuMoCon: Concept Discovery for Human Motion Understanding

Computer Vision and Pattern Recognition (CVPR), 2025

244

27 May 2025

SVL: Spike-based Vision-language Pretraining for Efficient 3D Open-world Understanding

414

23 May 2025

RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation

Naman Patel

Prashanth Krishnamurthy

Farshad Khorrami

308

21 May 2025

Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning

IEEE Access (IEEE Access), 2025

529

30 Apr 2025

Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Krishna Murthy Jatavallabhula

...

289

19 Apr 2025

Semantic Consistent Language Gaussian Splatting for Point-Level Open-vocabulary Querying

313

27 Mar 2025

SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

...

670

23 Mar 2025

Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding

Computer Vision and Pattern Recognition (CVPR), 2025

1.1K

20 Mar 2025