Probing the 3D Awareness of Visual Foundation Models

12 April 2024

Mohamed El Banani

Amit Raj

Kevis-Kokitsi Maninis

Abhishek Kar

Yuanzhen Li

Michael Rubinstein

Papers citing "Probing the 3D Awareness of Visual Foundation Models"

21 / 21 papers shown

Title
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation Volodymyr Havrylov Haiwen Huang Dan Zhang Andreas Geiger 34 0 0 04 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models Wufei Ma Luoxin Ye Nessa McWeeney Celso M de Melo A. Yuille Jieneng Chen LRM 57 1 0 01 May 2025
Pixels2Points: Fusing 2D and 3D Features for Facial Skin Segmentation Victoria Yue Chen Daoye Wang Stephan Garbin Jan Bednarík Sebastian Winberg Timo Bolkart Thabo Beeler 3DH 3DPC 32 0 0 28 Apr 2025
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding Jinlong Li Cristiano Saltori Fabio Poiesi N. Sebe 61 0 0 20 Mar 2025
Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering Yanpeng Zhao Yiwei Hao Siyu Gao Yunbo Wang Xiaokang Yang OCL 111 1 0 17 Feb 2025
SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects Jiayi Liu Denys Iliash Angel X. Chang Manolis Savva Ali Mahdavi-Amiri 51 7 0 21 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness Kevis-Kokitsi Maninis Kaifeng Chen Soham Ghosh Arjun Karpur Koert Chen ... Jan Dlabal Dan Gnanapragasam Mojtaba Seyedhosseini Howard Zhou Andre Araujo VLM 30 3 0 21 Oct 2024
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation Haoyi Zhu Honghui Yang Yating Wang Jiange Yang Limin Wang Tong He 3DH 43 5 0 10 Oct 2024
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation Mike Ranzinger Jon Barker Greg Heinrich Pavlo Molchanov Bryan Catanzaro Andrew Tao 25 4 0 02 Oct 2024
SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image Dimitrije Antić Sai Kumar Dwivedi Shashank Tripathi Theo Gevers Dimitrios Tzionas Dimitrios Tzionas 42 2 0 24 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding Yunze Man Shuhong Zheng Zhipeng Bao M. Hebert Liang-Yan Gui Yu-xiong Wang 70 15 0 05 Sep 2024
Odd-One-Out: Anomaly Detection by Comparing with Neighbors A. Bhunia Changjian Li Hakan Bilen 34 0 0 28 Jun 2024
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images Han-Hung Lee Yiming Zhang Angel X. Chang 3DPC 34 3 0 17 Jun 2024
The 3D-PC: a benchmark for visual perspective taking in humans and machines Drew Linsley Peisen Zhou A. Ashok Akash Nagaraj Gaurav Gaonkar Francis E Lewis Zygmunt Pizlo Thomas Serre 37 6 0 06 Jun 2024
AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One Michael Ranzinger Greg Heinrich Jan Kautz Pavlo Molchanov VLM 20 42 0 10 Dec 2023
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks Micah Goldblum Hossein Souri Renkun Ni Manli Shu Viraj Prabhu ... Adrien Bardes Judy Hoffman Ramalingam Chellappa Andrew Gordon Wilson Tom Goldstein VLM 68 62 0 30 Oct 2023
Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image Wei Yin Chi Zhang Hao Chen Zhipeng Cai Gang Yu Kaixuan Wang Xiaozhi Chen Chunhua Shen MDE 126 169 0 20 Jul 2023
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models Jiarui Xu Sifei Liu Arash Vahdat Wonmin Byeon Xiaolong Wang Shalini De Mello VLM 198 318 0 08 Mar 2023
Muse: Text-To-Image Generation via Masked Generative Transformers Huiwen Chang Han Zhang Jarred Barber AJ Maschinot José Lezama ... Kevin Patrick Murphy William T. Freeman Michael Rubinstein Yuanzhen Li Dilip Krishnan DiffM 197 515 0 02 Jan 2023
Masked Autoencoders Are Scalable Vision Learners Kaiming He Xinlei Chen Saining Xie Yanghao Li Piotr Dollár Ross B. Girshick ViT TPM 258 7,337 0 11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers Mathilde Caron Hugo Touvron Ishan Misra Hervé Jégou Julien Mairal Piotr Bojanowski Armand Joulin 283 5,723 0 29 Apr 2021