arXiv:2406.04138
The 3D-PC: a benchmark for visual perspective taking in humans and machines
6 June 2024
Drew Linsley
Peisen Zhou
A. Ashok
Akash Nagaraj
Gaurav Gaonkar
Francis E Lewis
Zygmunt Pizlo
Thomas Serre
Papers citing "The 3D-PC: a benchmark for visual perspective taking in humans and machines"
14 / 14 papers shown
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models
Gracjan Góral
Alicja Ziarko
Piotr Miłoś
Michał Nauman
Maciej Wołczyk
Michał Kosiński
LRM
12
0
0
03 May 2025
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Phillip Y. Lee
Jihyeon Je
Chanho Park
Mikaela Angelina Uy
Leonidas J. Guibas
Minhyuk Sung
LRM
31
0
0
24 Apr 2025
The Philosophical Foundations of Growing AI Like A Child
Dezhi Luo
Yijiang Li
Hokin Deng
ReLM
LRM
39
1
0
15 Feb 2025
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Yue Chen
Xingyu Chen
Anpei Chen
Gerard Pons-Moll
Yuliang Xiu
3DGS
71
2
0
12 Dec 2024
Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models
Gracjan Góral
Alicja Ziarko
Michał Nauman
Maciej Wołczyk
LRM
23
1
0
02 Sep 2024
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang
Bingyi Kang
Zilong Huang
Xiaogang Xu
Jiashi Feng
Hengshuang Zhao
VLM
130
681
0
19 Jan 2024
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model
Saurabh Saxena
Junhwa Hur
Charles Herrmann
Deqing Sun
David J. Fleet
DiffM
29
24
0
20 Dec 2023
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
255
7,337
0
11 Nov 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
178
1,148
0
05 Oct 2021
PP-LCNet: A Lightweight CPU Convolutional Neural Network
Cheng Cui
Tingquan Gao
Shengyun Wei
Yuning Du
Ruoyu Guo
...
X. Lv
Qiwen Liu
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
ObjD
26
111
0
17 Sep 2021
Visformer: The Vision-friendly Transformer
Zhengsu Chen
Lingxi Xie
Jianwei Niu
Xuefeng Liu
Longhui Wei
Qi Tian
ViT
106
206
0
26 Apr 2021
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
154
676
0
22 Apr 2021
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
261
10,106
0
16 Nov 2016
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN
3DV
236
35,884
0
25 Aug 2016