
Generative Perception of Shape and Material from Differential Motion

Main: 9 pages · Appendix: 3 pages · Bibliography: 4 pages · 12 figures · 3 tables
Abstract

Perceiving the shape and material of an object from a single image is inherently ambiguous, especially when lighting is unknown and unconstrained. Despite this, humans can often disentangle shape and material, and when they are uncertain, they often move their head slightly or rotate the object to help resolve the ambiguities. Inspired by this behavior, we introduce a novel conditional denoising-diffusion model that generates samples of shape-and-material maps from a short video of an object undergoing differential motions. Our parameter-efficient architecture allows training directly in pixel-space, and it generates many disentangled attributes of an object simultaneously. Trained on a modest number of synthetic object-motion videos with supervision on shape and material, the model exhibits compelling emergent behavior: For static observations, it produces diverse, multimodal predictions of plausible shape-and-material maps that capture the inherent ambiguities; and when objects move, the distributions quickly converge to more accurate explanations. The model also produces high-quality shape-and-material estimates for less ambiguous, real-world objects. By moving beyond single-view to continuous motion observations, our work suggests a generative perception approach for improving visual reasoning in physically-embodied systems.
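To make the abstract's central idea concrete, the sketch below illustrates a generic conditional denoising-diffusion sampler that draws one shape-and-material map given a short video of a moving object. This is only a rough illustration under assumptions, not the authors' implementation: the network `denoiser`, the number of steps, the noise schedule, and the output channel layout are all hypothetical, and the paper's pixel-space, parameter-efficient architecture is not reproduced here.

```python
# Minimal sketch, assuming a standard DDPM-style ancestral sampler.
# `denoiser`, NUM_STEPS, MAP_CHANNELS, and the conditioning interface
# are assumptions for illustration, not the paper's actual design.
import torch

NUM_STEPS = 50       # hypothetical number of denoising steps
MAP_CHANNELS = 7     # assumed layout, e.g. 3 normals + 3 albedo + 1 roughness


@torch.no_grad()
def sample_shape_material(denoiser, video, betas):
    """Draw one shape-and-material map sample conditioned on a short video.

    denoiser: network eps = denoiser(x_t, t, video) predicting the added noise.
    video:    (B, T, 3, H, W) tensor of frames of the object in motion.
    betas:    (NUM_STEPS,) noise schedule.
    """
    B, _, _, H, W = video.shape
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Start from pure Gaussian noise in pixel space.
    x = torch.randn(B, MAP_CHANNELS, H, W)

    for t in reversed(range(NUM_STEPS)):
        t_batch = torch.full((B,), t, dtype=torch.long)
        eps = denoiser(x, t_batch, video)  # predicted noise, conditioned on the video

        # Standard DDPM ancestral update (one common choice; details assumed).
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise

    return x  # one sample of the shape-and-material maps
```

Because the model is generative, calling this sampler repeatedly with the same input would yield multiple plausible explanations; the abstract's claim is that these samples are diverse for a static observation and concentrate as the object moves.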

@article{han2025_2506.02473,
  title={Generative Perception of Shape and Material from Differential Motion},
  author={Xinran Nicole Han and Ko Nishino and Todd Zickler},
  journal={arXiv preprint arXiv:2506.02473},
  year={2025}
}