ResearchTrend.AI
Matrix3D: Large Photogrammetry Model All-in-One

11 February 2025
Yuanxun Lu
Jingyang Zhang
Tian Fang
Jean-Daniel Nahmias
Yanghai Tsin
Long Quan
Xun Cao
Yao Yao
Shiwei Li
Abstract

We present Matrix3D, a unified model that performs several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis, within a single model. Matrix3D employs a multi-modal diffusion transformer (DiT) to integrate transformations across several modalities, such as images, camera parameters, and depth maps. The key to Matrix3D's large-scale multi-modal training is a mask learning strategy: it enables full-modality model training even with partially complete data, such as bi-modal image-pose and image-depth pairs, thus significantly increasing the pool of available training data. Matrix3D demonstrates state-of-the-art performance in pose estimation and novel view synthesis tasks. Additionally, it offers fine-grained control through multi-round interactions, making it an innovative tool for 3D content creation. Project page: this https URL.
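The mask learning strategy described above can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the idea that modalities missing from a training pair are always treated as prediction targets, while available modalities are randomly masked, so bi-modal data can still supervise a full-modality model. The function name, masking probability, and modality set are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
MODALITIES = ("image", "pose", "depth")  # assumed modality set

def sample_mask(available, p_mask=0.5):
    """Return {modality: True if masked (to be predicted)}.

    Modalities absent from the training pair are always masked,
    so bi-modal data (e.g. image-pose pairs) still trains the
    full-modality model; available modalities are masked at
    random to create diverse prediction tasks.
    """
    mask = {}
    for m in MODALITIES:
        if m not in available:
            mask[m] = True  # missing modality: always a target
        else:
            mask[m] = bool(rng.random() < p_mask)
    # Keep at least one available modality visible as conditioning.
    if all(mask[m] for m in available):
        keep = rng.choice(sorted(available))
        mask[keep] = False
    return mask

# Example: an image-pose pair with no ground-truth depth.
m = sample_mask({"image", "pose"})
```

Here the depth entry is always `True` (a prediction target), while at least one of the observed modalities stays unmasked as conditioning input.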

@article{lu2025_2502.07685,
  title={Matrix3D: Large Photogrammetry Model All-in-One},
  author={Yuanxun Lu and Jingyang Zhang and Tian Fang and Jean-Daniel Nahmias and Yanghai Tsin and Long Quan and Xun Cao and Yao Yao and Shiwei Li},
  journal={arXiv preprint arXiv:2502.07685},
  year={2025}
}