TivNe-SLAM: Dynamic Mapping and Tracking via Time-Varying Neural Radiance Fields

Previous attempts to integrate Neural Radiance Fields (NeRF) into the Simultaneous Localization and Mapping (SLAM) framework either rely on the assumption of static scenes or require ground-truth camera poses, which impedes their application in real-world scenarios. In this paper, we propose a time-varying representation to track and reconstruct dynamic scenes. First, our framework simultaneously maintains two processes: a tracking process and a mapping process. For the tracking process, all input images are uniformly sampled and then progressively trained in a self-supervised paradigm. For the mapping process, we leverage motion masks to distinguish dynamic objects from the static background and sample more pixels from dynamic areas. Second, the parameter optimization for both processes consists of two stages: the first stage associates time with 3D positions to map points from the deformation field to the canonical field, and the second stage associates time with the embeddings of the canonical field to obtain colors and a Signed Distance Function (SDF). Finally, we propose a novel keyframe selection strategy based on the overlapping rate. We evaluate our approach on two synthetic datasets and one real-world dataset, and the experiments validate that our method achieves competitive results in both tracking and mapping compared to existing state-of-the-art NeRF-based methods.
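The two-stage optimization described above can be sketched as follows. This is a minimal, untrained NumPy illustration of the overall dataflow only: stage one warps a time-stamped 3D point into the canonical field, and stage two combines the canonical embedding with time to predict an SDF value and a color. All names, layer sizes, and the residual-warp formulation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim, hidden=64):
    """Tiny two-layer ReLU MLP with random weights (stand-in for a trained network)."""
    W1 = rng.standard_normal((in_dim, hidden)) * 0.1
    W2 = rng.standard_normal((hidden, out_dim)) * 0.1
    return lambda z: np.maximum(z @ W1, 0.0) @ W2

deform = mlp(3 + 1, 3)      # stage 1: (position, time) -> canonical-space offset
canonical = mlp(3, 32)      # canonical field: canonical position -> embedding
head = mlp(32 + 1, 1 + 3)   # stage 2: (embedding, time) -> SDF + RGB

def query(x, t):
    # Stage 1: associate time with 3D positions to reach the canonical field
    # (modeled here as a residual warp, an assumption for this sketch).
    x_canon = x + deform(np.concatenate([x, t], axis=-1))
    feat = canonical(x_canon)
    # Stage 2: associate time with the canonical embedding to get SDF and color.
    out = head(np.concatenate([feat, t], axis=-1))
    sdf = out[:, :1]
    color = 1.0 / (1.0 + np.exp(-out[:, 1:]))  # sigmoid keeps RGB in [0, 1]
    return sdf, color

x = rng.random((1024, 3))  # sampled 3D points along camera rays
t = rng.random((1024, 1))  # per-sample timestamps
sdf, color = query(x, t)
print(sdf.shape, color.shape)  # (1024, 1) (1024, 3)
```

Conditioning the second stage on time as well as the canonical embedding lets appearance and geometry vary over time even for points that warp to the same canonical location.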