Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022

22 July 2022

Abstract

We implemented the Video Swin Transformer as the base architecture for two Ego4D challenge tasks: Point-of-No-Return temporal localization and Object State Change Classification. Our method achieved competitive performance on both challenges.
