Online pre-training with long-form videos

Global Conference on Consumer Electronics (GCE), 2024

28 August 2024

Itsuki Kato

Kodai Kamiya

Toru Tamaki

Main: 1 page

1 figure

Bibliography: 1 page

Abstract

In this study, we investigate the impact of online pre-training with continuous video clips. We examine three pre-training methods (masked image modeling, contrastive learning, and knowledge distillation) and assess their performance on downstream action recognition tasks. Of the three, online pre-training with contrastive learning achieved the highest performance on the downstream tasks. Our findings suggest that learning from long-form videos can be helpful for action recognition with short videos.
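
As a rough illustration of contrastive learning on a stream of consecutive clips, the sketch below computes an InfoNCE-style loss in which temporally adjacent clips form positive pairs and the other clips in the batch act as negatives. This is only one common way to set up such an objective and is not necessarily the formulation used in the paper. The function names (`adjacent_pairs`, `info_nce`), the pairing scheme, and the temperature value are all assumptions made for illustration, and embeddings are assumed to be non-zero vectors.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def adjacent_pairs(embeddings):
    """Split a clip stream into non-overlapping (clip, next clip) pairs.

    Hypothetical pairing scheme: clips 0 and 1 form a pair, clips 2 and 3
    form a pair, and so on. A trailing unpaired clip is dropped.
    """
    n = len(embeddings) - len(embeddings) % 2
    return embeddings[0:n:2], embeddings[1:n:2]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss where anchors[i] and positives[i] are a positive pair.

    Every positives[j] with j != i serves as a negative for anchors[i].
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    losses = []
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the true pair as the target class.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

In an online setting, the clip embeddings passed to `adjacent_pairs` would come from the encoder as the video is read in order, so the loss can be computed batch by batch without shuffling the video.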
