Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation

Computer Vision and Pattern Recognition (CVPR), 2024

20 November 2024

Rohith Peddi

Saurabh

Ayush Abhay Shrivastava

Parag Singla

Vibhav Gogate


Main: 8 Pages

10 Figures

Bibliography: 3 Pages

34 Tables

Appendix: 34 Pages

Abstract

Spatio-Temporal Scene Graphs (STSGs) provide a concise and expressive representation of dynamic scenes by modelling objects and their evolving relationships over time. However, real-world visual relationships often exhibit a long-tailed distribution, causing existing methods for tasks like Video Scene Graph Generation (VidSGG) and Scene Graph Anticipation (SGA) to produce biased scene graphs. To this end, we propose ImparTail, a novel training framework that leverages curriculum learning and loss masking to mitigate bias in the generation and anticipation of spatio-temporal scene graphs. Our approach gradually decreases the dominance of the head relationship classes during training and focuses more on tail classes, leading to more balanced training. Furthermore, we introduce two new tasks, Robust Spatio-Temporal Scene Graph Generation and Robust Scene Graph Anticipation, designed to evaluate the robustness of STSG models against distribution shifts. Extensive experiments on the Action Genome dataset demonstrate that our framework significantly enhances the unbiased performance and robustness of STSG models compared to existing methods.
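
The abstract describes the combination of curriculum learning and loss masking only at a high level. As a rough illustration, the idea can be sketched as follows: as training progresses, a growing fraction of head-class labels is masked out of the loss so that tail classes carry relatively more weight. This is a minimal sketch under stated assumptions, not the authors' implementation. The linear schedule, the function names (`drop_probability`, `build_loss_mask`, `masked_nll`), and the class names and counts are all hypothetical.

```python
import math
import random


def drop_probability(class_count, min_count, progress):
    """Probability of masking one label of a class.

    Assumed schedule (hypothetical): linear in training progress, scaled so
    that at progress = 1 every class keeps roughly `min_count` labels.
    """
    progress = min(max(progress, 0.0), 1.0)
    return progress * (1.0 - min_count / class_count)


def build_loss_mask(labels, class_counts, progress, rng):
    """Return one boolean per label; True means the label contributes to the loss."""
    min_count = min(class_counts.values())
    return [
        rng.random() >= drop_probability(class_counts[label], min_count, progress)
        for label in labels
    ]


def masked_nll(probs, labels, mask):
    """Mean negative log-likelihood over the labels whose mask entry is True."""
    kept = [-math.log(p[y]) for p, y, keep in zip(probs, labels, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    # Illustrative long-tailed label counts (not taken from Action Genome).
    class_counts = {0: 900, 1: 30}
    labels = [0] * 900 + [1] * 30
    rng = random.Random(0)
    for progress in (0.0, 0.5, 1.0):
        mask = build_loss_mask(labels, class_counts, progress, rng)
        kept_head = sum(keep for keep, y in zip(mask, labels) if y == 0)
        kept_tail = sum(keep for keep, y in zip(mask, labels) if y == 1)
        print(f"progress={progress}: head kept={kept_head}, tail kept={kept_tail}")
```

Under this assumed schedule no labels are masked at the start of training, and the tail class is never masked, so the head class's share of the loss shrinks steadily as `progress` approaches 1.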
