DAZSL: Dynamic Attributes for Zero-Shot Learning
Zero-shot activity recognition has largely followed earlier work on still images, relying on image-derived representations that ignore a video's temporal structure. Because such methods cannot capture how an activity evolves over time, reversible actions such as entering and exiting a car are often indistinguishable. In this work, we present a simple and elegant framework for modeling activities using dynamic attribute signatures. We show that specifying temporal structure greatly increases the discriminative power of zero-shot systems. We also extend our method to form, to our knowledge, the first framework for zero-shot joint segmentation and classification of activities in videos. We evaluate our method on the Olympic Sports and UCF101 datasets, where our model establishes a new state of the art under multiple experimental paradigms. We also demonstrate the first results in zero-shot decoding of complex action sequences on a widely used surgical dataset. Lastly, we show that we can eliminate the need to train attribute detectors entirely by using off-the-shelf object detectors to recognize activities in challenging security footage.
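To make the core idea concrete, the sketch below illustrates one way a dynamic attribute signature could disambiguate reversible actions: each class is described by an ordered sequence of expected attribute vectors (stages), a video is summarized as per-frame attribute detector scores, and classification picks the class whose staged signature best agrees with the video's temporal segments. The class names, attributes, and scoring rule here are illustrative assumptions, not the paper's actual signatures or training procedure.

```python
# Hypothetical dynamic attribute signatures: each class is an ordered list of
# stages, and each stage is a binary vector over attributes.
# Attributes (assumed for illustration): [door_open, person_inside, person_outside]
SIGNATURES = {
    "enter_car": [[1, 0, 1],   # stage 1: door open, person outside
                  [1, 1, 0],   # stage 2: person getting in
                  [0, 1, 0]],  # stage 3: door closed, person inside
    "exit_car":  [[0, 1, 0],
                  [1, 1, 0],
                  [1, 0, 1]],
}

def classify(frame_scores):
    """Match per-frame attribute scores (a list of T attribute vectors)
    against each class signature by splitting the video into one contiguous
    segment per stage and scoring segment-level agreement."""
    best_class, best_score = None, float("-inf")
    for name, sig in SIGNATURES.items():
        k, n = len(sig), len(frame_scores)
        bounds = [round(i * n / k) for i in range(k + 1)]  # segment boundaries
        score = 0.0
        for stage, expected in enumerate(sig):
            seg = frame_scores[bounds[stage]:bounds[stage + 1]]
            means = [sum(f[a] for f in seg) / len(seg)
                     for a in range(len(expected))]
            # Reward detections where the signature expects the attribute,
            # penalize them where it does not.
            score += sum(m * (2 * e - 1) for m, e in zip(means, expected))
        if score > best_score:
            best_class, best_score = name, score
    return best_class
```

Because the two signatures are time-reversals of each other, a video whose attribute scores match the enter-car stages is classified as "enter_car", while the same frames played backwards score highest as "exit_car", which a time-agnostic (pooled) attribute representation could not distinguish.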