Transition Forests: Learning Discriminative Temporal Transitions for
Action Recognition
A human action can be seen as a sequence of transitions between body poses over time, where each transition captures a temporal relation between two poses. Recognizing actions thus involves learning a classifier sensitive to these pose transitions from high-dimensional frame representations. In this paper, we introduce transition forests, an ensemble of decision trees that learns transitions between pairs of independent frames in a discriminative fashion. During training, node splitting is driven by alternating between two criteria: the standard classification entropy, which maximizes discriminative power on individual frames, and a proposed criterion on pairwise frame transitions. Growing the trees tends to group frames that have similar associated transitions and share the same action label. Unlike conventional classification trees, where the best split is determined node-wise, transition forests find the best split of nodes jointly within a layer, so as to incorporate transitions between distant nodes. When inferring the class label of a new video, frames are passed down the trees independently (and thus highly efficiently); a prediction at a given time-frame is then made from the transitions between previously observed frames and the current one. We evaluate our method on varied action recognition datasets, demonstrating its advantages over several baselines and state-of-the-art approaches.
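To make the alternating split criteria concrete, the following is a minimal sketch of the two kinds of information gain the abstract alludes to. It is an illustrative assumption, not the paper's actual formulation: `split_gain` with `criterion="frame"` is standard classification-entropy gain on per-frame labels, while `criterion="transition"` scores a split by the entropy of labels attached to consecutive frame pairs, as a simplified stand-in for the pairwise transition criterion. The names `split_gain`, `entropy`, and the axis-aligned threshold test are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a multiset of labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def split_gain(frames, labels, feature, threshold, criterion="frame"):
    """Information gain of a candidate axis-aligned split at a tree node.

    criterion="frame": standard classification entropy on individual frames.
    criterion="transition": entropy on labels of consecutive (frame_t,
    frame_{t+1}) pairs -- an illustrative stand-in for a pairwise
    transition criterion, NOT the paper's exact objective.
    """
    if criterion == "transition":
        # Each temporal pair carries the action label of its first frame;
        # a pair goes left only if both frames fall below the threshold.
        items = [((frames[t], frames[t + 1]), labels[t])
                 for t in range(len(frames) - 1)]
        goes_left = lambda x: (x[0][feature] < threshold
                               and x[1][feature] < threshold)
    else:
        items = list(zip(frames, labels))
        goes_left = lambda x: x[feature] < threshold

    left = [y for x, y in items if goes_left(x)]
    right = [y for x, y in items if not goes_left(x)]
    if not left or not right:
        return 0.0  # degenerate split: no gain
    n = len(items)
    parent = entropy([y for _, y in items])
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return parent - child
```

During training one could alternate which criterion drives the split at each tree level, so the grown trees group frames that are both individually discriminative and consistent in their temporal transitions.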