FDDet: Frequency-Decoupling for Boundary Refinement in Temporal Action Detection

1 April 2025

Abstract

Temporal action detection aims to locate and classify actions in untrimmed videos. While recent works focus on designing powerful feature processors for pre-trained representations, they often overlook the inherent noise and redundancy within these features. Large-scale pre-trained video encoders tend to introduce background clutter and irrelevant semantics, leading to context confusion and imprecise boundaries. To address this, we propose a frequency-aware decoupling network that improves action discriminability by filtering out noisy semantics captured by pre-trained models. Specifically, we introduce an adaptive temporal decoupling scheme that suppresses irrelevant information while preserving fine-grained atomic action details, yielding more task-specific representations. In addition, we enhance inter-frame modeling by capturing temporal variations to better distinguish actions from background redundancy. Furthermore, we present a long-short-term category-aware relation network that jointly models local transitions and long-range dependencies, improving localization precision. The refined atomic features and frequency-guided dynamics are fed into a standard detection head to produce accurate action predictions. Extensive experiments on THUMOS14, HACS, and ActivityNet-1.3 show that our method, powered by InternVideo2-6B features, achieves state-of-the-art performance on temporal action detection benchmarks.
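The abstract does not specify how the decoupling is computed, so the following is only a minimal sketch of the general idea, not FDDet's actual method. It assumes a single scalar feature per frame and uses a centered moving average as a stand-in low-pass filter. The function names `moving_average`, `decouple`, and `temporal_difference` and the `window` parameter are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def moving_average(signal: List[float], window: int) -> List[float]:
    """Low-pass filter: centered moving average.

    The window shrinks at the sequence edges so the output has the
    same length as the input. Assumption: a moving average stands in
    for whatever frequency filter the paper actually uses.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(signal)
    smoothed = []
    for t in range(n):
        lo = max(0, t - half)
        hi = min(n, t + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def decouple(signal: List[float], window: int = 5) -> Tuple[List[float], List[float]]:
    """Split a per-frame feature into low- and high-frequency parts.

    The low-frequency part is the smoothed signal; the high-frequency
    part is the residual, so the two always sum back to the input.
    """
    low = moving_average(signal, window)
    high = [x - l for x, l in zip(signal, low)]
    return low, high


def temporal_difference(signal: List[float]) -> List[float]:
    """Frame-to-frame change, padded with 0.0 at the first frame."""
    if not signal:
        return []
    return [0.0] + [b - a for a, b in zip(signal, signal[1:])]


# Toy example: an "action" occupying frames 3..5 of a 9-frame clip.
feature = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
low, high = decouple(feature, window=3)
diff = temporal_difference(feature)
```

In this toy clip the temporal difference is nonzero only at frames 3 and 6, where the action starts and ends, and the high-frequency residual is largest next to those same transitions. This illustrates why high-frequency and inter-frame variation signals are plausible cues for boundary refinement.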

@article{zhu2025_2504.00647,
  title={FDDet: Frequency-Decoupling for Boundary Refinement in Temporal Action Detection},
  author={Xinnan Zhu and Yicheng Zhu and Tixin Chen and Wentao Wu and Yuanjie Dang},
  journal={arXiv preprint arXiv:2504.00647},
  year={2025}
}
