Event cameras offer superior temporal resolution, dynamic range, power efficiency, and pixel bandwidth compared with conventional frame-based cameras. Spiking Neural Networks (SNNs), which communicate through discrete spike signals, are a natural complement to event data, making them well suited to event-based tracking. However, current approaches that combine Artificial Neural Networks (ANNs) with SNNs, together with suboptimal architectures, compromise energy efficiency and limit tracking performance. To address these limitations, we propose the first Transformer-based spike-driven tracking pipeline. Our Global Trajectory Prompt (GTP) method captures global trajectory information and aggregates it with event streams into event images to enhance spatiotemporal representation. We then introduce SDTrack, a Transformer-based spike-driven tracker comprising a Spiking MetaFormer backbone and a simple tracking head that directly predicts normalized coordinates from spike signals. The framework is end-to-end and requires neither data augmentation nor post-processing. Extensive experiments demonstrate that SDTrack achieves state-of-the-art performance with the lowest parameter count and energy consumption across multiple event-based tracking benchmarks, establishing a solid baseline for future research in neuromorphic vision.
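As a concrete illustration of the event-to-image aggregation described above, the sketch below bins a raw event stream into a two-channel event image (one channel per polarity) and overlays a decay-weighted trace of past target centers as an extra trajectory channel. This is a minimal sketch under stated assumptions: the function names, the channel layout, and the decay-weighted prompt are illustrative only, since the abstract does not specify how GTP encodes trajectory information.

import numpy as np

def events_to_frame(events, height, width):
    # Accumulate an (N, 4) array of events (x, y, t, polarity) into a
    # 2-channel event image; events are assumed to lie within sensor bounds.
    frame = np.zeros((2, height, width), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    p = (events[:, 3] > 0).astype(int)  # channel 0: negative, 1: positive
    np.add.at(frame, (p, y, x), 1.0)    # unbuffered add handles repeated pixels
    return frame

def add_trajectory_prompt(frame, trajectory, decay=0.9):
    # Hypothetical trajectory prompt: mark past target centers (cx, cy) on a
    # separate channel, with older points down-weighted geometrically by `decay`.
    prompt = np.zeros_like(frame[0])
    weight = 1.0
    for cx, cy in reversed(trajectory):  # most recent center first
        prompt[int(cy), int(cx)] = max(prompt[int(cy), int(cx)], weight)
        weight *= decay
    return np.concatenate([frame, prompt[None]], axis=0)  # (3, H, W)

For example, add_trajectory_prompt(events_to_frame(events, 260, 346), [(100, 120), (104, 118)]) would yield a 3-channel input in which the backbone can attend jointly to current events and the recent trajectory.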
@article{shan2025_2503.08703,
  title={SDTrack: A Baseline for Event-based Tracking via Spiking Neural Networks},
  author={Yimeng Shan and Zhenbang Ren and Haodi Wu and Wenjie Wei and Rui-Jie Zhu and Shuai Wang and Dehao Zhang and Yichen Xiao and Jieyuan Zhang and Kexin Shi and Jingzhinan Wang and Jason K. Eshraghian and Haicheng Qu and Jiqing Zhang and Malu Zhang and Yang Yang},
  journal={arXiv preprint arXiv:2503.08703},
  year={2025}
}