Spatiotemporal Tile-based Attention-guided LSTMs for Traffic Video Prediction

24 October 2019

T. Hascoet

AI4TS

Main: 5 Pages

3 Figures

Bibliography: 1 Page

1 Table

Appendix: 3 Pages

Abstract

This extended abstract describes our solution for the Traffic4Cast Challenge 2019. The task requires modeling both fine-grained (pixel-level) and coarse (region-level) spatial structure while preserving temporal relationships across long sequences. Building on the Conv-LSTM, we introduce a tile-aware, cascaded-memory Conv-LSTM augmented with cross-frame additive attention and a memory-flexible training scheme. Frames are sampled per spatial tile, so the model learns tile-local dynamics, and per-tile memory cells can be updated sparsely, paged, or compressed to scale to large maps. We provide a compact theoretical analysis (a tight softmax/attention Lipschitz bound and a tiling-error lower bound) that explains stability and the memory-accuracy tradeoff, and we empirically demonstrate improved scalability and competitive forecasting performance on large-scale traffic heatmaps.
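To make the per-tile sampling idea concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify tile size, grid layout, or overlap, so the regular non-overlapping grid, the function name `sample_tile`, and its parameters are all assumptions made for illustration. The sketch cuts the same randomly chosen tile out of every frame of a sequence, which is the kind of tile-local clip a tile-aware model could be trained on.

```python
import random


def sample_tile(frames, tile_h, tile_w, rng=None):
    """Cut the same randomly chosen tile out of every frame in a sequence.

    Hypothetical helper (not from the paper).
    frames: list of T frames, each a list of H rows of W values.
    Returns ((row, col), tile_frames); tile_frames is T x tile_h x tile_w.
    """
    rng = rng or random.Random()
    height, width = len(frames[0]), len(frames[0][0])
    if height % tile_h or width % tile_w:
        raise ValueError("frame size must be a multiple of the tile size")
    # Assumption: tiles lie on a regular, non-overlapping grid.
    row = rng.randrange(height // tile_h)
    col = rng.randrange(width // tile_w)
    top, left = row * tile_h, col * tile_w
    # Apply the same spatial crop to every time step.
    tile_frames = [
        [frame_row[left:left + tile_w] for frame_row in frame[top:top + tile_h]]
        for frame in frames
    ]
    return (row, col), tile_frames


# Toy sequence: 3 frames of a 4x6 map whose values encode (t, y, x).
frames = [
    [[100 * t + 10 * y + x for x in range(6)] for y in range(4)]
    for t in range(3)
]
(row, col), tile = sample_tile(frames, tile_h=2, tile_w=3, rng=random.Random(0))
```

Because the crop is identical across time steps, each sampled clip preserves the temporal ordering within one tile while holding only a fraction of the full map in memory at once.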
