Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment

We study the problem of efficiently summarizing a short video into several keyframes, leveraging recent progress in fast graph sampling. Specifically, we first construct a similarity path graph (SPG), represented by its graph Laplacian matrix, in which the similarities between adjacent frames are encoded as positive edge weights. We show that maximizing the smallest eigenvalue of a coefficient matrix, which depends on the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. We prove that, after partitioning the SPG into sub-graphs, the smallest Gershgorin circle theorem (GCT) lower bound among the corresponding sub-graph coefficient matrices is a lower bound for the smallest eigenvalue of the full coefficient matrix. This inspires a fast graph sampling algorithm that iteratively partitions the SPG into sub-graphs using samples (keyframes), while maximizing the GCT lower bound for each sub-graph. Experimental results show that our algorithm achieves video summarization performance comparable to that of state-of-the-art methods, at a substantially reduced complexity.
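As a rough illustration of the quantities named above, the following Python sketch builds a path-graph Laplacian from adjacent-frame similarities, forms a coefficient matrix, and compares its plain GCT lower bound with its true smallest eigenvalue. The coefficient form diag(a) + mu * L, the weight mu, the example similarities, and all function names are assumptions made for this sketch; they are not taken from the text above, and the code is not the proposed sampling algorithm.

```python
# Minimal sketch (pure Python). ASSUMPTION: coefficient matrix B = diag(a) + mu * L.

def path_laplacian(weights):
    """Laplacian of a path graph; weights[i] joins frame i and frame i + 1."""
    n = len(weights) + 1
    L = [[0.0] * n for _ in range(n)]
    for i, w in enumerate(weights):
        L[i][i] += w
        L[i + 1][i + 1] += w
        L[i][i + 1] -= w
        L[i + 1][i] -= w
    return L

def coefficient_matrix(weights, a, mu):
    """Hypothetical coefficient matrix diag(a) + mu * L (assumed form)."""
    L = path_laplacian(weights)
    n = len(a)
    return [[(a[i] if i == j else 0.0) + mu * L[i][j] for j in range(n)]
            for i in range(n)]

def gct_lower_bound(B):
    """Smallest left end of the Gershgorin discs: min_i (B_ii - sum_{j != i} |B_ij|)."""
    n = len(B)
    return min(B[i][i] - sum(abs(B[i][j]) for j in range(n) if j != i)
               for i in range(n))

def gct_upper_bound(B):
    """Largest right end of the Gershgorin discs."""
    n = len(B)
    return max(B[i][i] + sum(abs(B[i][j]) for j in range(n) if j != i)
               for i in range(n))

def smallest_eigenvalue_tridiagonal(B, iters=100):
    """Smallest eigenvalue of a symmetric tridiagonal matrix via Sturm-count bisection."""
    n = len(B)
    diag = [B[i][i] for i in range(n)]
    off = [B[i][i + 1] for i in range(n - 1)]

    def count_below(x):
        # Number of eigenvalues strictly less than x (negative pivots of B - x*I).
        count, q = 0, 1.0
        for i in range(n):
            e2 = off[i - 1] ** 2 if i > 0 else 0.0
            q = diag[i] - x - e2 / q
            if q == 0.0:
                q = 1e-30  # avoid division by zero in the next step
            if q < 0.0:
                count += 1
        return count

    lo, hi = gct_lower_bound(B), gct_upper_bound(B)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count_below(mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Illustrative data: 6 frames, 5 adjacent-frame similarities, 2 keyframes.
weights = [0.9, 0.8, 0.2, 0.85, 0.9]
a = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]   # binary keyframe selection vector
mu = 0.5                              # assumed trade-off weight

B = coefficient_matrix(weights, a, mu)
bound = gct_lower_bound(B)
lam = smallest_eigenvalue_tridiagonal(B)
print("GCT lower bound:", bound)
print("smallest eigenvalue:", lam)
```

Under this assumed form, every row's off-diagonal magnitudes cancel the Laplacian part of the diagonal, so the plain GCT bound collapses to the smallest entry of the selection vector. It is a valid but loose bound on the smallest eigenvalue.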