Recent advances in computer vision, in the form of deep neural networks, have
made it possible to query increasing volumes of video data with high accuracy.
However, neural network inference is computationally expensive at scale:
applying a state-of-the-art object detector in real time (i.e., 30+ frames per
second) to a single video requires a $4000 GPU. In response, we present
NoScope, a system for querying videos that can reduce the cost of neural
network video analysis by up to three orders of magnitude via
inference-optimized model search. Given a target video, an object to detect,
and a reference neural network, NoScope automatically searches for and trains
a sequence, or cascade, of models that preserves the accuracy of the reference
network but is specialized to the target video and is therefore far less
computationally expensive. NoScope cascades two types of models: specialized
models that forgo the full generality of the reference model but faithfully
mimic its behavior for the target video and object; and difference detectors
that highlight temporal differences across frames. We show that the optimal
cascade architecture differs across videos and objects, so NoScope uses an
efficient cost-based optimizer to search across models and cascades. With this
approach, NoScope achieves two to three orders of magnitude speed-ups
(265-15,500x real-time) on binary classification tasks over fixed-angle webcam
and surveillance video while maintaining accuracy within 1-5% of
state-of-the-art neural networks.
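
To make the cascade idea concrete, the following is a minimal sketch of how a
difference detector, a specialized model, and a reference model might be
chained for a binary query. It is an illustration under stated assumptions,
not NoScope's implementation: the names Cascade, mean_squared_difference,
diff_threshold, low, and high are hypothetical, frames are modeled as flat
lists of pixel intensities, and the thresholds are assumed to be given rather
than chosen by a cost-based optimizer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Assumption: a frame is a flat sequence of pixel intensities.
Frame = Sequence[float]


def mean_squared_difference(a: Frame, b: Frame) -> float:
    """Hypothetical difference detector: mean squared pixel difference."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


@dataclass
class Cascade:
    specialized: Callable[[Frame], float]  # cheap model, confidence in [0, 1]
    reference: Callable[[Frame], bool]     # expensive reference model
    diff_threshold: float                  # below this, reuse the last label
    low: float                             # confidence <= low  -> False
    high: float                            # confidence >= high -> True
    reference_calls: int = 0               # how often the expensive model ran

    def run(self, frames: List[Frame]) -> List[bool]:
        labels: List[bool] = []
        # Last frame that was labeled by a model (not by label reuse).
        anchor: Optional[Frame] = None
        for frame in frames:
            # Stage 1: skip frames that barely differ from the anchor frame.
            if (anchor is not None
                    and mean_squared_difference(frame, anchor) < self.diff_threshold):
                labels.append(labels[-1])
                continue
            # Stage 2: accept the specialized model when it is confident.
            confidence = self.specialized(frame)
            if confidence <= self.low:
                label = False
            elif confidence >= self.high:
                label = True
            else:
                # Stage 3: fall back to the reference model when uncertain.
                self.reference_calls += 1
                label = self.reference(frame)
            labels.append(label)
            anchor = frame
        return labels
```

In this sketch the savings come from how rarely the third stage runs: the
fewer frames that reach the reference model, the closer the per-frame cost is
to that of the two cheap stages.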