Deep learning has made great strides in medical imaging, enabled by hardware
advances in GPUs. One major constraint on the development of new models has
been the saturation of GPU memory during training. This is especially
true in computational pathology, where images regularly exceed 1 billion
pixels. Owing to hardware limitations, these pathology images are
traditionally divided into small patches for deep learning. In this work, we
explore whether the shared GPU/CPU memory architecture on the M1 Ultra
systems-on-a-chip (SoCs) recently released by Apple, Inc., may provide a
solution. These affordable systems (less than $5,000) provide access to 128 GB
of unified memory (Mac Studio with M1 Ultra SoC). As a proof of concept for
gigapixel deep learning, we identified tissue from background on gigapixel
areas from whole slide images (WSIs). The model was a modified U-Net (4,492
parameters) leveraging large kernels and high stride. The M1 Ultra SoC was
able to train the model directly on gigapixel images (16,000 × 64,000 pixels;
1.024 billion pixels) with a batch size of 1, using over 100 GB of unified
memory for the process, at an average speed of 1 minute and 21 seconds per
batch with TensorFlow 2/Keras. As expected, the model converged with a high
Dice score of 0.989 ± 0.005. Training up to this point took 111 hours and 24
minutes over 4,940 steps. Other high-RAM GPUs such as the NVIDIA A100 (the
largest commercially accessible at 80 GB, ~$15,000) are not yet widely
available (in preview for select regions on Amazon Web Services at
$40.96/hour as a group of 8). This study is a promising step towards WSI-wise
end-to-end deep learning with prevalent network architectures.
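
To make the setup concrete, the following is a minimal sketch in TensorFlow
2/Keras of the kind of model described above: a small U-Net-style
encoder-decoder with large kernels and high stride, trained with a batch size
of 1. It is not the authors' exact 4,492-parameter architecture; the layer
counts, kernel sizes, the build_tiny_unet and dice_coefficient helpers, and
the tiny random stand-in data are illustrative assumptions.

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers

    def build_tiny_unet():
        # None spatial dims: the graph accepts any input whose height and
        # width are divisible by 64 (e.g., 16,000 x 64,000).
        inputs = layers.Input(shape=(None, None, 3))
        # Encoder: large kernels with high stride downsample aggressively,
        # keeping intermediate activation maps (and memory use) small.
        e1 = layers.Conv2D(8, kernel_size=15, strides=8,
                           padding="same", activation="relu")(inputs)
        e2 = layers.Conv2D(16, kernel_size=15, strides=8,
                           padding="same", activation="relu")(e1)
        # Decoder: strided transposed convolutions restore full resolution,
        # with one skip connection in the U-Net spirit.
        d1 = layers.Conv2DTranspose(8, kernel_size=15, strides=8,
                                    padding="same", activation="relu")(e2)
        d1 = layers.Concatenate()([d1, e1])
        d2 = layers.Conv2DTranspose(8, kernel_size=15, strides=8,
                                    padding="same", activation="relu")(d1)
        # One sigmoid channel: per-pixel tissue-vs-background probability.
        outputs = layers.Conv2D(1, kernel_size=1, activation="sigmoid")(d2)
        return tf.keras.Model(inputs, outputs)

    def dice_coefficient(y_true, y_pred, smooth=1e-6):
        # Dice = 2|A n B| / (|A| + |B|), evaluated on soft predictions.
        intersection = tf.reduce_sum(y_true * y_pred)
        return (2.0 * intersection + smooth) / (
            tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)

    model = build_tiny_unet()
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[dice_coefficient])

    # Tiny random stand-ins for WSI crops and binary tissue masks; the
    # real inputs above are 16,000 x 64,000 pixels. Batch size is 1.
    x = np.random.rand(2, 256, 256, 3).astype("float32")
    y = (np.random.rand(2, 256, 256, 1) > 0.5).astype("float32")
    model.fit(x, y, batch_size=1, epochs=1)

Note that a single float32 input at full size is already about 12.3 GB
(16,000 × 64,000 pixels × 3 channels × 4 bytes) before any activations or
gradients, which is why the unified-memory budget, rather than raw compute, is
the binding constraint; on Apple Silicon, Keras training typically reaches the
GPU through Apple's tensorflow-metal plugin while CPU and GPU share the same
128 GB pool.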