Artificial Intelligence holds tremendous potential in medicine, but has
traditionally been limited by the lack of massive datasets on which to train models.
Foundation models, pre-trained models that can be adapted to downstream tasks
with small datasets, could alleviate this problem. Researchers at Moorfields
Eye Hospital (MEH) proposed RETFound-MEH, a foundation model for retinal
imaging that was trained on 900,000 images, including private hospital data.
Recently, the data-efficient DERETFound was proposed, which provides comparable
performance while being trained on only 150,000 images, all of which are publicly
available. However, both of these models required very substantial resources to
train initially and are resource-intensive in downstream use. We propose a
novel Token Reconstruction objective that we use to train RETFound-Green, a
retinal foundation model trained using only 75,000 publicly available images
and 400 times less compute. We estimate the cost of training RETFound-MEH and
DERETFound at $10,000 and $14,000, respectively, while RETFound-Green could be
trained for less than $100, with equally reduced environmental impact. RETFound-Green is also far more efficient in downstream use: it can be downloaded 14 times faster and computes vector embeddings 2.7 times faster, which then require 2.6 times less storage space. Despite this, RETFound-Green does not perform systematically worse. In fact, it performs best on 14 tasks, compared to six for DERETFound and two for RETFound-MEH. Our results suggest that RETFound-Green is a very efficient, high-performance retinal foundation model. We anticipate that our Token Reconstruction objective could be scaled up for even higher performance and applied to other domains beyond retinal imaging.