The commercialization of large language models (LLMs) has led to the common
practice of high-level API-only access to proprietary models. In this work, we
show that even with a conservative assumption about the model architecture, it
is possible to learn a surprisingly large amount of non-public information
about an API-protected LLM from a relatively small number of API queries (e.g.,
costing under $1,000 for OpenAI's gpt-3.5-turbo). Our findings are centered on one key observation: most modern LLMs suffer from a softmax bottleneck, which restricts the model outputs to a linear subspace of the full output space. We show that this lends itself to a model image or a model signature which unlocks several capabilities with affordable cost: efficiently discovering the LLM's hidden size, obtaining full-vocabulary outputs, detecting and disambiguating different model updates, identifying the source LLM given a single full LLM output, and even estimating the output layer parameters. Our empirical investigations show the effectiveness of our methods, which allow us to estimate the embedding size of OpenAI's gpt-3.5-turbo to be about 4,096. Lastly, we discuss ways that LLM providers can guard against these attacks, as well as how these capabilities can be viewed as a feature (rather than a bug) by allowing for greater transparency and accountability.
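To make the key observation concrete, the following is a minimal numerical sketch (an illustration, not the paper's released code): because each logit vector is a hidden state of dimension d multiplied by the output embedding matrix, any collection of more than d full-vocabulary outputs has numerical rank d, so the hidden size can be read off from the singular values of the stacked outputs. The function name and toy dimensions below are hypothetical.

```python
import numpy as np

def estimate_hidden_size(logit_matrix: np.ndarray, rel_tol: float = 1e-6) -> int:
    """Estimate the hidden (embedding) size d from stacked full-vocabulary outputs.

    logit_matrix: (n, v) array of n logit vectors over a vocabulary of size v,
    with n > d. Since logits = h @ W.T for a (v, d) output embedding matrix W,
    every row lies in a d-dimensional subspace, and the numerical rank of the
    stacked matrix reveals d (the softmax bottleneck).
    """
    singular_values = np.linalg.svd(logit_matrix, compute_uv=False)
    # Count singular values that are non-negligible relative to the largest one.
    return int(np.sum(singular_values > rel_tol * singular_values[0]))

# Toy demonstration with small, hypothetical dimensions so it runs instantly.
rng = np.random.default_rng(0)
d, v, n = 64, 1000, 200              # true hidden size, vocab size, num queries
W = rng.normal(size=(v, d))          # stand-in for the model's output embeddings
H = rng.normal(size=(n, d))          # stand-in for hidden states across prompts
logits = H @ W.T                     # "API outputs": rank d despite v columns
print(estimate_hidden_size(logits))  # prints 64
```

Note that a real API exposes only a few log-probabilities per query, so full-vocabulary outputs must first be reconstructed (one of the capabilities listed above); the sketch skips that step and assumes full logit vectors are available.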