Large Language Models (LLMs) have achieved remarkable results, but their
increasing resource demand has become a major obstacle to the development of
powerful and accessible super-human intelligence. This report introduces
JetMoE-8B, a new LLM trained with less than $0.1 million, using 1.25T tokens from carefully mixed open-source corpora and 30,000 H100 GPU hours. Despite its low cost, JetMoE-8B demonstrates impressive performance: JetMoE-8B outperforms the Llama2-7B model, and JetMoE-8B-Chat surpasses the Llama2-13B-Chat model. These results suggest that LLM training can be much more cost-effective than generally thought. JetMoE-8B is based on an efficient Sparsely-gated Mixture-of-Experts (SMoE) architecture, composed of attention and feedforward experts. Both layers are sparsely activated, allowing JetMoE-8B to have 8B parameters while activating only 2B for each input token, reducing inference computation by about 70%. Moreover, JetMoE-8B is highly open and academia-friendly, using only public datasets and training code. All training parameters and data mixtures are detailed in this report to facilitate future efforts in the development of open foundation models. This transparency aims to encourage collaboration and further advancements in the field of accessible and efficient LLMs. The model weights are publicly available at https://github.com/myshell-ai/JetMoE.
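To make the sparse-activation idea concrete, the sketch below shows a generic top-2 sparsely-gated mixture of feedforward experts in PyTorch. It is a minimal illustration under assumed settings, not JetMoE-8B's actual implementation: the `SparseMoEFeedForward` class name, the layer sizes, and the routing loop are illustrative only, and JetMoE-8B additionally applies sparse gating to its attention experts, which is not shown here.

```python
# Minimal sketch of a top-2 sparsely-gated mixture of feedforward experts.
# Sizes and routing details are illustrative assumptions and do not reproduce
# JetMoE-8B's configuration; attention experts are omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoEFeedForward(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for per-token routing.
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                      # (n_tokens, n_experts)
        top_w, top_idx = logits.topk(self.top_k, dim=-1)  # each token picks k experts
        top_w = F.softmax(top_w, dim=-1)                  # normalize the k gate weights

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Tokens routed to expert e; only these tokens incur its compute.
            token_ids, slot = (top_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += top_w[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = SparseMoEFeedForward(d_model=64, d_hidden=256, n_experts=8, top_k=2)
    y = layer(torch.randn(2, 16, 64))
    print(y.shape)  # torch.Size([2, 16, 64])
```

Because each token passes through only k of the n experts, total parameter count grows with n while per-token compute grows with k, which is the mechanism behind having 8B parameters but activating roughly 2B per token.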