Road crashes claim over 1.3 million lives annually worldwide and incur global economic losses exceeding $1.8 trillion. Such profound societal and financial impacts underscore the urgent need for road safety research that uncovers crash mechanisms and delivers actionable insights. Conventional statistical models and tree ensemble approaches typically rely on structured crash data, overlooking contextual nuances and struggling to capture complex relationships and underlying semantics. Moreover, these approaches tend to incur significant information loss, particularly in narrative elements related to multi-vehicle interactions, crash progression, and rare event characteristics. This study presents CrashSage, a novel Large Language Model (LLM)-centered framework designed to advance crash analysis and modeling through four key innovations. First, we introduce a tabular-to-text transformation strategy paired with relational data integration schema, enabling the conversion of raw, heterogeneous crash data into enriched, structured textual narratives that retain essential structural and relational context. Second, we apply context-aware data augmentation using a base LLM model to improve narrative coherence while preserving factual integrity. Third, we fine-tune the LLaMA3-8B model for crash severity inference, demonstrating superior performance over baseline approaches, including zero-shot, zero-shot with chain-of-thought prompting, and few-shot learning, with multiple models (GPT-4o, GPT-4o-mini, LLaMA3-70B). Finally, we employ a gradient-based explainability technique to elucidate model decisions at both the individual crash level and across broader risk factor dimensions. This interpretability mechanism enhances transparency and enables targeted road safety interventions by providing deeper insights into the most influential factors.
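To make the tabular-to-text idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a crash table and a vehicle table linked by a shared key, and every field name (`crash_id`, `road_type`, `weather`, `lighting`, `body_type`, `speed_mph`, `maneuver`) and both helper functions are hypothetical, chosen only for illustration.

```python
from typing import Any


def join_vehicles(crash_id: str, vehicle_table: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collect the vehicle rows linked to one crash (a simple relational join on crash_id)."""
    return [row for row in vehicle_table if row["crash_id"] == crash_id]


def crash_to_text(crash: dict[str, Any], vehicles: list[dict[str, Any]]) -> str:
    """Render one crash row and its linked vehicle rows as a short textual narrative."""
    # Crash-level context comes first so the vehicle sentences can be read against it.
    parts = [
        f"Crash {crash['crash_id']} occurred on a {crash['road_type']} road "
        f"under {crash['weather']} weather with {crash['lighting']} lighting."
    ]
    # One sentence per involved vehicle keeps multi-vehicle structure explicit in the text.
    for index, vehicle in enumerate(vehicles, start=1):
        parts.append(
            f"Vehicle {index} was a {vehicle['body_type']} travelling at about "
            f"{vehicle['speed_mph']} mph and was {vehicle['maneuver']}."
        )
    return " ".join(parts)


# Hypothetical example records; these are not real crash data.
example_crash = {
    "crash_id": "C1",
    "road_type": "rural",
    "weather": "clear",
    "lighting": "daylight",
}
example_vehicles = [
    {"crash_id": "C1", "body_type": "sedan", "speed_mph": 45, "maneuver": "going straight"},
    {"crash_id": "C2", "body_type": "bus", "speed_mph": 20, "maneuver": "stopped"},
    {"crash_id": "C1", "body_type": "pickup", "speed_mph": 30, "maneuver": "turning left"},
]

narrative = crash_to_text(example_crash, join_vehicles("C1", example_vehicles))
print(narrative)
```

A narrative built this way keeps each vehicle's attributes attached to the crash it belongs to, which is the kind of relational context a flat one-row-per-crash table tends to lose.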
@article{zhen2025_2505.07853,
title={CrashSage: A Large Language Model-Centered Framework for Contextual and Interpretable Traffic Crash Analysis},
author={Hao Zhen and Jidong J. Yang},
journal={arXiv preprint arXiv:2505.07853},
year={2025}
}