Adoption and Use of LLMs at an Academic Medical Center

Nigam H. Shah
Nerissa Ambers
Abby Pandya
Timothy Keyes
Juan M. Banda
Srikar Nallan
Carlene Lugtu
Artem A. Trotsyuk
Suhana Bedi
Alyssa Unell
Miguel Fuentes
Francois Grolleau
Sneha S. Jain
Jonathan Chen
Devdutta Dash
Danton Char
Aditya Sharma
Duncan McElfresh
Patrick Scully
Vishanthan Kumar
Connor OBrien
Satchi Mouniswamy
Elvis Jones
Krishna Jasti
Gunavathi Mannika Lakshmanan
Sree Ram Akula
Varun Kumar Singh
Ramesh Rajmanickam
Sudhir Sinha
Vicky Zhou
Xu Wang
Bilal Mawji
Joshua Ge
Wencheng Li
Travis Lyons
Jarrod Helzer
Vikas Kakkar
Ramesh Powar
Darren Batara
Cheryl Cordova
William Frederick III
Olivia Tang
Phoebe Morgan
April S. Liang
Stephen P. Ma
Shivam Vedak
Dong-han Yao
Akshay Swaminathan
Mehr Kashyap
Brian Ng
Jamie Hellman
Nikesh Kotecha
Christopher Sharp
Gretchen Brown
Christian Lindmark
Anurang Revri
Michael A. Pfeffer
Abstract

While large language models (LLMs) can support clinical documentation needs, standalone tools struggle with "workflow friction" from manual data entry. We developed ChatEHR, a system that enables the use of LLMs with the entire patient timeline spanning several years. ChatEHR supports two modes of use: automations, which are static combinations of prompts and data that perform a fixed task, and interactive use in the electronic health record (EHR) via a user interface (UI). The resulting ability to sift through patient medical records for diverse use cases, such as pre-visit chart review, screening for transfer eligibility, monitoring for surgical site infections, and chart abstraction, redefines LLM use as an institutional capability. This system, accessible after user training, enables continuous monitoring and evaluation of LLM use. In 1.5 years, we built 7 automations, and 1,075 users trained to become routine users of the UI, engaging in 23,000 sessions in the first 3 months after launch. For automations, being model-agnostic and accessing multiple types of data was essential for matching specific clinical or administrative tasks with the most appropriate LLM. Benchmark-based evaluations proved insufficient for monitoring and evaluating the UI, so new methods were required to monitor performance. Generation of summaries was the most frequent task in the UI, with an estimated 0.73 hallucinations and 1.60 inaccuracies per generation. The resulting mix of cost savings, time savings, and revenue growth required a value assessment framework to prioritize work as well as to quantify the impact of using LLMs. Initial estimates are $6M in savings in the first year of use, without quantifying the benefit of better care. Such a "build-from-within" strategy provides an opportunity for health systems to maintain agency via a vendor-agnostic, internally governed LLM platform.
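As a rough illustration of what the abstract means by an automation (a static combination of a prompt and data that performs a fixed task, run against whichever model suits it), the following minimal sketch may help. It is not ChatEHR's implementation: the `Automation` class, the `LLM` type, the `echo_model` stand-in, and the data-type names `notes`, `labs`, and `imaging` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical type: any function that maps a prompt string to a completion.
# Keeping the model behind a plain callable is what makes the sketch model-agnostic.
LLM = Callable[[str], str]


@dataclass(frozen=True)
class Automation:
    """A fixed task: a static prompt template plus the record data types it reads."""
    name: str
    prompt_template: str         # must contain a {record} placeholder
    data_types: Tuple[str, ...]  # illustrative names, e.g. ("notes", "labs")

    def run(self, timeline: Dict[str, str], llm: LLM) -> str:
        # Keep only the data types this automation is configured to use.
        selected = [f"[{k}] {timeline[k]}" for k in self.data_types if k in timeline]
        prompt = self.prompt_template.format(record="\n".join(selected))
        return llm(prompt)


def echo_model(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    return f"received {len(prompt)} characters"


# Hypothetical automation for one of the use cases named in the abstract.
transfer_screen = Automation(
    name="transfer_eligibility",
    prompt_template="Is this patient eligible for transfer?\n{record}",
    data_types=("notes", "labs"),
)

if __name__ == "__main__":
    timeline = {"notes": "stable", "labs": "normal", "imaging": "n/a"}
    print(transfer_screen.run(timeline, echo_model))
```

Because the model is passed in as an argument, the same automation definition can be run against a different LLM without changing the prompt or the data selection, which is one simple way to read the abstract's point about matching each task to the most appropriate model.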
