Adoption and Use of LLMs at an Academic Medical Center

21 January 2026

Nigam H. Shah

Nerissa Ambers

Abby Pandya

Timothy Keyes

Juan M. Banda

Srikar Nallan

Carlene Lugtu

Artem A. Trotsyuk

Suhana Bedi

Alyssa Unell

Miguel Fuentes

Francois Grolleau

Sneha S. Jain

Jonathan Chen

Devdutta Dash

Danton Char

Aditya Sharma

Duncan McElfresh

Patrick Scully

Vishanthan Kumar

Connor OBrien

Satchi Mouniswamy

Elvis Jones

Krishna Jasti

Gunavathi Mannika Lakshmanan

Sree Ram Akula

Varun Kumar Singh

Ramesh Rajmanickam

Sudhir Sinha

Vicky Zhou

Xu Wang

Bilal Mawji

Joshua Ge

Wencheng Li

Travis Lyons

Jarrod Helzer

Vikas Kakkar

Ramesh Powar

Darren Batara

Cheryl Cordova

William Frederick III

Olivia Tang

Phoebe Morgan

April S. Liang

Stephen P. Ma

Shivam Vedak

Dong-han Yao

Akshay Swaminathan

Mehr Kashyap

Brian Ng

Jamie Hellman

Nikesh Kotecha

Christopher Sharp

Gretchen Brown

Christian Lindmark

Anurang Revri

Michael A. Pfeffer

LM&MA

Abstract

While large language models (LLMs) can support clinical documentation needs, standalone tools struggle with "workflow friction" caused by manual data entry. We developed ChatEHR, a system that enables the use of LLMs with a patient's entire timeline spanning several years. ChatEHR supports both automations (static combinations of prompts and data that perform a fixed task) and interactive use within the electronic health record (EHR) via a user interface (UI). The resulting ability to sift through patient medical records for diverse use cases, such as pre-visit chart review, screening for transfer eligibility, monitoring for surgical site infections, and chart abstraction, redefines LLM use as an institutional capability. This system, accessible after user training, enables continuous monitoring and evaluation of LLM use. In 1.5 years, we built 7 automations, and 1,075 users completed training to become routine users of the UI, engaging in 23,000 sessions in the first 3 months after launch. For automations, being model-agnostic and accessing multiple types of data was essential for matching specific clinical or administrative tasks with the most appropriate LLM. Benchmark-based evaluations proved insufficient for monitoring and evaluating the UI, requiring new methods to monitor performance. Generating summaries was the most frequent task in the UI, with an estimated 0.73 hallucinations and 1.60 inaccuracies per generation. The resulting mix of cost savings, time savings, and revenue growth required a value assessment framework, both to prioritize work and to quantify the impact of using LLMs. Initial estimates are $6M in savings in the first year of use, without quantifying the benefit of the better care offered. Such a "build-from-within" strategy gives health systems an opportunity to maintain agency via a vendor-agnostic, internally governed LLM platform.
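To make the idea of an "automation" concrete, the sketch below models one as a static prompt template plus a fixed list of the data types it reads, bound to a model chosen per task through a model-agnostic registry. This is a minimal illustration under stated assumptions, not ChatEHR's implementation: the names `Automation`, `build_context`, `run_automation`, the `{context}` placeholder, and the record layout (a dict of data type to text items) are all hypothetical, and each "model" is stubbed as a plain function from prompt to text.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

# Hypothetical model-agnostic interface: any LLM backend is a function prompt -> text.
ModelFn = Callable[[str], str]

@dataclass(frozen=True)
class Automation:
    """Hypothetical fixed task: a static prompt plus the data types it consumes."""
    name: str
    prompt_template: str        # assumed to contain a {context} placeholder
    data_types: Tuple[str, ...] # hypothetical labels, e.g. ("notes", "labs")
    model_name: str             # chosen per task, not hard-wired to one vendor

def build_context(record: Dict[str, List[str]], data_types: Tuple[str, ...]) -> str:
    # Include only the requested data types from the patient record, in order.
    sections = []
    for dtype in data_types:
        items = record.get(dtype, [])
        sections.append(f"## {dtype}\n" + "\n".join(items))
    return "\n\n".join(sections)

def run_automation(auto: Automation, record: Dict[str, List[str]],
                   models: Dict[str, ModelFn]) -> str:
    # The prompt and data selection stay fixed; only the backend is looked up.
    prompt = auto.prompt_template.format(
        context=build_context(record, auto.data_types))
    return models[auto.model_name](prompt)

# Usage with stub "models" standing in for real LLM endpoints (hypothetical).
models: Dict[str, ModelFn] = {
    "echo": lambda p: p,             # returns the prompt it received
    "length": lambda p: str(len(p)), # returns the prompt length
}
record = {"notes": ["wound erythema noted"], "labs": ["WBC 14.2"]}
ssi = Automation("ssi-monitor", "Flag infection signs.\n{context}", ("notes",), "echo")
out = run_automation(ssi, record, models)
# Swapping the backend changes only model_name, not the prompt or data.
out_swapped = run_automation(replace(ssi, model_name="length"), record, models)
```

Keeping the backend behind a single lookup is one way to express the abstract's point that model-agnosticism lets each task be matched with the most appropriate LLM: retargeting an automation touches one field rather than its prompt or data selection.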
