Artificial intelligence (AI) can help radiologists analyze images or support doctors in making diagnoses with a high degree of accuracy, even with traditional machine learning techniques. But those techniques tend to require large amounts of training data to accomplish this.
Researchers at the Icahn School of Medicine at Mount Sinai are exploring whether the latest generative AI techniques, specifically large language models (LLMs), can achieve accurate predictions with far less training data. Generative AI refers to models that produce new content, typically by learning the underlying distribution of the data they were trained on.
Using a specially prepared, secure version of GPT-4 (a product from OpenAI, the company behind the popular generative AI platform ChatGPT), the team applied the model to predict hospital admissions from the Emergency Department, based on objective data collected from patients and on triage notes.
“One of the advantages of LLMs over traditional methods is that you can use just a few examples to train the model for any use case,” says Eyal Klang, MD, Associate Professor of Medicine, and Director of the Generative AI Research Program within the Division of Data-Driven and Digital Medicine (D3M), at Icahn Mount Sinai. “You don’t need to retrain models again and again for each use case, which is very hard when that can require millions of data points.”
“Another advantage of LLMs is their ability to explain to the user how they arrived at an answer,” says Dr. Klang. The model’s ability to explain its reasoning gives a physician the confidence to use it to assist in making medical decisions.
Here’s an animated explainer on how Dr. Klang and his team tested GPT-4 against traditional machine learning methods for predicting whether patients who go to the ER need to be admitted.
The study used patient visit data from seven hospitals within the Mount Sinai Health System; more than 864,000 emergency room visits were included in the cohort. An ensemble model built from traditional machine learning techniques achieved an AUC score of 0.878 in predicting admissions, with an accuracy of 82.9 percent. (The AUC, or area under the receiver operating characteristic curve, measures how well a model separates positive cases from negative ones; a score of 0.5 means the model performed no better than random guessing, and 1.0 means perfect separation.)
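To make the AUC metric concrete, here is a minimal sketch of how it can be computed from predicted probabilities using scikit-learn; the labels and scores below are invented for illustration and are not data from the study.

```python
# Minimal illustration of what an AUC score measures, using scikit-learn.
# The labels and predicted probabilities are invented for demonstration.
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = patient was admitted, 0 = discharged
y_score = [0.91, 0.20, 0.75, 0.40, 0.35, 0.48, 0.88, 0.15]  # predicted P(admit)

auc = roc_auc_score(y_true, y_score)
print(f"AUC: {auc:.3f}")  # 1.0 = admits always ranked above discharges; 0.5 = random
```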
The GPT-4 model was given the same task of predicting ER admissions, but under several different conditions: “off the shelf” (given no patient examples, also known as “zero-shot”); given the admission probabilities produced by the machine learning models; given 10 examples of patients with triage notes (“few-shot”); given 10 contextually similar past cases (retrieval-augmented generation, or RAG); and various combinations of these conditions. In the setting with the most information provided (few-shot with RAG and machine learning probabilities), GPT-4 achieved an AUC score of 0.874 and an accuracy of 83.1 percent, results statistically similar to those of the ensemble model.
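As a rough sketch of what the richest condition (few-shot with RAG and a machine learning probability) might look like in code: the prompt wording, the retrieve_similar_cases helper, and the record fields below are hypothetical stand-ins, not the study's actual pipeline.

```python
# Hypothetical sketch of assembling a few-shot prompt with retrieved similar
# cases plus a traditional model's probability. All names and fields are
# invented for illustration.

def retrieve_similar_cases(triage_note: str, k: int = 10) -> list[dict]:
    """Placeholder for a retrieval step, e.g., embedding similarity search."""
    raise NotImplementedError

def build_prompt(triage_note: str, vitals: dict, ml_probability: float) -> str:
    examples = retrieve_similar_cases(triage_note, k=10)  # RAG: similar past visits
    lines = [
        "You are assisting with emergency department triage.",
        "Based on the triage note and vital signs, answer 'admit' or 'discharge'.",
        "",
    ]
    for ex in examples:  # few-shot: labeled past cases with known outcomes
        lines.append(f"Note: {ex['note']}\nVitals: {ex['vitals']}\nOutcome: {ex['outcome']}\n")
    lines.append(f"A traditional model estimates admission probability at {ml_probability:.2f}.")
    lines.append(f"Note: {triage_note}\nVitals: {vitals}\nOutcome:")
    return "\n".join(lines)
```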
The findings were published in the Journal of the American Medical Informatics Association on Tuesday, May 21.
In this Q&A, Dr. Klang discusses the team’s research.
What was the motivation for your study?
Our study was motivated by the need to test whether generative AI, such as the GPT-4 model, can improve the prediction of admissions, and thus clinical decision-making, in a high-volume setting like the Emergency Department. We compared it against traditional machine learning methods and also evaluated its performance in combination with them.
What are the implications?
The findings suggest that AI, specifically large language models, could soon be used to support doctors in emergency rooms by making quick, informed predictions about whether a patient should be admitted.
What are the limitations of the study?
The study relied on data from a single urban health system, which may not represent conditions in other medical settings. Additionally, the study did not prospectively assess the impact of integrating this AI technology into the daily workflow of emergency departments, which could influence its practical effectiveness.
How might these findings be put to use?
These findings could be used to develop AI tools, such as those that integrate GPT-4, that support accurate clinical decision-making. This could promote a model of AI-assisted care that is data-driven and streamlined, requiring only a few examples to adapt the platform to a new task. It also sets the stage for further research into the integration of AI in health care, potentially leading to more sophisticated applications capable of reasoning and learning from limited data in real-time clinical settings.
What is your plan for following up on this study?
Our group is actively working on the practical application of LLMs in real-world settings. We are exploring the most effective ways to combine traditional machine learning with LLMs to address complex problems in these environments.
Learn more about how Mount Sinai researchers and clinicians are leveraging machine learning to improve patients’ lives.