Miller School of Medicine Gastroenterologists Study Use of ChatGPT for Surveillance Colonoscopy Intervals

Summary
- The Miller School of Medicine’s Dr. Daniel Sussman and Dr. Amar Deshpande coauthored a new study demonstrating that a large language model (LLM) achieved high accuracy in determining guideline-based surveillance colonoscopy intervals.
- The study’s authors asked OpenAI’s ChatGPT models three and four to determine the appropriate surveillance interval for a dataset of 1,000 patients.
- The researchers say the approach could save doctors significant time when deciding on colonoscopy intervals.
Daniel Sussman, M.D., and Amar Deshpande, M.D., both faculty in the University of Miami Miller School of Medicine’s Division of Digestive Health and Liver Diseases, coauthored a new study demonstrating that a large language model (LLM) achieved consistently high accuracy in determining guideline-based surveillance colonoscopy intervals.
The Miller School team developed the LLM-based approach in collaboration with researchers at the University of Pennsylvania Perelman School of Medicine.
The American Journal of Gastroenterology (AJG) published the study, which is based on a dataset of 1,000 patients who underwent screening or surveillance colonoscopies from 2023 to 2024 at the two academic health centers. It follows a proof-of-concept study of 100 patients, published in Gastroenterology earlier this year, in which a custom-prompted LLM achieved 95% or greater accuracy.
Saving Time When Determining Colonoscopy Intervals
The idea behind using the LLM is to unburden physicians from having to make time-consuming calculations, according to Dr. Deshpande, professor of medicine and senior associate dean for medical education at the Miller School.
“Basically, determining surveillance colonoscopy intervals for patients with polyps requires calculating three variables—the number and size of the polyps and their histology,” Dr. Deshpande said. “Physicians who perform colonoscopies then plug these calculations into a table in the guidelines to determine whether a patient should return in three, five, seven or 10 years, for example.”
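To make the table lookup Dr. Deshpande describes more concrete, here is a deliberately simplified Python sketch. It assumes every polyp is a conventional adenoma and uses interval bands loosely modeled on the adenoma rows of the 2020 USMSTF post-polypectomy recommendations. The `Polyp` class, its histology labels and the `surveillance_interval_years` function are hypothetical illustrations, not the study's prompt and not a clinical tool. It ignores serrated and hyperplastic polyps, piecemeal resection, bowel preparation quality and family history, all of which can change a real recommendation.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical histology labels treated as "advanced" in this sketch.
ADVANCED_HISTOLOGY = {"tubulovillous", "villous", "high_grade_dysplasia"}


@dataclass
class Polyp:
    size_mm: float
    histology: str  # hypothetical labels: "tubular", "tubulovillous", "villous", "high_grade_dysplasia"


def surveillance_interval_years(polyps: list[Polyp]) -> str:
    """Return a follow-up interval (in years) for a simplified set of adenoma findings.

    Illustrative only: assumes every polyp is an adenoma and skips the many
    guideline rows (serrated lesions, piecemeal resection, poor prep) a real
    tool would need. Ranges are returned as strings such as "7-10".
    """
    if not polyps:
        return "10"  # no polyps found
    if len(polyps) > 10:
        return "1"  # more than 10 adenomas
    if any(p.size_mm >= 10 or p.histology in ADVANCED_HISTOLOGY for p in polyps):
        return "3"  # any large adenoma or advanced histology
    if len(polyps) <= 2:
        return "7-10"  # 1-2 small tubular adenomas
    if len(polyps) <= 4:
        return "3-5"  # 3-4 small tubular adenomas
    return "3"  # 5-10 small tubular adenomas
```

For example, a single 5 mm tubular adenoma maps to "7-10", while a single 12 mm adenoma maps to "3". Checking the large-count row first ensures the shortest applicable interval wins.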

Applying the guidelines manually is prone to error, and provider recommendations vary widely and inconsistently, according to Dr. Sussman, professor and interim chief of the Division of Digestive Health and Liver Diseases at the Miller School.
“There is evidence of surveillance colonoscopy overuse and underuse,” Dr. Sussman said. “We informed the LLM in this study by writing a prompt that melds the data from the colonoscopy and related pathology reports, generating a recommendation for subsequent surveillance exams while maintaining oversight in a human-in-the-loop approach. This had high accuracy for colonoscopies where small numbers of polyps were removed.”
ChatGPT and Colonoscopy Intervals
The study’s authors used a custom prompt outlining the U.S. Multi-Society Task Force’s (USMSTF) post-polypectomy surveillance algorithm. They asked OpenAI’s ChatGPT models three and four to determine the appropriate surveillance interval for the 1,000 examples in the dataset.
They ran the experiment 10 times with the same model, prompt and dataset.
“Three of us who are GI doctors who perform colonoscopies looked at cases, blinded, and noted our recommended surveillance intervals, with the physician opinion being the gold standard,” Dr. Deshpande said. “Then, we ran the LLM and it almost always agreed with us. In cases where the LLM didn’t agree, we revised the prompts to more accurately reflect how we practice.”
They found the average accuracy was 94.6% across 10 experiments. Examples with one to three colon polyps had an average accuracy of 95.8%, while examples with four or more colon polyps had an average accuracy of 88.2%. The model identified cases where guidelines do not apply, including when reports noted inadequate bowel preparation or elevated underlying cancer risk.
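The reported figures are averages over repeated runs, scored against the physicians' gold-standard intervals and broken out by polyp count. The following minimal Python sketch shows that kind of scoring. It assumes "average accuracy" means the mean of per-run accuracies (which equals pooled accuracy when every run covers the same cases). The function names and toy data are hypothetical and do not reproduce the study's code or results.

```python
from __future__ import annotations

from statistics import mean


def run_accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of cases where the model's interval matches the physician gold standard."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def average_accuracy(
    runs: list[list[str]],
    gold: list[str],
    polyp_counts: list[int],
    min_polyps: int = 0,
    max_polyps: int | None = None,
) -> float:
    """Mean per-run accuracy, optionally restricted to a polyp-count stratum."""
    idx = [
        i
        for i, n in enumerate(polyp_counts)
        if n >= min_polyps and (max_polyps is None or n <= max_polyps)
    ]
    if not idx:
        raise ValueError("no cases fall in this polyp-count stratum")
    gold_subset = [gold[i] for i in idx]
    return mean(run_accuracy([run[i] for i in idx], gold_subset) for run in runs)


# Toy example: four cases, two repeated runs of the same model and prompt.
gold = ["3", "7-10", "3-5", "1"]
polyp_counts = [1, 2, 5, 12]
runs = [["3", "7-10", "3-5", "1"], ["3", "7-10", "3", "1"]]

overall = average_accuracy(runs, gold, polyp_counts)
few = average_accuracy(runs, gold, polyp_counts, min_polyps=1, max_polyps=3)
many = average_accuracy(runs, gold, polyp_counts, min_polyps=4)
```

With this toy data, the second run misses one case with five polyps, so `overall` is 0.875, `few` is 1.0 and `many` is 0.75. This mirrors, in miniature, how accuracy can drop for cases with more polyps.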
“The LLM’s framework enables reinforcement learning from human feedback, which allows the model to improve with time,” Dr. Sussman explained. “Reinforcement learning from human feedback also will allow the model to adapt to individual clinicians’ preferences in real-world practice.”
The developers anticipated known issues like hallucination, which can threaten the accuracy of LLMs.
“We found some instances in our pilot study where we weren’t being specific enough in our prompt and the LLM would hallucinate, and we addressed those,” Dr. Sussman said. “In our validation study, we identified areas where we needed to teach the LLM more nuanced details from the procedure notes and pathology notes to refine these recommendations.”
Bringing the Research to Colonoscopy Patients
Doctors who perform colonoscopies understand current workflow inefficiencies, according to Dr. Deshpande.
“Those who do, let’s say, 20 colonoscopies a day…if all you have to do is hit that, you approve of or don’t approve of (and then correct) the LLM’s recommendation,” Dr. Deshpande said. “That’s a lot more efficient than going through all 20 pathology reports, cross-referencing them with the associated colonoscopy reports and then thinking through a table of surveillance interval recommendations to determine how many years each follow-up is due.”
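The approve-or-correct step can be pictured as a small review record per patient. This Python sketch assumes a clinician either accepts the model's interval or replaces it, and that overridden cases are collected so the team can revisit the prompt. The `Recommendation` class and the `review` and `corrections` functions are hypothetical. They do not represent the study's software or any electronic health record interface.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Recommendation:
    patient_id: str
    llm_interval: str  # interval proposed by the model, e.g. "3-5"
    final_interval: str | None = None  # interval signed off by the clinician
    corrected: bool = False


def review(rec: Recommendation, approve: bool, correction: str | None = None) -> Recommendation:
    """Record a clinician's decision: accept the model's interval or supply a correction."""
    if approve:
        rec.final_interval = rec.llm_interval
    else:
        if correction is None:
            raise ValueError("a rejected recommendation needs a corrected interval")
        rec.final_interval = correction
        rec.corrected = True
    return rec


def corrections(recs: list[Recommendation]) -> list[tuple[str, str, str | None]]:
    """List overridden cases as (patient_id, model interval, clinician interval)."""
    return [(r.patient_id, r.llm_interval, r.final_interval) for r in recs if r.corrected]


batch = [
    review(Recommendation("p1", "3"), approve=True),
    review(Recommendation("p2", "7-10"), approve=False, correction="3-5"),
]
```

Here `corrections(batch)` returns only the overridden case, `("p2", "7-10", "3-5")`. Keeping the clinician's decision separate from the model's proposal preserves the human-in-the-loop record the researchers describe.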
The next step is to incorporate the LLM into an electronic health record, where the algorithm not only makes the calculations but also generates patient letters and aids in scheduling.
“While the LLM is not yet ready for widespread use in practice, clinicians could soon rely on this tool for the majority of colonoscopy surveillance interval determination, leaving complex decision-making to clinicians until we can better inform the AI models,” Dr. Sussman said. “In the future, patients can use such tools to better understand the clinical advice, improve timely communication and make sure they receive appropriate follow-up that is adherent to what guidelines recommend.”