Large language model as a clinical decision support tool in the initial management of critically ill children: a pilot evaluation

Osnat Tausky, Eytan Kaplan, Gili Kadmon, Yulia Gendler, Elhanan Nahum, Shai Yitzhaki, Avichai Weissbach

Research output: Contribution to journal › Article › peer-review

Abstract

Large language models (LLMs) like ChatGPT are being explored as clinical decision support tools, but their reliability in pediatric acute care remains uncertain. This pilot study assessed ChatGPT-4.0’s performance in the early management of critically ill children using real-world clinical data. We retrospectively analyzed 20 children emergently admitted from the emergency department (ED) to a tertiary pediatric intensive care unit (PICU). ChatGPT-4.0 was prompted at four time points: ED arrival (diagnostic and therapeutic plans), ED transfer (differential diagnosis and hospitalization decision), PICU admission (diagnostic and therapeutic plans), and 24 h into PICU stay (differential diagnosis). Outputs were compared to actual care and evaluated for accuracy, safety, and omissions. At ED and PICU admission, 94% (95% CI, 91–97%) and 98% (95% CI, 95–99%) of diagnostic recommendations were rated as appropriate. Only 82% (95% CI, 76–87%) of therapeutic recommendations were considered appropriate at both points (p < 0.001). Potentially harmful therapeutic suggestions were more common than diagnostic ones: 7% vs. 2% in the ED (p = 0.016) and 10% vs. 0% in the PICU (p < 0.00001). In the PICU, critically missing therapeutic recommendations occurred at 0.95 per case, compared to 0.15 for diagnostic ones (p = 0.0073). The correct diagnosis appeared in 100% of ED discharge and 95% (95% CI, 85–100%) of PICU 24-h differentials. Triage decisions were accurate in all PICU cases. Conclusion: ChatGPT-4.0 showed good diagnostic and triage performance but requires caution, especially for therapeutic decisions and broader pediatric use. (Table presented.)
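The abstract does not state how its confidence intervals were computed. As an illustration only, the sketch below assumes that the 95% figure for the PICU 24-h differentials corresponds to 19 of the 20 cases, and that a normal-approximation (Wald) interval clipped to the range 0 to 1 was used. Both are assumptions rather than details from the paper, and the function name `wald_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a proportion, clipped to [0, 1].

    Assumption: the paper does not name its interval method; Wald is used
    here only because it is the simplest textbook choice.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Assumed counts: 19 correct differentials out of 20 cases.
low, high = wald_interval(19, 20)
print(f"{low:.0%} to {high:.0%}")  # prints "85% to 100%"
```

Under these assumptions the result agrees with the reported 85–100%. With only 20 cases the Wald interval overshoots 100% before clipping, so a Wilson or exact interval would usually be preferred for samples of this size.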

Original language: English
Article number: 757
Journal: European Journal of Pediatrics
Volume: 184
Issue number: 12
State: Published - Dec 2025

Keywords

  • Artificial intelligence
  • ChatGPT
  • Clinical decision support
  • Emergency medicine
  • Large language models
  • Pediatric intensive care
