Predicting hospital admissions, ICU utilization, and prolonged length of stay among febrile pediatric emergency department patients using incomplete and imbalanced electronic health record (EHR) data strategies
Document Type
Journal Article
Publication Date
8-1-2025
Journal
International Journal of Medical Informatics
Volume
200
DOI
10.1016/j.ijmedinf.2025.105905
Keywords
Febrile; Imbalanced data; Imputation; Machine learning; Pediatric emergency medicine
Abstract
OBJECTIVE: Determine the efficacy of commonly used approaches to handling missing and/or imbalanced Electronic Health Record (EHR) data on the performance of predictive models targeting risk of admission, intensive care unit (ICU) use, or prolonged length of stay (PLOS) among presenting febrile pediatric emergency department (ED) patients.
MATERIALS AND METHODS: Historical ED EHR data was used to train a series of XGBoost (XGB) and logistic regression (LR) classifiers. Data handling strategies included imputation methods (multiple imputation (MI), median imputation, complete case (CC) analysis), and imbalanced data corrections (minority oversampling, stratified sub-group analysis). Model performance was evaluated using discriminative (AUC, AUPRC) and calibration metrics (Brier score, Z-scores, p-values).
RESULTS: Among the study population, 34% were admitted, 2% utilized the ICU, and 7% had a PLOS. Significant data missingness was observed and determined to be not at random (MNAR). In predicting admissions using data recorded within the first two hours of presentation, LR trained using full cohort with median imputation was comparable to MI yielding well-calibrated admissions models with an AUC/AUPRC of 0.82/0.73 while CC analysis yielded an AUC/AUPRC of 0.76/0.78. XGB, trained with unimputed data, produced a well-calibrated admissions classifier with an AUC/AUPRC of 0.85/0.78. In contrast, imbalanced data correction techniques, including synthetic minority oversampling (SMOTE), risk stratification, or the use of XGB did not significantly improve the poor AUPRC and calibration performance of LR models predicting ICU and PLOS.
CONCLUSION: Both XGB and LR with median imputation demonstrated robust performance in predicting admissions in the presence of missing data. However, deriving clinically useful models for rare outcomes, such as ICU use or PLOS, remains a challenge due to poor precision/recall and calibration performance. Further research is needed to improve the prediction of rare outcomes in this population.
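For readers unfamiliar with the data handling and evaluation terms in the abstract, the following is a minimal illustrative sketch, not the authors' code. It shows median imputation of missing values and the three headline metrics (AUC, AUPRC computed as average precision, and the Brier score) in plain Python. The feature columns (temperature, heart rate), the toy labels, and all function names are hypothetical and were chosen only for illustration. The study itself trained LR and XGB classifiers, which are not reproduced here.

```python
from statistics import median


def median_impute(rows):
    """Replace None in each column with that column's observed median."""
    columns = list(zip(*rows))
    medians = [median(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]


def roc_auc(y_true, y_prob):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Step-wise area under the precision-recall curve; assumes no tied scores."""
    ranked = sorted(zip(y_prob, y_true), reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at each recalled positive
    return total / hits


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical triage features: [temperature in Celsius, heart rate]; None = missing.
raw = [[38.5, 120], [None, 140], [39.5, None], [40.1, 100]]
imputed = median_impute(raw)

# Hypothetical outcomes (1 = admitted) and predicted risks from some classifier.
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

print("imputed:", imputed)
print("AUC:", round(roc_auc(y_true, y_prob), 3))
print("AUPRC:", round(average_precision(y_true, y_prob), 3))
print("Brier:", round(brier_score(y_true, y_prob), 3))
```

AUPRC depends on outcome prevalence in a way that AUC does not, which is one reason a model for a rare outcome (such as the 2% ICU rate reported above) can show an acceptable AUC while its precision/recall performance remains poor.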
APA Citation
Velez, Tom; Ibrahim, Zara; Duru, Kanayo; Velez, Dante; Triantafyllou, Maria; McKinley, Kenneth; Saif, Pasha; Kratimenos, Panagiotis; Clark, Andy; and Koutroulis, Ioannis, "Predicting hospital admissions, ICU utilization, and prolonged length of stay among febrile pediatric emergency department patients using incomplete and imbalanced electronic health record (EHR) data strategies" (2025). GW Authored Works. Paper 6952.
https://hsrc.himmelfarb.gwu.edu/gwhpubs/6952
Department
Pediatrics