Internal Validation of the Predictive Performance of Models Based on Three ED and ICU Scoring Systems to Predict Inhospital Mortality for Intensive Care Patients Referred from the Emergency Department

Document Type

Journal Article

Publication Date



BioMed research international






Background: A variety of scoring systems have been introduced for risk stratification and mortality prediction, both in the emergency department (ED), such as WPS, REMS, and MEWS, and in the intensive care unit (ICU), such as APACHE II, SAPS II, and SOFA. However, the performance of these models in the ICU remains unclear, and we aimed to evaluate and compare it.

Methods: This multicenter retrospective cohort study was conducted on severely ill patients admitted to the ICU directly from the ED in seven tertiary hospitals in Iran from August 2018 to August 2020. We evaluated all models in terms of discrimination (AUROC), the balance between positive predictive value and sensitivity (AUPRC), calibration (Hosmer-Lemeshow test and calibration plots), and overall performance using the Brier score (BS). The endpoint was inhospital mortality.

Results: Of the 3,455 patients included in the study, 54.4% were male (n = 1,879) and 26.5% died (n = 916). The BS for the WPS, REMS, MEWS, APACHE II, SAPS II, and SOFA were 0.178, 0.165, 0.183, 0.157, 0.170, and 0.182, respectively. The AUROC of these models were 0.728 (0.71-0.75), 0.761 (0.74-0.78), 0.682 (0.66-0.70), 0.810 (0.79-0.83), 0.767 (0.75-0.79), and 0.785 (0.77-0.80), respectively. The AUPRC was 0.517 (0.50-0.53) for WPS, 0.547 (0.53-0.56) for REMS, 0.445 (0.42-0.46) for MEWS, 0.630 (0.61-0.65) for APACHE II, 0.559 (0.54-0.58) for SAPS II, and 0.564 (0.54-0.57) for SOFA. All models except the MEWS and SOFA had good calibration. APACHE II was the most accurate model, with the lowest BS (0.157).

Conclusion: The APACHE II outperformed all the other ED and ICU models and was the most appropriate model for predicting inhospital mortality of patients in the ICU in terms of discrimination, calibration, and accuracy of predicted probability. Apart from the MEWS, the remaining models had fair discrimination and partially good calibration. Interestingly, although the REMS is less complicated than the SAPS II, the two models performed similarly. Clinicians can use the REMS as part of a larger clinical assessment to manage patients more effectively.
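
For readers less familiar with two of the reported metrics, the following is a minimal sketch of how a Brier score and an AUROC can be computed from predicted probabilities and observed outcomes. It is not the study's analysis code: the function names and the toy data are assumptions made for illustration, and the study's confidence intervals, AUPRC, and calibration tests are not reproduced here.

```python
def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted risk and observed outcome (0 or 1).

    Lower is better; 0 means perfect predictions.
    """
    if not outcomes or len(outcomes) != len(probabilities):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for y, p in zip(outcomes, probabilities)]
    return sum(squared_errors) / len(outcomes)


def auroc(outcomes, probabilities):
    """Probability that a randomly chosen non-survivor receives a higher
    predicted risk than a randomly chosen survivor (ties count as 0.5).
    """
    deaths = [p for y, p in zip(outcomes, probabilities) if y == 1]
    survivors = [p for y, p in zip(outcomes, probabilities) if y == 0]
    if not deaths or not survivors:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                concordant += 1.0
            elif d == s:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


# Hypothetical toy data (not from the study): 1 = died in hospital, 0 = survived.
died = [1, 0, 1, 0, 0, 1]
predicted_risk = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3]

print(f"Brier score: {brier_score(died, predicted_risk):.3f}")
print(f"AUROC: {auroc(died, predicted_risk):.3f}")
```

As a point of reference for the reported Brier scores, a model that assigned the cohort's observed mortality of 26.5% to every patient would score 0.265 × 0.735 ≈ 0.195, so values such as 0.157 for APACHE II should be read against that baseline rather than against 0.25.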


Emergency Medicine