Development and Validation of an XGBoost-Algorithm-Powered Survival Model for Predicting In-Hospital Mortality Based on 545,388 Isolated Severe Traumatic Brain Injury Patients from the TQIP Database

Document Type

Journal Article

Publication Date



Journal of Personalized Medicine








Trauma Quality Improvement Program (TQIP); extreme gradient boosting (XGBoost); machine learning; prediction model; survival analysis; traumatic brain injury


BACKGROUND: Traumatic brain injury (TBI) represents a significant global health issue. Traditional tools used for injury severity grading, such as the Glasgow Coma Scale (GCS) and the Abbreviated Injury Scale (AIS), struggle to capture outcomes after TBI.

AIM AND METHODS: This study uses extreme gradient boosting (XGBoost), a machine learning algorithm that combines the predictions of multiple weak models into a single strong predictive model, to develop and validate a predictive model for in-hospital mortality in patients with isolated severe TBI and to identify the most influential predictors. In total, 545,388 patients from the 2013-2021 American College of Surgeons Trauma Quality Improvement Program (TQIP) database were included, with 80% of the patients used for model training and 20% for the final model test. The primary outcome was in-hospital mortality. Predictors were patients' demographics, admission status, comorbidities, and clinical characteristics. Penalized Cox regression models were used to investigate the associations between the survival outcomes and the predictors and to select the best predictors. An XGBoost-powered Cox regression model was then used to predict the survival outcome. Model performance was evaluated using Harrell's concordance index (C-index). The time-dependent area under the receiver operating characteristic curve (AUC) was used to evaluate the dynamic cumulative performance of the models. The importance of each predictor in the final prediction model was evaluated using Shapley additive explanations (SHAP) values.

RESULTS: On average, the final XGBoost-powered Cox regression model performed at an acceptable level in the test dataset for patients with a length of stay up to 250 days (mean time-dependent AUC = 0.713). However, for patients with a length of stay between 20 and 213 days, the performance of the model was relatively poor (time-dependent AUC < 0.7). When limited to patients with a length of stay ≤20 days, who account for 95.4% of all patients, the model achieved excellent performance (mean time-dependent AUC = 0.813). When further limited to patients with a length of stay ≤5 days, who account for two-thirds of all patients, the model achieved outstanding performance (mean time-dependent AUC = 0.917).

CONCLUSION: The XGBoost-powered Cox regression model can achieve outstanding predictive ability for in-hospital mortality during the first 5 days, primarily based on the severity of the injury, the GCS on admission, and the patient's age. These variables continue to demonstrate excellent predictive ability up to 20 days after admission, a period of care that accounts for over 95% of severe TBI patients. Past 20 days of care, other factors appear to be the primary drivers of in-hospital mortality, indicating a potential window of opportunity for improving outcomes.
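The two evaluation metrics named in the methods, Harrell's C-index and the time-dependent AUC, can be illustrated with a minimal sketch. This is not the study's evaluation code, which the abstract does not show. The function names, the toy data, and the simplifications are assumptions made for illustration: pairs with tied times are skipped in the C-index, and the AUC at a horizon `t` drops patients censored before `t` instead of applying inverse-probability-of-censoring weights, which a full analysis would normally use.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Share of comparable patient pairs that the risk score orders correctly.

    Hypothetical helper: a pair is comparable when the observed times differ
    and the patient with the shorter time died (event == 1).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times or censored earlier patient: not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # earlier death had the higher predicted risk
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def simple_time_dependent_auc(times, events, risks, t):
    """Simplified cumulative/dynamic AUC at horizon t (no censoring weights).

    Cases died on or before day t; controls were still in hospital after t.
    Patients censored on or before t are left out (an assumption of this sketch).
    """
    cases = [r for tm, e, r in zip(times, events, risks) if e and tm <= t]
    controls = [r for tm, r in zip(times, risks) if tm > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at horizon t")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy data (invented): length of stay in days, death indicator, predicted risk.
toy_times = [2, 4, 5, 9, 20, 30]
toy_events = [1, 1, 0, 1, 0, 0]
toy_risks = [0.9, 0.7, 0.4, 0.25, 0.2, 0.3]

c_index = harrell_c_index(toy_times, toy_events, toy_risks)
auc_day5 = simple_time_dependent_auc(toy_times, toy_events, toy_risks, t=5)
auc_day10 = simple_time_dependent_auc(toy_times, toy_events, toy_risks, t=10)
```

Evaluating the AUC at several horizons, as with `auc_day5` and `auc_day10` here, mirrors how the results are reported separately for stays of up to 5, 20, and 250 days: a single risk score can discriminate well at early horizons and less well at later ones.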