Machine-Learned Codes from EHR Data Predict Hard Outcomes Better than Human-Assigned ICD Codes
Document Type
Journal Article
Publication Date
6-1-2025
Journal
Machine Learning and Knowledge Extraction
Volume
7
Issue
2
DOI
10.3390/make7020036
Keywords
ICD coding; impact factors; machine learning prediction; morbidity; mortality
Abstract
We used machine learning (ML) to characterize 894,154 medical records of outpatient visits from the Veterans Administration Central Data Warehouse (VA CDW) by the likelihood of assignment of 200 International Classification of Diseases (ICD) code blocks. Across four different predictive models, the ML-derived predictions for the code blocks were consistently more effective at predicting death or 90-day rehospitalization than the code blocks assigned in the record. We also reviewed records of ICD chapter assignments; the review revealed that the ML-predicted chapter assignments were consistently better than those assigned by humans. Impact factor analysis, a method developed in our group for explaining AI findings, showed little impact from any single assigned ICD code block but a marked impact from the ML-derived code blocks for kidney disease and several other morbidities. In this study, machine learning was much better than human code assignment at predicting the relatively rare outcomes of death or rehospitalization. Future work will address generalizability using other datasets, as well as coding that is more nuanced than the categorization provided by code blocks.
APA Citation
Yin, Ying; Shao, Yijun; Ma, Phillip; Zeng-Treitler, Qing; and Nelson, Stuart J., "Machine-Learned Codes from EHR Data Predict Hard Outcomes Better than Human-Assigned ICD Codes" (2025). GW Authored Works. Paper 7494.
https://hsrc.himmelfarb.gwu.edu/gwhpubs/7494
Department
Biostatistics and Bioinformatics