Machine-Learned Codes from EHR Data Predict Hard Outcomes Better than Human-Assigned ICD Codes

Document Type

Journal Article

Publication Date

6-1-2025

Journal

Machine Learning and Knowledge Extraction

Volume

7

Issue

2

DOI

10.3390/make7020036

Keywords

ICD coding; impact factors; machine learning prediction; morbidity; mortality

Abstract

We used machine learning (ML) to characterize 894,154 medical records of outpatient visits from the Veterans Administration Central Data Warehouse (VA CDW) by the likelihood of assignment of 200 International Classification of Diseases (ICD) code blocks. Using four different predictive models, we found that the ML-derived predictions for the code blocks were consistently more effective in predicting death or 90-day rehospitalization than the code block assigned in the record. A review of records of ICD chapter assignments revealed that the ML-predicted chapter assignments were consistently better than those assigned by humans. Impact factor analysis, a method for explaining AI findings that was developed in our group, showed that no single assigned ICD code block had much effect, whereas the ML-derived code blocks for kidney disease and several other morbidities had a marked impact. In this study, machine learning was much better than human code assignment at predicting the relatively rare outcomes of death or rehospitalization. Future work will address generalizability using other datasets, as well as coding that is more nuanced than the categorization provided by code blocks.

Department

Biostatistics and Bioinformatics
