GW Authored Works

Accuracies of Training Labels and Machine Learning Models: Experiments on Delirium and Simulated Data

Yan Cheng, George Washington University, Washington, DC, USA.
Yijun Shao, George Washington University, Washington, DC, USA.
James Rudolph, Providence VA Medical Center, Providence, RI, USA.
Charlene R. Weir, Salt Lake City VA Medical Center, Salt Lake City, UT, USA.
Beth Sahlmann, Office of Analytics and Performance Integration, Veterans Health Administration, Fort Myers, FL, USA.
Qing Zeng-Treitler, George Washington University, Washington, DC, USA.

Document Type

Journal Article

Publication Date

6-6-2022

Journal

Studies in health technology and informatics

Volume

290

DOI

10.3233/SHTI220161

Keywords

delirium; support vector machine; weak supervised learning

Abstract

Supervised predictive models require labeled data for training. Complete and accurate labels are not always available, so imperfectly labeled data may have to serve as an alternative. An important question is whether the accuracy of the labels places a ceiling on the performance of the trained model. In this study, we trained several models to recognize the presence of delirium in clinical documents using annotations that are not completely accurate. In the external evaluation, the support vector machine model with a linear kernel performed best, achieving an area under the curve of 89.3% and an accuracy of 88%, surpassing the 80% accuracy of the training sample. We then generated a set of simulated data and carried out a series of experiments, which demonstrated that models trained on imperfect data can (but do not always) exceed the accuracy of the training labels.
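
The simulated-data idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' experiment. It assumes a one-dimensional, two-class Gaussian problem, symmetric label flipping at a hypothetical 20% rate, and a simple midpoint-of-centroids classifier standing in for the support vector machine used in the study. All function names and parameters here are made up for illustration.

```python
import random


def simulate(n, noise_rate, rng):
    """Draw n points from a 1-D two-class problem (assumed setup, not the paper's).

    Returns features, true labels, and noisy labels in which each true label
    has been flipped independently with probability noise_rate.
    """
    xs, true_y, noisy_y = [], [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(2.0 if y == 1 else -2.0, 1.0)
        flipped = rng.random() < noise_rate
        xs.append(x)
        true_y.append(y)
        noisy_y.append(1 - y if flipped else y)
    return xs, true_y, noisy_y


def fit_centroid_threshold(xs, labels):
    """Return the midpoint between the two class means under the given labels."""
    pos = [x for x, y in zip(xs, labels) if y == 1]
    neg = [x for x, y in zip(xs, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def accuracy(pred, truth):
    """Fraction of positions where pred and truth agree."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_true, train_noisy = simulate(2000, 0.20, rng)
test_x, test_true, _ = simulate(2000, 0.20, rng)

# Train on the noisy labels only; the true training labels are never used to fit.
threshold = fit_centroid_threshold(train_x, train_noisy)
test_pred = [1 if x > threshold else 0 for x in test_x]

label_acc = accuracy(train_noisy, train_true)  # how accurate the training labels are
model_acc = accuracy(test_pred, test_true)     # model scored against true test labels

print(f"training-label accuracy: {label_acc:.3f}")
print(f"model accuracy on clean test labels: {model_acc:.3f}")
```

In this favourable configuration the flipped labels pull both class means toward each other by the same amount, so the learned threshold stays near the true boundary and the model scores above the accuracy of its own training labels. Consistent with the abstract's caveat, this is one setting in which the effect appears, not evidence that it always does.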

APA Citation

Cheng, Yan; Shao, Yijun; Rudolph, James; Weir, Charlene R.; Sahlmann, Beth; and Zeng-Treitler, Qing, "Accuracies of Training Labels and Machine Learning Models: Experiments on Delirium and Simulated Data" (2022). GW Authored Works. Paper 1155.
https://hsrc.himmelfarb.gwu.edu/gwhpubs/1155

Department

Clinical Research and Leadership
