Identifying suicide documentation in clinical notes through zero-shot learning

Document Type

Journal Article

Publication Date



Health Science Reports








Veterans; computer; international classification of diseases; natural language processing; neural networks; suicide


BACKGROUND AND AIMS: In deep learning, a major difficulty in identifying suicidality and its risk factors in clinical notes is the scarcity of training samples, because true positive instances are few relative to the number of patients screened. This paper describes a novel methodology that identifies suicidality in clinical notes by addressing this data sparsity through zero-shot learning. Our general aim was to develop a tool that leverages zero-shot learning to effectively identify documentation of suicidality in all types of clinical notes.

METHODS: US Veterans Affairs clinical notes served as data. Training labels were determined using diagnostic codes for suicide attempt and self-harm. A base string associated with the target label of suicidality provided auxiliary information: positive training cases were narrowed to those containing the base string. We trained a deep neural network by mapping the contents of the training documents to a semantic space. For comparison, we trained another deep neural network using identical training labels and bag-of-words features.

RESULTS: The zero-shot learning model outperformed the baseline model in area under the curve, sensitivity, specificity, and positive predictive value at multiple probability thresholds. At a 0.90 probability threshold, the methodology identified notes that documented suicidality but were not associated with a relevant ICD-10-CM code, with 94% accuracy.

CONCLUSION: This method can effectively identify suicidality without manual annotation.
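
The label construction and thresholding described in METHODS and RESULTS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the base string, the exact diagnostic codes, or how coded notes lacking the base string were handled, so `BASE_STRING`, `SELF_HARM_CODE_PREFIXES`, the `Note` type, and the exclusion rule below are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical values: the abstract names neither the base string nor the codes.
BASE_STRING = "suicid"
SELF_HARM_CODE_PREFIXES = ("T14.91", "X71", "X72")  # illustrative subset only


@dataclass
class Note:
    text: str
    icd_codes: Tuple[str, ...]


def has_self_harm_code(note: Note) -> bool:
    """True if any code on the note starts with an illustrative self-harm prefix."""
    return any(code.startswith(SELF_HARM_CODE_PREFIXES) for code in note.icd_codes)


def training_label(note: Note) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None (left out of training).

    Assumption: a coded note that lacks the base string is dropped rather
    than relabeled as negative; the abstract does not say which was done.
    """
    coded = has_self_harm_code(note)
    mentions = BASE_STRING in note.text.lower()
    if coded and mentions:
        return 1
    if coded:
        return None
    return 0


def flag_uncoded(notes: Sequence[Note], probs: Sequence[float],
                 threshold: float = 0.90) -> List[int]:
    """Indices of notes scored at or above the threshold that carry no relevant code."""
    return [i for i, (n, p) in enumerate(zip(notes, probs))
            if p >= threshold and not has_self_harm_code(n)]
```

Here `probs` stands in for the output of whatever classifier is trained on these labels; the sketch covers only the labeling rule and the 0.90 cut-off, not the semantic-space model itself.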


Clinical Research and Leadership