Surgery Faculty Publications

Multimodal attention network for trauma activity recognition from spoken language and environmental sound

Yue Gu, Rutgers University–New Brunswick
Ruiyu Zhang, Rutgers University–New Brunswick
Xinwei Zhao, Rutgers University–New Brunswick
Shuhong Chen, Rutgers University–New Brunswick
Jalal Abdulbaqi, Rutgers University–New Brunswick
Ivan Marsic, Rutgers University–New Brunswick
Megan Cheng, Children's National Health System
Randall S. Burd, Children's National Health System

Document Type

Conference Proceeding

Publication Date

6-1-2019

Journal

2019 IEEE International Conference on Healthcare Informatics, ICHI 2019

DOI

10.1109/ICHI.2019.8904713

Keywords

Environmental sound; Multimodal attention network; Spoken language; Trauma activity recognition

Abstract

© 2019 IEEE. Trauma activity recognition aims to detect, recognize, and predict the activities (or tasks) during trauma resuscitation. Previous work has mainly focused on using various sensor data including image, RFID, and vital signals to generate the trauma event log. However, spoken language and environmental sound, which contain rich communication and contextual information necessary for trauma team cooperation, are still largely ignored. In this paper, we propose a multimodal attention network (MAN) that uses both verbal transcripts and an environmental audio stream as input; the model extracts textual and acoustic features using a multi-level multi-head attention module, and forms a final shared representation for trauma activity classification. We evaluated the proposed architecture on 75 actual trauma resuscitation cases collected from a hospital. We achieved 71.8% accuracy with a 0.702 F1 score, demonstrating that our proposed architecture is useful and efficient. These results also show that using spoken language and environmental audio indeed helps identify hard-to-recognize activities, compared to previous approaches. We also provide a detailed analysis of the performance and generalization of the proposed multimodal attention network.

APA Citation

Gu, Y., Zhang, R., Zhao, X., Chen, S., Abdulbaqi, J., Marsic, I., Cheng, M., & Burd, R. (2019). Multimodal attention network for trauma activity recognition from spoken language and environmental sound. 2019 IEEE International Conference on Healthcare Informatics, ICHI 2019. http://dx.doi.org/10.1109/ICHI.2019.8904713

This document is currently not available here.
