DXM-TransFuse U-net: Dual cross-modal transformer fusion U-net for automated nerve identification

Document Type

Journal Article

Publication Date



Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society

Keywords

Deep learning; Image segmentation; Medical imaging; Multi-modal fusion

Abstract

Accurate nerve identification is critical during surgical procedures to prevent damage to nerve tissue. Nerve injury can cause long-term adverse effects for patients, as well as a substantial financial burden. Birefringence imaging is a noninvasive technique, derived from polarized images, that has successfully identified nerves and can assist surgeons intraoperatively. Furthermore, birefringence images can be processed in under 20 ms with a GPGPU implementation, making birefringence a viable imaging modality for real-time processing. In this study, we first comprehensively investigate the use of birefringence images combined with deep learning, which can automatically detect nerves with gains upwards of 14% on the F2 score over their color-image (RGB) counterparts. Additionally, we develop a deep learning framework based on the U-Net architecture, with a Transformer-based fusion module at the bottleneck that leverages both the birefringence and RGB modalities. The dual-modality framework achieves 76.12 on the F2 score, a gain of 19.6% over single-modality networks using only RGB images. By extracting the feature maps of each modality independently and using each modality's information for cross-modal interactions, we aim to provide a solution that further increases the effectiveness of imaging systems for noninvasive intraoperative nerve identification.
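
For readers unfamiliar with the F2 score reported above, the following is a minimal sketch of the standard F-beta measure with beta = 2, which weights recall more heavily than precision. It assumes the score is computed pixel-wise on flattened binary segmentation masks; the abstract does not state how the authors computed it. The function name `fbeta_score` and the toy masks are hypothetical, and the function returns a value in [0, 1], whereas the abstract presumably reports the score on a 0-100 scale.

```python
def fbeta_score(pred, target, beta=2.0):
    """F-beta score for two flat binary masks (sequences of 0/1).

    Illustrative only: assumes pixel-wise evaluation, which the
    abstract does not confirm.
    """
    pairs = list(zip(pred, target))
    tp = sum(1 for p, t in pairs if p and t)        # nerve predicted, nerve present
    fp = sum(1 for p, t in pairs if p and not t)    # nerve predicted, none present
    fn = sum(1 for p, t in pairs if not p and t)    # nerve missed
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention chosen here: return 0.0 when there are no positives at all.
    return (1 + b2) * tp / denom if denom else 0.0


# Toy example (hypothetical data): 2 true positives, 1 false positive,
# 2 false negatives.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 1, 0, 1, 1]
f2 = fbeta_score(pred, target)            # 10/19, about 0.526
f1 = fbeta_score(pred, target, beta=1.0)  # 4/7, about 0.571
```

With beta = 2, each missed nerve pixel (false negative) costs four times as much as a false alarm, so F2 falls below F1 in this toy case because misses outnumber false alarms. That weighting suits a setting where overlooking a nerve is the more costly error.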