AI Workflow, External Validation, and Development in Eye Disease Diagnosis

Authors

Qingyu Chen, National Library of Medicine, National Institutes of Health, Bethesda, Maryland.
Tiarnan D. Keenan, National Eye Institute, National Institutes of Health, Bethesda, Maryland.
Elvira Agron, National Eye Institute, National Institutes of Health, Bethesda, Maryland.
Alexis Allot, National Library of Medicine, National Institutes of Health, Bethesda, Maryland.
Emily Guan, National Library of Medicine, National Institutes of Health, Bethesda, Maryland.
Bryant Duong, National Library of Medicine, National Institutes of Health, Bethesda, Maryland.
Amr Elsawy, National Library of Medicine, National Institutes of Health, Bethesda, Maryland.
Benjamin Hou, National Library of Medicine, National Institutes of Health, Bethesda, Maryland.
Cancan Xue, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
Sanjeeb Bhandari, National Eye Institute, National Institutes of Health, Bethesda, Maryland.
Geoffrey Broadhead, Save Sight Institute, University of Sydney, Sydney, Australia.
Chantal Cousineau-Krieger, National Eye Institute, National Institutes of Health, Bethesda, Maryland.
Ellen Davis, Casey Eye Institute, Oregon Health & Science University, Portland.
William G. Gensheimer, VA White River Junction Healthcare System, White River Junction, Vermont.
Cyrus A. Golshani, Washington DC VA Medical Center, Washington, District of Columbia.
David Grasic, Carolina Vision Center, Fayetteville, North Carolina.
Seema Gupta, Casey Eye Institute, Oregon Health & Science University, Portland.
Luis Haddock, Bascom Palmer Eye Institute, University of Miami, Miami, Florida.
Eleni Konstantinou, National Eye Institute, National Institutes of Health, Bethesda, Maryland.
Tania Lamba, Krieger Eye Institute, Baltimore, Maryland.
Michele Maiberger, Washington DC VA Medical Center, Washington, District of Columbia.
Dimosthenis Mantopoulos, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire.
Mitul C. Mehta, Gavin Herbert Eye Institute, University of California, Irvine.
Ayman G. Elnahry, National Eye Institute, National Institutes of Health, Bethesda, Maryland.
Mutaz Al-Nawaflh, National Eye Institute, National Institutes of Health, Bethesda, Maryland.
Arnold Oshinsky, Washington DC VA Medical Center, Washington, District of Columbia.
Brittany E. Powell, Fort Belvoir Community Hospital, Fort Belvoir, Virginia.
Boonkit Purt, Uniformed Services University of the Health Sciences, Bethesda, Maryland.
Soo Shin, Washington DC VA Medical Center, Washington, District of Columbia.
Hillary Stiefel, Casey Eye Institute, Oregon Health & Science University, Portland.
Alisa T. Thavikulwat, National Eye Institute, National Institutes of Health, Bethesda, Maryland.
Keith James Wroblewski, George Washington University Hospital, The George Washington University, Washington, District of Columbia.

Document Type

Journal Article

Publication Date

7-1-2025

Journal

JAMA Network Open

Volume

8

Issue

7

DOI

10.1001/jamanetworkopen.2025.17204

Abstract

IMPORTANCE: Timely disease diagnosis is challenging due to limited clinician availability and growing disease burdens. Although artificial intelligence (AI) has shown expert-level diagnostic accuracy, a lack of downstream accountability, including workflow integration, external validation, and further development, continues to hinder its clinical adoption.

OBJECTIVE: To address gaps in the downstream accountability of medical AI through a case study on age-related macular degeneration (AMD) diagnosis and severity classification.

DESIGN, SETTING, AND PARTICIPANTS: This diagnostic study developed and evaluated an AI-assisted diagnostic and classification workflow for AMD. Four rounds of diagnostic assessments (accuracy and time) were conducted with 24 clinicians from 12 institutions. Each round was randomized and alternated between manual (clinician diagnosis) and manual plus AI (clinician assisted by AI diagnosis), with a 1-month washout period. In total, 2880 AMD risk features were evaluated across 960 images from 240 Age-Related Eye Disease Study (AREDS) patient samples, both with and without AI assistance. For further development, the original DeepSeeNet model was enhanced into the DeepSeeNet+ model using 39 196 additional images from the US population and tested on 3 datasets, including an external set from Singapore.

EXPOSURE: Age-related macular degeneration risk features.

MAIN OUTCOMES AND MEASURES: The F1 score for accuracy (Wilcoxon rank sum test) and diagnostic time (linear mixed-effects model) were measured, comparing manual vs manual plus AI. For further development, the F1 score (Wilcoxon rank sum test) was again used to compare model versions.

RESULTS: Among 240 patients (mean [SD] age, 68.5 [5.0] years; 127 female [53%]), AI assistance significantly improved accuracy for 23 of 24 clinicians, increasing the mean F1 score from 37.71 (95% CI, 27.83-44.17) to 45.52 (95% CI, 39.01-51.61), with some improvements exceeding 50%. Manual diagnosis initially took an estimated 39.8 seconds (95% CI, 34.1-45.6 seconds) per patient, whereas manual plus AI saved 10.3 seconds (95% CI, -15.1 to -5.5 seconds) and remained faster by 6.9 seconds (95% CI, 0.2-13.7 seconds) to 8.6 seconds (95% CI, 1.8-15.3 seconds) in subsequent rounds. However, combining manual and AI did not always yield the highest accuracy or efficiency, underscoring challenges in explainability and trust. The DeepSeeNet+ model performed better on all 3 test sets, achieving a significantly higher F1 score than the original model on the external Singapore cohort (52.43 [95% CI, 44.38-61.00] vs 38.95 [95% CI, 30.50-47.45]).

CONCLUSIONS AND RELEVANCE: In this diagnostic study, AI assistance was associated with improved accuracy and time efficiency for AMD diagnosis. Further development is essential for enhancing AI generalizability across diverse populations. These findings highlight the need for downstream accountability during early-stage clinical evaluations of medical AI.
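The primary accuracy comparison above rests on per-clinician F1 scores compared with a Wilcoxon rank sum test. The sketch below illustrates that analysis in outline only: the synthetic labels, the clinician_f1 helper, and the error rates are hypothetical placeholders for illustration, not the study's actual data or pipeline.

    # Minimal sketch of the accuracy comparison, assuming one F1 score per
    # clinician under each condition. All data here are synthetic stand-ins;
    # the real study graded 2880 AMD risk features across 960 images.
    import numpy as np
    from scipy.stats import ranksums
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)
    n_clinicians, n_samples = 24, 240
    truth = rng.integers(0, 2, size=n_samples)  # simplified binary risk feature

    def clinician_f1(error_rate):
        # Flip a fraction of true labels to simulate grading error
        # (a hypothetical stand-in for a clinician's actual gradings).
        flips = rng.random(n_samples) < error_rate
        preds = np.where(flips, 1 - truth, truth)
        return f1_score(truth, preds)  # F1 = 2PR / (P + R)

    manual = [clinician_f1(0.40) for _ in range(n_clinicians)]
    manual_plus_ai = [clinician_f1(0.30) for _ in range(n_clinicians)]

    # Unpaired rank-based comparison, as named in the abstract
    stat, p = ranksums(manual_plus_ai, manual)
    print(f"Wilcoxon rank sum: statistic={stat:.2f}, p={p:.4g}")

Diagnostic time, by contrast, was modeled with a linear mixed-effects model; in this kind of design, something like statsmodels' mixedlm with clinician as the grouping factor would account for repeated measurements per clinician.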

Department

Ophthalmology
