Reliability and Validity of 2 Surgical Prioritization Systems for Reinstating Nonemergent Benign Gynecologic Surgery during the COVID-19 Pandemic

Cherie Q. Marfori, Departments of Obstetrics and Gynecology
Jordan S. Klebanoff, Departments of Obstetrics and Gynecology
Catherine Z. Wu, Departments of Obstetrics and Gynecology
Whitney A. Barnes, Departments of Obstetrics and Gynecology
Charelle M. Carter-Brooks, Departments of Obstetrics and Gynecology
Richard L. Amdur, Departments of Obstetrics and Gynecology


© 2020 AAGL
Study Objective: To scientifically evaluate the validity and reproducibility of 2 novel surgical triaging systems, and to offer modifications to the Medically-Necessary, Time-Sensitive (MeNTS) criteria for improved application in gynecologic surgeries.
Design: Retrospective cohort study.
Setting: Academic university hospital.
Patients: Ninety-seven patients with delayed benign gynecologic procedures owing to the coronavirus disease 2019 pandemic.
Intervention(s): Surgical prioritization was assessed using 2 novel scoring systems, the Gynecologic Medically-Necessary Time-Sensitive (Gyn-MeNTS) and modified Elective Surgery Acuity Scale (mESAS) systems, for all 97 patients included.
Measurements and Main Results: The interrater reliability and validity of the 2 novel surgical prioritization systems (Gyn-MeNTS and mESAS) were assessed. The Gyn-MeNTS scores were calculated by 3 raters and analyzed as continuous variables, with a lower score indicating more urgency/priority. The mESAS score was calculated by 2 raters and analyzed as a 3-level ordinal variable, with a higher score indicating more urgency/priority. All 5 raters were blinded to reduce bias. The Gyn-MeNTS interrater reliability was tested using Spearman r, and paired t tests were used to detect systematic differences between raters. Weighted κ indicated mESAS reliability. Concurrent validity with mESAS and surgeon self-prioritization (SSP) was examined with Spearman r and logistic regression. Spearman r values for all Gyn-MeNTS rater pairs were above 0.80 (0.84 for 1 vs 2; 0.82 for 1 vs 3; and 0.82 for 2 vs 3; all p < .001), indicating strong agreement. The weighted κ for the 2 mESAS raters was 0.57 (95% confidence interval, 0.40–0.73), indicating moderate agreement. When used together, both scores were significantly and independently associated with SSP, with strong discrimination (area under the curve, 0.89).
Conclusion: Interrater reliability is acceptable for both scoring systems, and concurrent validity of each is moderate for predicting SSP, but discrimination improves to a high level when they are used together.
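
For readers who wish to reproduce this kind of reliability analysis on their own data, the two agreement statistics named above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. The abstract does not state which weighting scheme was used for κ, so linear weights are an assumption here, and all function names and rater scores are hypothetical values invented for the example.

```python
from collections import Counter
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def weighted_kappa(a, b, levels):
    """Cohen's kappa for two raters with linear disagreement weights.

    `levels` lists the ordinal categories from lowest to highest.
    Linear weighting is an assumption; the source does not specify it.
    """
    n = len(a)
    k = len(levels)
    idx = {level: i for i, level in enumerate(levels)}
    observed = Counter((idx[p], idx[q]) for p, q in zip(a, b))
    row = Counter(idx[p] for p in a)
    col = Counter(idx[q] for q in b)
    obs_dis = 0.0  # observed weighted disagreement
    exp_dis = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            obs_dis += w * observed[(i, j)] / n
            exp_dis += w * row[i] * col[j] / (n * n)
    return 1 - obs_dis / exp_dis


# Hypothetical scores for 8 patients (not data from the study).
gyn_rater1 = [22, 30, 41, 27, 35, 48, 25, 39]  # continuous, lower = more urgent
gyn_rater2 = [24, 29, 43, 26, 37, 46, 28, 38]
mesas_rater1 = [1, 2, 3, 2, 1, 3, 2, 2]  # 3-level ordinal, higher = more urgent
mesas_rater2 = [1, 2, 2, 2, 1, 3, 3, 1]

print("Spearman r:", round(spearman_r(gyn_rater1, gyn_rater2), 2))
print("Weighted kappa:", round(weighted_kappa(mesas_rater1, mesas_rater2, [1, 2, 3]), 2))
```

In practice, an established statistics package would normally be preferred over hand-written routines, in part because it also supplies the p values and confidence intervals reported above, which this sketch does not compute.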