Reliability and Validity of 2 Surgical Prioritization Systems for Reinstating Nonemergent Benign Gynecologic Surgery during the COVID-19 Pandemic

Document Type

Journal Article

Publication Date



Journal of Minimally Invasive Gynecology




Case management; Coronavirus; Delivery of healthcare; Elective surgical procedures; Triage


© 2020 AAGL

Study Objective: To scientifically evaluate the validity and reproducibility of 2 novel surgical triaging systems, and to offer modifications to the Medically-Necessary, Time-Sensitive (MeNTS) criteria for improved application in gynecologic surgeries.
Design: Retrospective cohort study.
Setting: Academic university hospital.
Patients: Ninety-seven patients with delayed benign gynecologic procedures owing to the coronavirus disease 2019 pandemic.
Intervention(s): Surgical prioritization was assessed for all 97 included patients using 2 novel scoring systems, the Gynecologic Medically-Necessary Time-Sensitive (Gyn-MeNTS) and modified Elective Surgery Acuity Scale (mESAS) systems.
Measurements and Main Results: The interrater reliability and validity of the 2 novel surgical prioritization systems (Gyn-MeNTS and mESAS) were assessed. The Gyn-MeNTS scores were calculated by 3 raters and analyzed as continuous variables, with a lower score indicating more urgency/priority. The mESAS score was calculated by 2 raters and analyzed as a 3-level ordinal variable, with a higher score indicating more urgency/priority. All 5 raters were blinded to reduce bias. The Gyn-MeNTS interrater reliability was tested using Spearman r, and paired t tests were used to detect systematic differences between raters. The mESAS interrater reliability was assessed with weighted κ. Concurrent validity with mESAS and surgeon self-prioritization (SSP) was examined with Spearman r and logistic regression. Spearman r values for all Gyn-MeNTS rater pairs were above 0.80 (0.84 for 1 vs 2; 0.82 for 1 vs 3; and 0.82 for 2 vs 3; all p < .001), indicating strong agreement. The weighted κ for the 2 mESAS raters was 0.57 (95% confidence interval, 0.40–0.73), indicating moderate agreement. When used together, both scores were significantly and independently associated with SSP, with strong discrimination (area under the curve, 0.89).
Conclusion: Interrater reliability is acceptable for both scoring systems, and concurrent validity of each is moderate for predicting SSP, but discrimination improves to a high level when they are used together.
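
The two interrater statistics named in the abstract (Spearman r for the continuous Gyn-MeNTS scores and weighted κ for the 3-level mESAS score) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The ratings below are invented for illustration, the function names are hypothetical, and linear disagreement weights are an assumption, since the abstract does not say which weighting scheme was used.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_r(x, y):
    """Spearman r: the Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(x), average_ranks(y))


def weighted_kappa(a, b, n_levels):
    """Cohen's kappa with linear weights (assumed) for ratings coded 1..n_levels."""
    n = len(a)
    observed = Counter(zip(a, b))
    row, col = Counter(a), Counter(b)
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(1, n_levels + 1):
        for j in range(1, n_levels + 1):
            weight = abs(i - j) / (n_levels - 1)
            obs_disagreement += weight * observed[(i, j)] / n
            exp_disagreement += weight * row[i] * col[j] / (n * n)
    return 1 - obs_disagreement / exp_disagreement


# Hypothetical ratings for 8 patients (NOT data from the study).
gyn_ments_rater1 = [32, 41, 28, 55, 47, 36, 60, 39]
gyn_ments_rater2 = [30, 44, 29, 52, 49, 33, 58, 42]
mesas_rater1 = [1, 2, 3, 2, 1, 3, 2, 1]
mesas_rater2 = [1, 2, 2, 2, 1, 3, 3, 1]

print("Spearman r:", round(spearman_r(gyn_ments_rater1, gyn_ments_rater2), 2))
print("Weighted kappa:", round(weighted_kappa(mesas_rater1, mesas_rater2, 3), 2))
```

Weighted κ penalizes a disagreement of two categories more than a disagreement of one, which suits an ordinal scale such as mESAS. Spearman r depends only on rank order, so it does not detect one rater scoring consistently higher than another; that is the gap the paired t tests in the abstract address.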