Joining the conversation: introducing a dedicated medical education corpus
Document Type
Journal Article
Publication Date
1-1-2026
Journal
Academic medicine : journal of the Association of American Medical Colleges
Volume
101
Issue
1
DOI
10.1093/acamed/wvaf008
Keywords
bibliometrics; information retrieval; machine learning; medical education; scholarship
Abstract
PROBLEM: Medical education scholars struggle to join ongoing conversations in their field due to the lack of a dedicated medical education corpus. Without such a corpus, scholars must search too widely across thousands of irrelevant journals or too narrowly by relying on PubMed's Medical Subject Headings (MeSH). In tests conducted for this study, MeSH missed 34% of medical education articles.
APPROACH: From January to December 2024, the authors developed the Medical Education Corpus (MEC), the first dedicated collection of medical education articles, through a 3-step process. First, using the core-periphery model, they created the Medical Education Journals (MEJ), a collection of 2 groups of journals based on participation and influence in medical education discourse: the MEJ-Core (formerly the MEJ-24, 24 journals) and the MEJ-Adjacent (127 journals). Second, they developed and evaluated a machine learning model, the MEC Classifier, trained on 4,032 manually labeled articles to identify medical education content. Third, they applied the MEC Classifier to extract medical education articles from the MEJ-Core and MEJ-Adjacent journals.
OUTCOMES: As of December 2024, the MEC contained 119,137 medical education articles from the MEJ-Core (54,927 articles) and MEJ-Adjacent journals (64,210 articles). In an evaluation using 1,358 test articles, the MEC Classifier demonstrated significantly improved sensitivity compared with MeSH (90% vs 66%, P = .001), while maintaining a similar positive predictive value (82% vs 81%).
NEXT STEPS: The MEC provides a focused corpus that enables medical education scholars to more easily join conversations in the field. Scholars can rely on the MEC when reviewing literature to frame their work, and the MEC also creates opportunities for field-wide analyses and meta-research.
The core methodology also underlies the MedEdMentor Paper Database (mededmentor.org), a separately maintained online tool that complements the versioned MEC snapshot with a web-based search interface.
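For readers less familiar with the two evaluation metrics reported in the abstract, the following is a minimal sketch of how sensitivity and positive predictive value are computed from classification counts. The function names and the counts are hypothetical and chosen only for illustration; they are not the study's data or the authors' code.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of genuinely relevant articles that were retrieved."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no relevant articles in the sample")
    return true_pos / total


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of retrieved articles that were genuinely relevant."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("no articles were retrieved")
    return true_pos / total


# Hypothetical counts for illustration only (NOT taken from the study).
true_pos, false_neg, false_pos = 90, 10, 20

sens = sensitivity(true_pos, false_neg)
ppv = positive_predictive_value(true_pos, false_pos)
miss_rate = 1 - sens  # share of relevant articles the method failed to find

print(f"sensitivity={sens:.2f} ppv={ppv:.2f} miss_rate={miss_rate:.2f}")
```

Sensitivity and miss rate are complements, which is why the abstract's statement that MeSH missed 34% of medical education articles corresponds to its reported sensitivity of 66%.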
APA Citation
Ow, Gregory M.; Stetson, Geoffrey V.; Costello, Joseph A.; Artino, Anthony R.; and Maggio, Lauren A., "Joining the conversation: introducing a dedicated medical education corpus" (2026). GW Authored Works. Paper 8634.
https://hsrc.himmelfarb.gwu.edu/gwhpubs/8634
Department
Health, Human Function, and Rehabilitation Sciences