Comparison of Imputation Strategies for Incomplete Longitudinal Data in Lifecourse Epidemiology

Document Type

Journal Article

Publication Date



American Journal of Epidemiology




Keywords

Fully conditional specification; Health and Retirement Study; Joint modelling; Longitudinal data; Missing not at random; Multiple imputation; Multiple imputation by chained equations; Predictive mean matching


Incomplete longitudinal data are common in lifecourse epidemiology and may induce bias, leading to incorrect inference. Multiple imputation (MI) is increasingly preferred for handling missing data, but few studies have examined the performance and feasibility of MI methods in real data settings. We compared three MI methods using real data under nine missing data scenarios, representing combinations of 10%, 20%, and 30% missingness under missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) mechanisms. Using data from the Health and Retirement Study (HRS), we introduced record-level missingness into a sample of participants with complete data on depressive symptoms (1998-2008), mortality (2008-2018), and relevant covariates. We then imputed the missing data using three MI methods (normal linear regression, predictive mean matching, and variable-tailored specification) and fit Cox proportional hazards models to estimate the effects of four operationalizations of longitudinal depressive symptoms on mortality. For each method, we compared bias in hazard ratios, root mean square error (RMSE), and computation time. Bias was similar across MI methods, and results were consistent across operationalizations of the longitudinal exposure variable. However, our results suggest that predictive mean matching may be an appealing strategy for imputing lifecourse exposure data, given its consistently low RMSE, competitive computation times, and few implementation challenges.
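The simulation design described above can be illustrated on a toy scale. This Python sketch starts from a complete sample and deletes a fixed share of records under MCAR, MAR, or MNAR. It fills the gaps with predictive mean matching (PMM), then scores the pooled estimate against the complete-data estimate using bias and RMSE. It is not the authors' code, and several pieces are assumptions made for illustration:

- a single continuous variable `y` and one fully observed covariate `x` stand in for the HRS depressive-symptom and covariate data;
- the sample mean of `y` replaces the Cox hazard ratio as the estimand;
- the missingness probability is made proportional to the rank of the driving variable;
- parameter uncertainty is approximated with a bootstrap refit rather than a Bayesian draw;
- five donors are used per missing value.

The helper names `make_missing`, `pmm_impute`, and `simulate` are hypothetical.

```python
import heapq
import random
import statistics


def ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def make_missing(x, y, prop, mechanism, rng):
    """Hypothetical helper: blank out round(prop * n) values of y.

    MCAR: every record is equally likely to be removed.
    MAR:  removal is more likely for larger values of the observed covariate x.
    MNAR: removal is more likely for larger values of y itself.
    """
    n = len(y)
    k = round(prop * n)
    if mechanism == "MCAR":
        missing = set(rng.sample(range(n), k))
    else:
        driver = x if mechanism == "MAR" else y
        order = sorted(range(n), key=lambda i: driver[i])
        rank = {i: r + 1 for r, i in enumerate(order)}  # 1 = smallest driver value
        # Weighted sampling without replacement (Efraimidis-Spirakis keys):
        # the selection weight is proportional to the rank of the driver.
        keys = {i: rng.random() ** (1.0 / rank[i]) for i in range(n)}
        missing = set(heapq.nlargest(k, range(n), key=keys.get))
    return [None if i in missing else v for i, v in enumerate(y)]


def pmm_impute(x, y_amp, rng, donors=5):
    """Hypothetical helper: one predictive-mean-matching draw for y given x."""
    obs = [i for i, v in enumerate(y_amp) if v is not None]
    b0, b1 = ols([x[i] for i in obs], [y_amp[i] for i in obs])
    # Perturbed coefficients from a bootstrap refit (simplified stand-in for a Bayesian draw).
    boot = [rng.choice(obs) for _ in obs]
    c0, c1 = ols([x[i] for i in boot], [y_amp[i] for i in boot])
    # Observed cases are scored with the point estimate, missing cases with the draw.
    scored = [(b0 + b1 * x[i], y_amp[i]) for i in obs]
    completed = list(y_amp)
    for i, v in enumerate(y_amp):
        if v is None:
            target = c0 + c1 * x[i]
            pool = heapq.nsmallest(donors, scored, key=lambda s: abs(s[0] - target))
            completed[i] = rng.choice(pool)[1]  # copy an observed value from a random close donor
    return completed


def simulate(mechanism, prop, reps=10, m=5, n=300, seed=1):
    """Return (bias, RMSE) of the pooled mean of y relative to the complete-data mean."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [2.0 + 1.5 * xi + rng.gauss(0, 1) for xi in x]
    reference = statistics.fmean(y)  # complete-data estimate acts as the benchmark
    errors = []
    for _ in range(reps):
        y_amp = make_missing(x, y, prop, mechanism, rng)
        estimates = [statistics.fmean(pmm_impute(x, y_amp, rng)) for _ in range(m)]
        errors.append(statistics.fmean(estimates) - reference)  # Rubin's-rules point estimate
    bias = statistics.fmean(errors)
    rmse = statistics.fmean(e * e for e in errors) ** 0.5
    return bias, rmse


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        for prop in (0.1, 0.2, 0.3):
            bias, rmse = simulate(mechanism, prop)
            print(f"{mechanism:<4} {prop:.0%} bias={bias:+.3f} RMSE={rmse:.3f}")
```

In this toy setup, bias should be near zero under MCAR and MAR, because `x` drives the MAR missingness and is also in the imputation model. Under MNAR the bias should be downward. PMM can only borrow values that were actually observed, so when high values of `y` are preferentially missing, the imputed values tend to be too low.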