Models to Assess the Association of a Semiquantitative Exposure with Outcomes

Document Type

Journal Article

Publication Date



American Journal of Epidemiology


modeling exposure; semiquantitative covariate; smoking and smoking intensity; variable with spike at zero


A semiquantitative risk factor has 2 components: any exposure (yes/no) and the quantitative amount of exposure (if exposed). We describe the statistical properties of alternative analyses of such a risk factor using linear, logistic, or Cox proportional hazards models. Analyses often employ the amount of exposure as a single quantitative covariate, with the nonexposed assigned a value of zero. However, this analysis provides a biased estimate of the exposure coefficient (slope), and we describe the magnitude of the bias. The bias can be eliminated by adding to the model a binary covariate for exposed versus nonexposed. This 2-factor analysis captures the full effect of the risk factor on the outcome. However, the coefficient for any exposure versus none does not have a meaningful interpretation in this model. When exposure values among those exposed are instead centered (by subtracting their mean), the estimate of this coefficient represents the difference in the outcome between those exposed and those not exposed, in aggregate. We also show that the single-covariate model provides biased estimates of the coefficients for other covariates added to the model. Proper analysis of a semiquantitative risk factor should start with a 2-factor model, with centering, to assess the joint contributions of the 2 components of the risk-factor exposure. Properties of the models were illustrated using data from a multisite study in North America (1983–2019).
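
The contrast between the single-covariate and the centered 2-factor analyses can be sketched on simulated data for the linear-model case. This is a minimal illustration, not the authors' analysis: the data-generating values (a jump of 2.0 for any exposure, a slope of 0.5 per unit of exposure, exposure amounts uniform on 1 to 5) and all variable names are assumptions chosen for the example, and the logistic and Cox cases are not covered.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical semiquantitative exposure: a yes/no indicator plus an
# amount that is zero for the nonexposed (the "spike at zero").
exposed = rng.integers(0, 2, size=n)
dose = np.where(exposed == 1, rng.uniform(1.0, 5.0, size=n), 0.0)

# Assumed truth: any exposure shifts the outcome by 2.0, and each unit
# of exposure among the exposed adds 0.5.
y = exposed * (2.0 + 0.5 * dose) + rng.normal(0.0, 1.0, size=n)


def ols(columns, outcome):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(outcome)), *columns])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


# Single-covariate model: amount only, nonexposed coded as zero.
single = ols([dose], y)

# 2-factor model with centering: indicator plus the amount centered at
# its mean among the exposed (and kept at zero for the nonexposed).
centered = exposed * (dose - dose[exposed == 1].mean())
two_factor = ols([exposed, centered], y)

# Aggregate difference in mean outcome, exposed versus nonexposed.
mean_diff = y[exposed == 1].mean() - y[exposed == 0].mean()

print("single-covariate slope:", single[1])
print("2-factor slope:", two_factor[2])
print("2-factor indicator coefficient:", two_factor[1])
print("exposed minus nonexposed mean:", mean_diff)
```

Under these assumptions the single-covariate slope lands well above the simulated value of 0.5, because it absorbs the jump between nonexposed and exposed, while the 2-factor slope recovers it. With centering, the indicator coefficient equals the difference in mean outcome between the exposed and the nonexposed, which matches the interpretation given in the abstract.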