Comparing reliability of perceptual ratings of roughness and acoustic measures of jitter
Journal of Speech and Hearing Research
Acoustic analysis is often favored over perceptual evaluation of voice because it is considered objective, and thus reliable. However, recent studies suggest this traditional bias is unwarranted. This study examined the relative reliability of human listeners and automatic systems for measuring perturbation in the evaluation of pathologic voices. Ten experienced listeners rated the roughness of 50 voice samples (ranging from normal to severely disordered) on a 75-mm visual analog scale. Rating reliability within and across listeners was compared to the reliability of jitter measures produced by several voice analysis systems (CSpeech, SoundScope, CSL, and an interactive hand-marking system). Results showed that, overall, listeners agreed as well as or better than 'objective' algorithms. Further, listeners disagreed in predictable ways, whereas automatic algorithms differed in seemingly random ways. Finally, listener reliability increased with severity of pathology, whereas objective methods quickly broke down as severity increased. These findings suggest that listeners and analysis packages differ greatly in their measurement characteristics. Acoustic measures may have advantages over perceptual measures for discriminating among essentially normal voices; however, reliability is not a good reason for preferring acoustic measures of perturbation to perceptual measures.
Rabinov, C., Kreiman, J., Gerratt, B., & Bielamowicz, S. (1995). Comparing reliability of perceptual ratings of roughness and acoustic measures of jitter. Journal of Speech and Hearing Research, 38(1). http://dx.doi.org/10.1044/jshr.3801.26