Statistical language modeling for speech disfluencies

Andreas Stolcke and Elizabeth Shriberg

Abstract

Speech disfluencies (such as filled pauses, repetitions, restarts) are among the characteristics distinguishing spontaneous speech from planned or read speech. We introduce a language model that predicts disfluencies probabilistically and uses an edited, fluent context to predict following words. The model is based on a generalization of the standard N-gram language model. It uses dynamic programming to compute the probability of a word sequence, taking into account possible hidden disfluency events. We analyze the model performance for various disfluency types on the Switchboard corpus. We find that the model reduces word perplexity in the neighborhood of disfluency events; however, overall differences are small and have no significant impact on recognition accuracy. We also note that for modeling of the most frequent type of disfluency, filled pauses, a segmentation of utterances into linguistic (rather than acoustic) units is required. Our analysis illustrates a generally useful technique for language model evaluation based on local perplexity comparisons.

Stolcke, A. & E. Shriberg 1996 Statistical language modeling for speech disfluencies. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing 1: 405-408. Atlanta, GA, 7-10 May.
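
The local perplexity comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the symmetric `window` definition of "neighborhood", and the toy probabilities are all assumptions made for illustration. The idea is to compute perplexity only over tokens near disfluency events, so that a local improvement is not diluted by the many tokens where two models agree.

```python
import math


def perplexity(log_probs):
    """Perplexity from a list of natural-log token probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))


def neighborhood(n_tokens, event_positions, window=1):
    """Sorted token indices within `window` positions of any event.

    The symmetric window is an assumption of this sketch, not a
    definition taken from the paper.
    """
    keep = set()
    for pos in event_positions:
        start = max(0, pos - window)
        stop = min(n_tokens, pos + window + 1)
        keep.update(range(start, stop))
    return sorted(keep)


def local_perplexity(log_probs, event_positions, window=1):
    """Perplexity restricted to the neighborhood of the given events."""
    idx = neighborhood(len(log_probs), event_positions, window)
    if not idx:
        raise ValueError("no tokens fall inside any event neighborhood")
    return perplexity([log_probs[i] for i in idx])


# Toy data (hypothetical): six tokens, one disfluency event at index 2.
# The second model assigns a higher probability only to the word
# that follows the event.
baseline = [math.log(0.1)] * 6
disfluency_aware = list(baseline)
disfluency_aware[3] = math.log(0.2)

for name, scores in [("baseline", baseline), ("aware", disfluency_aware)]:
    print(name, perplexity(scores), local_perplexity(scores, [2]))
```

In this toy setting the gap between the two models is larger in the local measure than in the overall one, which is the kind of contrast a local comparison is meant to expose.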

Key points relevant to the study of filled pauses

Comments