Statistical language modeling for speech disfluencies
Andreas Stolcke and Elizabeth Shriberg
Abstract
Speech disfluencies (such as filled pauses, repetitions, restarts)
are among the characteristics distinguishing spontaneous speech from planned or read
speech. We introduce a language model that predicts disfluencies probabilistically and
uses an edited, fluent context to predict following words. The model is based on a
generalization of the standard N-gram language model. It uses dynamic programming to
compute the probability of a word sequence, taking into account possible hidden disfluency
events. We analyze the model performance for various disfluency types on the Switchboard
corpus. We find that the model reduces word perplexity in the neighborhood of disfluency
events; however, overall differences are small and have no significant impact on
recognition accuracy. We also note that for modeling of the most frequent type of
disfluency, filled pauses, a segmentation of utterances into linguistic (rather than
acoustic) units is required. Our analysis illustrates a generally useful technique for
language model evaluation based on local perplexity comparisons.
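
The dynamic-programming idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy bigram model in which each observed word is either fluent (and becomes the new context) or a hidden disfluency that is edited out (leaving the context unchanged). The names `P_DISFL`, `p_word`, `p_disfl_word`, `sequence_probability`, and `brute_force_probability` are hypothetical, and all probability values are invented for illustration.

```python
from collections import defaultdict
from itertools import product

P_DISFL = 0.1  # hypothetical probability of a hidden disfluency event


def p_word(word, context):
    """Toy bigram P(word | context); the numbers are invented."""
    table = {("<s>", "i"): 0.5, ("i", "want"): 0.4, ("want", "to"): 0.6,
             ("i", "i"): 0.01, ("<s>", "uh"): 0.05, ("i", "uh"): 0.05}
    return table.get((context, word), 0.001)


def p_disfl_word(word, context):
    """Toy P(word | disfluency, context): favors filled pauses and repeats."""
    if word == "uh":
        return 0.6
    if word == context:
        return 0.3
    return 0.001


def sequence_probability(words):
    """Forward pass: sum over all hidden fluent/disfluent labelings."""
    # alpha maps an edited (fluent) context word to the total probability
    # of all labelings of the words seen so far that end in that context.
    alpha = {"<s>": 1.0}
    for w in words:
        nxt = defaultdict(float)
        for ctx, p in alpha.items():
            # Fluent word: predicted from the edited context, becomes context.
            nxt[w] += p * (1 - P_DISFL) * p_word(w, ctx)
            # Disfluent word: edited out, so the context is left unchanged.
            nxt[ctx] += p * P_DISFL * p_disfl_word(w, ctx)
        alpha = dict(nxt)
    return sum(alpha.values())


def brute_force_probability(words):
    """Same quantity by explicit enumeration; exponential, for checking only."""
    total = 0.0
    for tags in product((False, True), repeat=len(words)):
        ctx, p = "<s>", 1.0
        for w, is_disfl in zip(words, tags):
            if is_disfl:
                p *= P_DISFL * p_disfl_word(w, ctx)
            else:
                p *= (1 - P_DISFL) * p_word(w, ctx)
                ctx = w
        total += p
    return total


words = ["i", "uh", "i", "want", "to"]
print(sequence_probability(words))
print(brute_force_probability(words))
```

The forward pass keeps one entry per distinct edited context rather than one per labeling, so its cost grows with the number of contexts instead of exponentially in the sequence length. The two printed values should agree up to floating-point rounding.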
Stolcke, A. & E. Shriberg. 1996. Statistical language modeling for speech disfluencies.
In Proceedings of the International Conference on Acoustics, Speech and Signal
Processing 1: 405-408. Atlanta, GA, 7-10 May.