This thesis examines disfluencies (e.g., 'um', repeated words, and
various forms of self-repair) in the spontaneous speech of normal adult speakers of
American English. Despite their prevalence, disfluencies have traditionally been viewed as
irregular events and have received little attention. The goal of the thesis is to provide
evidence that, on the contrary, disfluencies show remarkably regular trends in a number of
dimensions. These regularities have consequences for models of human language production;
they can also be exploited to improve performance in speech applications. The
method includes analysis of over 5000 hand-annotated disfluencies from a database (250,000
words) containing three different styles of spontaneous speech: task-oriented
human-computer dialog, task-oriented human-human dialog, and human-human conversation on a
prescribed topic. The approach is theory-neutral and strongly data-driven. The annotations
correspond to observable characteristics (features) in the data, including: 1) the speech
domain; 2) the speaker; 3) the sentence in which a disfluency occurs; 4) word-related
characteristics of the disfluency; and 5) simple acoustic characteristics of the
disfluency. A methodology is developed for representing these features in a database
format, and an algorithm is provided for automatic disfluency type classification based on
this representation.
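
As an illustration only, the five feature groups listed above could be held in a flat record such as the one sketched below. This is not the representation developed in the thesis: every field name, the domain labels, and the helper rate_per_100_words are hypothetical, and the data are toy values chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DisfluencyRecord:
    """One hand-annotated disfluency (all field names are hypothetical)."""
    domain: str            # 1) speech domain
    speaker_id: str        # 2) speaker
    sentence_id: str       # 3) sentence in which the disfluency occurs
    sentence_length: int   #    length of that sentence, in words
    excised_words: int     # 4) word-related: number of excised words
    has_cutoff_word: bool  #    word-related: contains a cut-off word
    editing_phrase: str    #    word-related: editing phrase, "" if none
    pause_seconds: float   # 5) a simple acoustic feature (assumed here)


def rate_per_100_words(records, words_by_domain):
    """Return disfluencies per 100 words for each domain with a word count."""
    counts = defaultdict(int)
    for record in records:
        counts[record.domain] += 1
    return {
        domain: 100.0 * counts[domain] / n_words
        for domain, n_words in words_by_domain.items()
        if n_words > 0
    }


# Toy data, not taken from the thesis corpus.
records = [
    DisfluencyRecord("human-computer", "s01", "u1", 12, 1, False, "", 0.2),
    DisfluencyRecord("human-computer", "s02", "u7", 20, 2, True, "uh", 0.4),
    DisfluencyRecord("conversation", "s03", "u3", 9, 0, False, "um", 0.3),
]
words_by_domain = {"human-computer": 200, "conversation": 50}
rates = rate_per_100_words(records, words_by_domain)
```

A flat record of this kind makes it straightforward to group disfluencies by any single feature (domain, speaker, sentence length) or by combinations of features when computing rates.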
Results show regular trends in disfluency rates by sentence length, by disfluency
position, by presence of another disfluency in the same sentence, by disfluency type, and
by combinations of these features both across and within speakers. Regularities are also
found for word-related features of the disfluency, including the number of excised words,
the rate of cut-off words, and the rate of editing phrases. Additional analyses describe
characteristics of overlapping disfluencies and prosodic characteristics of the simplest
disfluency types. Across analyses, data from the three different speech styles are
compared; where relevant, simple parametric models are provided.
In sum, disfluencies show regularities in a variety of dimensions. These
regularities can help guide and constrain models of spoken language production. In
addition, they can be modeled in applications to improve the automatic processing of
spontaneous speech.