Outlier Analysis

One problem in large-scale analysis of this sort is that erroneous measurements may slip by through insufficient attention. Much of the phonetic variation documented herein is quite extreme, much more extreme than what is found in monitored ``laboratory'' speech. A skeptic might wish to attribute some of the apparently unusual measurements to mistakes of measurement. This section attempts to assuage the doubts of such skeptics, first by showing how apparent errors (outliers) were found, and second by showing how most of these apparent errors are in fact correctly measured phonetic forms whose characteristics result from natural phonetic and linguistic influences.

The error-detection procedure is to plot an F1-F2 chart on a computer screen, displaying the formant measurements of a single vowel class and speaker at a time, and then to identify any gross outliers. Clicking with a graphic pointing device (mouse) next to an outlying data point on the display reveals the identity of that outlier. For each outlier, the waveform, spectrogram, formant tracks, and segmentation times are redisplayed, the token is listened to, and everything is checked for mistakes.
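The outliers in this study were picked out by eye on the F1-F2 display. As a rough numeric stand-in for that visual step, and not the procedure actually used here, the following minimal sketch ranks the tokens of one vowel class for one speaker by their standardized distance from the class mean, so that the most distant tokens can be re-examined first. The function name `rank_outliers`, the tuple layout, and the demonstration values are all hypothetical.

```python
import math
from statistics import mean, pstdev


def rank_outliers(tokens, top_n=3):
    """Rank tokens by standardized distance from the class mean in F1-F2 space.

    tokens: list of (token_id, f1_hz, f2_hz) for ONE vowel class and ONE speaker.
    Returns the top_n (token_id, distance) pairs, most distant first.
    Assumption: z-scored Euclidean distance stands in for visual inspection.
    """
    f1_values = [t[1] for t in tokens]
    f2_values = [t[2] for t in tokens]
    f1_mean, f2_mean = mean(f1_values), mean(f2_values)
    f1_sd, f2_sd = pstdev(f1_values), pstdev(f2_values)

    scored = []
    for token_id, f1, f2 in tokens:
        # Guard against a zero standard deviation (all values identical).
        z1 = (f1 - f1_mean) / f1_sd if f1_sd else 0.0
        z2 = (f2 - f2_mean) / f2_sd if f2_sd else 0.0
        scored.append((math.hypot(z1, z2), token_id))

    scored.sort(reverse=True)
    return [(token_id, dist) for dist, token_id in scored[:top_n]]


# Hypothetical formant values in Hz, invented for illustration only.
demo_tokens = [
    (1, 500, 1500),
    (2, 510, 1520),
    (3, 490, 1480),
    (4, 505, 1490),
    (5, 800, 1100),  # deliberately far from the rest
]

for token_id, dist in rank_outliers(demo_tokens, top_n=2):
    print(token_id, round(dist, 2))
```

A ranking of this kind only suggests which tokens to inspect. Whether a distant token is an error or a correct measurement still has to be decided by redisplaying and listening to it, as described above.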

In most cases, the outliers turn out to be correct measurements, a function of the considerable care used in step 10 of the analysis procedure above. Thus of the 13 most distant outliers identified in the 2329 measured tokens for the 2nd Chicago speaker, Jim C., 2 were errors due to a phonological misclassification of // as /æ/ in the words ``can'' and ``and'', while 11 were correct measurements. None were actual errors of measurement. Among the 20 and 17 outliers examined for the Jamaican speakers, 2 tokens in each set were found to be actual errors, while the others were due to extreme assimilation to adjacent sounds or to extreme stress. When the great majority of the most distant outliers are actually correct measurements, the less-outlying tokens are even less likely to contain gross errors: gross errors appear as outliers, and the remaining tokens are relatively consistent with each other. Variation is in fact extreme, as is shown, for example, in Figure [*], page [*]. Thus although some small residue of errors undoubtedly remains, this data is fairly clean.
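The counts reported above can be tallied in a few lines. This is only a sketch of the arithmetic: the labels ``Jamaican speaker A'' and ``Jamaican speaker B'' are placeholders, since the text does not say which speaker had 20 outliers and which had 17, and the two phonological misclassifications for Jim C. are counted conservatively as errors even though neither was an error of measurement.

```python
def correct_share(examined, errors):
    """Fraction of examined outliers that turned out to be correct measurements."""
    return (examined - errors) / examined


# (label, outliers examined, errors found) as reported in the text.
# Speaker labels A and B are placeholders, not names from the study.
reviews = [
    ("Jim C. (Chicago)", 13, 2),   # both errors were misclassifications
    ("Jamaican speaker A", 20, 2),
    ("Jamaican speaker B", 17, 2),
]

for label, examined, errors in reviews:
    share = correct_share(examined, errors)
    print(f"{label}: {examined - errors} of {examined} correct ({share:.1%})")
```

Under these counts, more than four fifths of the most distant outliers examined for each speaker were correct measurements.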

How can outliers be correct? The following list of outliers in the speech of Jim (from Chicago) discusses why these 11 tokens were phonetically unusual, and gives the reader something of a flavor of the kind of striking but regular acoustic variation that occurs in natural vernacular speech.

  • 120 /ow/ is very low. ``No!'', /now/, [nao]: This carries emphatic, contrastive intonation. It is not ``nah''; the offglide [o] shows that this is clearly the word ``No''.
  • 201 /ay/ is well to the front of the other /ay/ tokens. ``at night'', /æt#nayt/: The preceding front /æ/ seems to pull this token to the front.
  • 794 /æ/ is extremely low and back. ``blank'' /blæŋk/, primary stressed. F2 glides forward from the preceding /l/ into the following front /k/; the nucleus is chosen, as usual, at the F1 maximum. Following velars have a strong retarding effect on the raising of /æ/, in Chicago just as in New York City and Philadelphia. This effect appears magnified in this doubly lengthening environment: it is stressed, and also precedes a voiced tautosyllabic consonant.
  • 1620, 1817 /uw/ very front. Both occur in ``I threw the'' (paper, bottle) /ruw#/: // is deleted. /r/ is realized as a palatal affricate, and the entire realization of /uw/ is thereby strongly fronted.
  • 1891 // is high and back. ``I knew if I lasted'', /nuw#f/: // in the clitic form ``if'' (cf. ``If you try it'' = ``Few try it'') is reduced to mere length on the preceding /uw/. Its phonetic timing slot is apparently not lost, but its phonetic quality is entirely due to the adjacent stressed /uw/. This case could be analysed as compensatory lengthening of /uw/, or as feed-forward coarticulation.
  • 1919 /uw/ is very low and front. ``I knew I didn't'', /nuw# ay/. The preceding coronal has the typical American effect of fronting the nucleus of /uw/, realized as a quite high F2 onset, gradually falling throughout the vowel for about 70 ms in this case. The lowering which also occurs may be attributed to the influence of the following low nucleus of /ay/.
  • 2096 // is the lowest token of all; it's off the chart. ``... threw bricks at me and Holly'' /''hliy/. The // is emphatically stressed, slightly creaky, but the formants are quite clear, and the preceding fully-realized /h/ (without oral constriction) ensures that this low vowel cannot be attributed to coarticulatory influences. The conversational context is a list of events in which the speaker was attacked; once he was with his friend, Holly, who is 6'2'', 200 pounds, and a person to be feared in a fight. Thus the affective stress on ``Holly'' symbolizes the irrational nature of the attack. The stress is realized by an extension of the mouth-opening gesture associated with // (thereby raising F1, cf. Chapter 2, Acoustics) beyond what is found for any other token of that or any other vowel.
  • 2330 /:/ is very front. ``He thought he was..'', /hiy#t#hiy/: /t#h/ is deleted. This realization of /:/ sounds lower mid, central, []. Here the surrounding high-front /iy/'s seem to have fronted this nucleus, though they didn't raise it.
  • 2466 /:/ is far lower and fronter than other tokens of //. ``All right'', /l+rayt/: // is the most stressed syllable in this utterance. This cannot be attributed to coarticulation with the following /ay/, since this // is even lower (but backer) than the following /ay/ nucleus. Instead this may be an effect of sound change in progress, where /:/ lowers and fronts (cf. the Chicago Loop Chain Shift, discussed in the Chicago chapter), apparently more so when stressed.
  • 2764 // is very low. ``I got him down'' /gt#Im/: /I/ in this clitic form has reduced qualitatively to a glide between /a/ and /m/, but has its own timing slot (measured at 90ms, which is quite long).

In this data, the formant-tracking was good to begin with, and the formant tracks were checked and corrected by hand for every vowel token. These examples show that extreme outliers on F1-F2 charts derived from these measurements are most often to be attributed not to errors of measurement, but to particular phonetic and linguistic circumstances that produce the variation in an understandable, even predictable, way. (Thus for example, not just one but both tokens of ``threw'' showed extreme fronting of /uw/.)

Freud said that to understand the normal, it is helpful to study the abnormal. The stressed tokens above, which lie well outside the ``normal'' distribution of their respective vowel phonemes, may be a window to the understanding of normal speech. In particular, it is clear that heavy stress can exaggerate articulatory gestures, giving rise to these outlying measurements. The hyper-open // in ``Holly'' remains //, but its qualities are not those of other //'s: the speaker's mouth was considerably more open. If the phonetic realization of // for this speaker is a particular acoustic target in formant-space, this token is not a proper //, because it lies well outside the normal range of formant-frequencies. But it is a proper realization of //, in this particular linguistic context. It may resolve this paradox to say that vowels are realized by a particular articulatory gesture, which may be overlaid onto other gestures, or magnified as a result of stress or lengthening. Then the form of the gesture (e.g., mouth-opening) may remain while parametric modifications to the gesture (duration, magnitude, relative timing with respect to other gestures) may be made through the influence of context, stress, etc. An acoustic analog of this articulatory ``gesture theory'' (developed in various papers by Browman & Goldstein, e.g., 1986) may also be possible.

In summary, acoustically unusual outliers are frequently natural, sometimes even predictable realizations of sound-classes in particular environments, which may be used for getting deeper insight into the nature of speech sounds. Errors of instrumental analysis, while undoubtedly present, cannot be used as explanations for the patterns of extreme variation found.