«Journal of Phonetics (2002) 30, 139–162 doi:10.1006/jpho.2002.0176 Available online at on The phonetics of phonological ...»
Duration may also interact with voicing. During a fricative, pressure buildup behind the oral constriction equalizes transglottal pressure, making voicing more diﬃcult. Thus, long duration voiced fricatives may have a stronger tendency to devoice. However, the connection is not an absolute one, as it is perfectly possible to maintain voicing during a long fricative. Though rare, voiced geminate fricatives are attested in the world’s languages (e.g., Italian). Note also that this dependence only predicts that long /z/ will devoice, not that a very long fricative is necessarily entirely voiceless. We feel that the dependence between acoustic properties measured in our study cannot wholly be explained by aerodynamic interdependence between our measures, and variation is to some extent under the active control of the talker.
Thus, there is room for more abstract levels of linguistic organization to inﬂuence the interrelation of our measures.
The phonetics of phonological speech errors
3. Categorical, gradient, and ungrammatical errors
In the previous section, it was demonstrated that the three dimensions of duration, frication amplitude, and percent voicing diﬀerentiate /s/ and /z/ for our participants.
Of these, the percent voicing measure most directly reﬂects the voicing contrast.
There is considerable overlap between /s/ and /z/ in the distribution of frication amplitude and duration, even among tokens that were perceived to be correctly produced. As a result, the eﬀects of frication amplitude and duration are likely to be secondary in inﬂuencing the percept, and less deﬁnitive in determining when an error has occurred (Baum & Blumstein, 1987). In this section, we ﬁrst investigate variation in percent voicing, and then consider the variation in frication amplitude and duration. Finally, we brieﬂy discuss the other disﬂuent productions found in our corpus. Overall, the presence of these strange utterances and the considerable variability in production even for percent voicing supports the claim that many ungrammatical speech error outputs occur (Mowrey & MacKay, 1990).
3.1. Variation in percent voicing
The percent voicing of all tokens for each participant is shown in Fig. 3. The data are grouped to illustrate productions that are usual and unusual for each participant. Both 0 and 100% voicing are given separately. The other groups are 0–5, 5–10, 10–30, 30–60, and 60–100%. Each panel of Fig. 3 shows the number of occurrences for one participant. Within each panel, the shaded bars indicate occurrences for intended /s/ and the clear bars indicate occurrences for intended /z/.
The data in Fig. 3 are summarized in Table I, where the aggregate number of occurrences for each percent voicing category for all participants is given for both /s/ and /z/.
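The grouping used for Fig. 3 and Table I can be sketched as a small binning routine. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and the handling of boundary values (for example, whether exactly 5% belongs to 0–5 or 5–10) is an assumption, since the text does not specify it.

```python
from collections import Counter

# Group labels follow the text. Boundary handling is an assumption:
# the source does not say which group a value such as exactly 5% joins.
GROUPS = ["0", "0-5", "5-10", "10-30", "30-60", "60-100", "100"]


def voicing_group(percent):
    """Map a percent-voicing value (0 to 100) to one of the seven groups."""
    if not 0 <= percent <= 100:
        raise ValueError("percent voicing must lie between 0 and 100")
    # 0% and 100% are reported separately in the text.
    if percent == 0:
        return "0"
    if percent == 100:
        return "100"
    # Assumed convention: each upper bound is inclusive.
    for upper, label in [(5, "0-5"), (10, "5-10"), (30, "10-30"), (60, "30-60")]:
        if percent <= upper:
            return label
    return "60-100"


def tally(tokens):
    """Count tokens per (intended segment, group).

    tokens is an iterable of (segment, percent_voicing) pairs.
    """
    counts = Counter()
    for segment, percent in tokens:
        counts[(segment, voicing_group(percent))] += 1
    return counts
```

Calling `tally` once per participant would correspond to one panel of Fig. 3, and calling it on the pooled tokens would correspond to the aggregate counts of Table I.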
Given that these tongue twisters were recorded in an experimental laboratory setting, we assume the participants were hyperarticulating and attempting to produce clear speech. Thus, the target percent voicing is likely to be 0% for /s/ and 100% for /z/. In normal speech, partial devoicing of /z/ results in productions that are voiceless in the middle, though these productions are usually still more than 60% voiced (Haggard, 1978; Smith, 1997). Fig. 3 and Table I show that many productions did not fall into the percent voicing groups that would normally be predicted for /s/ and /z/. Utterances where the percent voicing was between 5 and 30% are certainly anomalous for both /s/ and /z/. These productions with intermediate amounts of voicing provide evidence that gradient, noncategorical errors are made (Mowrey & MacKay, 1990).
Overall there are fewer cases of intermediate voicing for /s/ than for /z/ (see footnote 2). Among the utterances with intermediate voicing, /s/ also shows a clear pattern of variability. As the percent voicing increases, fewer and fewer tokens of /s/ are found that have that degree of voicing. This is the pattern we might expect for phonetic variability if 0% voicing is intended. Productions that are farther away from the intended value are relatively less likely, as they lie on the tail of the distribution. By contrast, the devoicing of /z/ was relatively common and does not clearly reflect degrees of variation away from an intended production of 100% voicing. Levels of voicing between 5 and 100% occur with relatively equal frequency and there are no apparent systematic patterns. It may be that, since the devoicing of /z/ is common in normal productions, the data here are a mixture of normally devoiced /z/ and cases where /z/ is partially devoiced in error.
Footnote 2: Participant 6 is a curious exception to the generalizations about the likelihood of gradient errors in /s/ and /z/. For participant 6, /s/ is more variable, and was overlapped with the following vowel (and the preceding vowel if there was one) a small amount in many utterances. On the other hand, /z/ for participant 6 is much less variable, with almost all instances realized with 100% voicing. Participant 6 is also atypical in the production of categorical errors, and does not follow the /s/ or /z/ pattern of the majority of the participants as discussed below.
150 S. A. Frisch and R. Wright
Figure 3. Distribution of tokens among percent voicing groups for /s/ and /z/ for each participant.
Fig. 3 and Table I also provide distributional evidence for categorical errors.
Across all participants, there are 18 cases where /s/ was produced with more than 60% voicing (out of 397 total). Compared to the ﬁve cases with 30–60% voicing and 12 cases with 10–30% voicing, the number of completely voiced tokens is relatively large. For the individual data of six of nine participants the number of tokens with more than 60% voicing was greater than the number of tokens with 10–60% voicing. This suggests that in addition to the peak in the distribution of /s/ at 0% voicing, there is a second peak in the distribution of /s/ at 100% voicing. We interpret this second peak to be the result of categorical changes from intended /s/ to [z].
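The arithmetic behind this argument can be laid out explicitly. The sketch below uses only the counts for intended /s/ stated in the text; the variable and function names are illustrative. It compares raw counts, as the text does, and makes no adjustment for the differing widths of the groups.

```python
# Counts for intended /s/ as reported in the text (out of 397 tokens).
total_s = 397
s_counts = {"10-30": 12, "30-60": 5, "over 60": 18}


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Each group's share of all intended /s/ tokens.
shares = {group: share(n, total_s) for group, n in s_counts.items()}

# The counts fall from the 10-30% group to the 30-60% group and then rise
# again in the highest group; the text reads this rise as a second peak.
rises_again = s_counts["over 60"] > s_counts["30-60"] < s_counts["10-30"]
```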
There is also phonetic evidence that the tokens of /s/ with greater than 60% voicing are categorical errors. These tokens have shorter duration and lower frication amplitude than the typical /s/. In addition, these tokens have overlap between the frication noise and the onset of the formant structure of the following vowel that is typical of /z/ in our data. Finally, these fricatives have amplitude envelopes that gradually increase from the amplitude of the fricative toward the amplitude peak of the following vowel, which is also typical for /z/ in our data.
There are 56 cases where /z/ was produced with 0% voicing (out of 435 total).
The number of completely voiceless cases is larger than the number of cases with low levels of intermediate voicing (0–10%) for the individual data of seven of the nine participants. About half of these cases also appear upon inspection to be categorical errors, with /z/ produced as [s]. These cases have /s/-like duration and frication amplitude. There is a small positive VOT between the offset of /z/ and the onset of voicing and formant structure for the following vowel. There is also an abrupt decrease in frication amplitude of the /z/ followed by an abrupt rise in vowel amplitude to the vowel amplitude peak that is characteristic of /s/ in our data. However, there are other cases of 0% voicing that lack these additional /s/-like features, where the result may be better transcribed as [z̥].
One example of an apparent categorical intended /z/ to [s] error is given in Fig. 4.
The top panel of Fig. 4 shows [s] in the production of zap by participant 5. The bottom panel shows an error-free production of sap by the same participant. In both utterances, there is a relatively large amount of frication noise, and the oﬀset of the fricative is complete before the onset of the following vowel begins.
An example of an error where intended /z/ is realized as [z̥] is given in Fig. 5. The top panel of Fig. 5 shows [z̥] in the production of zap by participant 9. The bottom panel of the figure shows an error-free production of sap by the same participant. In contrast with the categorical example above, the [z̥] has a much lower frication amplitude, and the frication noise offset is at about the same time as the vowel onset. So while there is no measurable voicing during the fricative, there is no delay in voice onset as there is with /s/. Thus, it appears that even though there is no voicing, this error is not a categorical change from /z/ to [s].
Figure 4. Example categorical error in the production of zap (top panel) and normal production of sap (bottom panel) for participant 5.
Figure 5. Example devoicing error in the production of zap (top panel) and normal production of zap (bottom panel) for participant 9.
The variation in percent voicing suggests that more errors (whether gradient or categorical) were from /z/ toward [s] than from /s/ toward [z]. This pattern is expected, as /s/ is higher in frequency of occurrence than /z/ and in general low-frequency items are replaced by high-frequency items in speech errors (Stemberger, 1983). However, it has been reported in the speech error literature that /s/ and /z/ errors are one of several cases where the normal frequency effect is reversed (Stemberger, 1991). Stemberger found that /s/ to [z] errors were more common than /z/ to [s] errors in a transcribed speech error experiment designed to examine asymmetries in segmental errors.
Though the phonetic variability suggests that /z/ errors are more common than /s/ errors, variability in voicing in /z/ does not affect the perception of /z/ nearly as much as it affects the perception of /s/. Fig. 6 shows our judgement of the error rate based on our percept of each token. Tokens are aggregated into the same percent voicing groups as above, and the percent that were perceived as errors is plotted against the average percent voicing of tokens in the group. Our judgement of the percept was determined informally by consensus, based on repeated listening through headphones at a computer workstation. We found our level of agreement on the percept to be high. A token was considered to be an error if it was an /s/ that was not perceived as [s] or if it was a /z/ that was not perceived as [z]. While we did perceive some cases as [z̥], many cases that would be best described as [z̥] were not perceived to be anomalous. These cases were only identified as [z̥] by visual inspection of the waveform. In the perceptual data, only the cases of perceived [z̥] were counted as errors.
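The aggregation described for Fig. 6 amounts to computing, for each percent voicing group, the mean percent voicing and the percentage of tokens perceived as errors. The following is a minimal sketch of that computation under assumed names and an assumed input layout; the example tokens are hypothetical and are not values from the study. In practice it would be run separately for intended /s/ and intended /z/.

```python
from collections import defaultdict


def error_rate_by_group(tokens):
    """Summarize perceived errors per percent voicing group.

    tokens is an iterable of (group, percent_voicing, perceived_as_error).
    Returns {group: (mean percent voicing, percent perceived as errors)}.
    """
    buckets = defaultdict(list)
    for group, percent, is_error in tokens:
        buckets[group].append((percent, is_error))

    summary = {}
    for group, items in buckets.items():
        mean_voicing = sum(p for p, _ in items) / len(items)
        pct_errors = 100 * sum(1 for _, e in items if e) / len(items)
        summary[group] = (mean_voicing, pct_errors)
    return summary


# Hypothetical tokens for illustration only; not data from the study.
example = [("0-5", 2.0, False), ("0-5", 4.0, True), ("10-30", 20.0, True)]
points = error_rate_by_group(example)
```

Each value in `points` is one (x, y) pair of the kind plotted in Fig. 6.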
Fig. 6 shows a pattern of error detection that is strikingly similar to categorical perception with a boundary in the vicinity of 5% voicing (Liberman, 1997). Note that there is a heavy asymmetry in the perception of errors in /s/ and /z/ productions. While an /s/ with almost any voicing was perceived as [z], few cases of devoiced /z/ were perceived to be [s] or [z̥]. Even when there was no voicing at all in the /z/, it often sounded normal. Thus, while the production data suggest that there were more anomalies in the /z/ tokens than the /s/ tokens, the perception of this variation was heavily biased to detect /s/ errors but not /z/ errors. This perceptual asymmetry explains the apparent reversal of the frequency effect that was found by Stemberger (1991) and provides further evidence that speech error transcription is an unreliable method for examining the speech production process.
3.2. Variation in frication amplitude and duration
Unusual variation in percent voicing reveals that there are many gradient errors in the production of /s/ and /z/ in our corpus. It is less clear how the dimensions of frication amplitude and duration might reveal gradient errors, since these dimensions are not fully contrastive. Even though the distributions of frication amplitude and duration do not categorically differentiate /s/ from /z/, they do provide secondary cues to the voicing distinction (Cole & Cooper, 1975; Shadle, 1985). We have shown that tokens with 30% or more voicing were reliably perceived as [z]. Cases with less voicing were much less reliably categorized. Closer examination of the data reveals that variation in frication amplitude and duration, in addition to voicing, affected the perception of errors. In particular, for tokens with small amounts of voicing, high frication amplitude and long duration were cues for /s/, while low frication amplitude and short duration were cues for /z/.
Table II shows mean frication amplitude and duration for /s/ and /z/ depending on whether they were perceived as errors or not. The top portion of the table presents aggregate data for tokens with 0–5% voicing, the middle portion presents aggregate data for tokens with 5–30% voicing, and the bottom portion presents aggregate data for tokens with 30–100% voicing. While very few /s/ with 0–5% voicing were perceived as errors, those that were appear to have lower frication amplitude and shorter duration than those that were not (for amplitude t(347)=1.7, p=0.09; for duration t(347)=1.6, p=0.06). For /z/ with 0–5% voicing, the pattern was similar. Tokens with high frication amplitude and long duration were perceived as [s], whereas those with low frication amplitude and short duration were perceived as [z] (for amplitude t(62)=4.0, p<0.01; for duration t(62)=2.5, p<0.01). For /s/ and /z/ with 5–30% voicing, frication amplitude and duration again appear to have an effect on whether the tokens were perceived as errors or not, though the only statistically significant difference is for frication amplitude for /s/ (t(22)=2.9, p<0.01). Finally, in cases of 30–100% voicing, there were no tokens that were not perceived as [z], so it appears that variation in frication amplitude and duration is not a sufficiently powerful cue to overwhelm the perceptual effect of periodicity.
TABLE II. Mean frication amplitude and duration for low, intermediate, and high percent voicing groups as a function of intended segment and perceived error for all participants
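The comparisons above are reported as t statistics with degrees of freedom, but the text does not name the test. One plausible reading, offered here only as an assumption, is an independent two-sample t-test with pooled variance, for which the degrees of freedom equal the combined sample size minus two. The sketch below computes that statistic; the function name is hypothetical, and the inputs would be the amplitude or duration values of the tokens perceived as errors and of those perceived as correct.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Student's two-sample t statistic with pooled variance.

    Assumption: this is one possible test behind the reported values;
    the source does not say which test was used.
    Returns (t, degrees of freedom), with df = len(a) + len(b) - 2.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df
```

The p-value would then be read from the t distribution with the returned degrees of freedom, a step left out here to keep the sketch within the standard library.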