«Journal of Phonetics (2002) 30, 139–162 doi:10.1006/jpho.2002.0176 Available online at on The phonetics of phonological ...»
If we compare the fully voiced /z/ tokens with the /s/ tokens with 0–5% voicing that were perceived as errors, we see evidence that these voiceless /s/ tokens were produced with frication amplitude and duration appropriate for /z/. These /s/ tokens could reasonably be considered to be errors on the frication amplitude and duration dimensions but not on the voicing dimension. In other words, these may be cases where /s/ was produced as [z̥]. If this is the correct analysis, these tokens provide additional evidence for gradient errors in production.
3.3. Other variation
3.3.1. Concatenation errors
There were 10 tokens, six for intended /s/ and four for intended /z/, where the speakers appeared to correct their articulations without interrupting their production. In all of these cases, the token begins as the unintended segment and is corrected to the intended segment. In all cases, the resulting concatenation of onset consonants is 300–500 ms long, and thus long enough to be equivalent to the production of two separate segments. The resulting [zs] or [sz] onset cluster is, of course, phonotactically illegal in English. Fig. 7 shows two examples of concatenation errors. The top panel shows [zs] in a production of sue by participant 6. The bottom panel shows [sz] (or perhaps [z̥z]) for a production of zig by participant 3.
Figure 7. Example concatenation errors producing [zs] for sue (top panel) and [sz] for zig (bottom panel).
3.3.2. Other disfluencies
There are six other anomalous productions that are interesting to note. Three of these productions appear to be linguistically explicable speech errors, but not ones that the experiment was designed to elicit. All three cases involved anticipation of the palatal articulation from the coda of the intended word zilch onto the onset of the same word. In two cases, one by participant 6 and one by participant 9, the result was [ʒ]. Inspection of the spectrograms of these two examples confirms that a voiced fricative was produced with frication noise in the same region as the coda [tʃ]. Participant 6 interrupted his production mid-word and immediately repeated the word correctly. In addition to being an unusual anticipation, the [ʒ] onset is of questionable phonotactic legality in English.
The third palatal intrusion, also by participant 6, was combined with an error in voicing. In this case, the palatal fricative was largely voiceless and of low amplitude.
The frication noise overlapped with the onset of the following vowel, resulting in a short period of voiced frication. The perceptual result was [ʒ]. In this case, the participant once again interrupted his production mid-word, and then repeated the word correctly after a 1 s delay.
The three other productions are a nonhomogeneous set of disfluencies. These cases are perceptually anomalous and could be considered speech production errors that are gross gestural mis-coordinations. One case is a low amplitude [z̥]-like sound that lasts for about 500 ms before becoming a relatively normal amplitude and duration [s] onset. A second involves an intended /s/ where the alveolar constriction appears to end about 100 ms before the onset of the following vowel, resulting in an intrusive [h] percept. The third involves an intended /z/ that begins with about 100 ms of unfricated prevoicing followed by a 300 ms long fully voiced [z] onset.
3.3.3. 'Geminate' /s/
As noted above, one of the tongue twisters contained the sequence Zeus seem in which a coda /s/ is followed by an onset /s/. There were 55 tokens of this sequence across all participants. In 40 cases, the two fricatives were produced without a discrete juncture. It is interesting to note that none of these long /s/ productions show signs of being voiced and none were perceived to be errors. In addition, the 15 cases where there were two distinct /s/ productions also show very little variation in voicing in the /s/ productions. Only one of the 15 distinct onset /s/ tokens has any voicing, and that case involves just 3% voicing from a small amount of overlap with the following vowel. Given that more than 15% of the other /s/ tokens contain some amount of erroneous voicing, the doubled articulation of /s/ appears to have somehow "protected" these segments from intrusion by /z/.
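The comparison with the 15% baseline can be made concrete with a rough binomial calculation. The sketch below is an illustration only, not an analysis from the study: it assumes that tokens are independent, that 0.15 (the lower bound implied by "more than 15%") is the per-token rate of erroneous voicing, and that each of the 40 long /s/ productions counts as a single observation. The function name and constants are hypothetical.

```python
from math import comb


def prob_at_most(k_max, n, rate):
    """Return P(X <= k_max) for X ~ Binomial(n, rate)."""
    return sum(
        comb(n, k) * rate**k * (1 - rate) ** (n - k)
        for k in range(k_max + 1)
    )


# Assumption: 0.15 is a lower bound on the voicing-error rate for other /s/ tokens.
BASELINE_RATE = 0.15

# Only the 15 tokens with two distinct /s/ productions (1 of 15 shows voicing).
p_distinct = prob_at_most(1, 15, BASELINE_RATE)

# All 55 Zeus seem sequences, counting each long /s/ as one observation
# (1 of 55 shows voicing). This pooling is an assumption of the sketch.
p_all = prob_at_most(1, 55, BASELINE_RATE)

print(f"P(at most 1 voiced of 15) = {p_distinct:.3f}")
print(f"P(at most 1 voiced of 55) = {p_all:.4f}")
```

Under these assumptions, one voiced token among the 15 distinct productions would not be unusual on its own (a probability of roughly 0.32), whereas one voiced token among all 55 sequences would be rare (roughly 0.0014). The apparent protection therefore rests largely on the 40 long productions.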
More general eﬀects of segmental context on voicing of /s/ and /z/ (as well as /f/ and /v/) were found by Pirello, Blumstein & Kurowski (1997). They examined the eﬀect of segmental context on the voicing of fricatives in nonsense phrases like his fav sips it. They found that voiceless fricatives following voiced segments were often initially voiced, and voiced fricatives following voiceless segments were often initially devoiced.
3.4. Summary
The concatenation errors and the many cases of intermediate amounts of voicing found in all nine participants in our study support the conclusions of Mowrey & MacKay (1990). A significant number of tokens in our corpus are phonetically or phonotactically abnormal in English. Thus, it appears that gradient errors do occur frequently, and the process of phonetic accommodation does not regularize all speech errors to be acoustically normal English sounds. We also found that the gradient errors might or might not be detected as errors by careful listening.
On the other hand, there is also evidence for categorical effects in errors. Cases where /s/ and /z/ are realized with a categorical change in voicing are more common than we would expect if categorical changes in voicing were merely extreme examples of gradient voicing errors (i.e., the tails of the variation in the distribution of voicing). In addition, there is clear evidence that many cases of 0% voicing for /z/ also have duration and frication amplitude that are appropriate for /s/. It seems likely that these are instances of categorical errors, where a normal articulation appropriate to the wrong segment is produced. While there is some aerodynamic interdependence between voicing, frication amplitude, and duration, their interdependence is not so strong that a fricative of the length and amplitude of a normal /s/ must necessarily be completely voiceless. In fact, our corpus contains many productions of /z/ that have the length and frication amplitude of /s/ but are voiced throughout.
We conclude that a linguistic level of the segment or feature does have an influence on the phonetic details of speech error production. This effect can be captured in models of language processing that use graded activation and competition between linguistic units to explain speech errors in phonological encoding (e.g., Dell, 1986). Errors occur when competition results in the accidental mis-selection of an incorrect word, segment, or feature. In order to account for gradient speech errors, these models must be extended to include activation and competition among phonetic articulatory plans. While it is certainly not a trivial task to develop a working connectionist model of phonetic implementation, the conceptual extension is a simple one. If there is activation and competition in phonetic planning, there is the possibility for articulatory plans that mix gestures for different articulators from different segments. Complete devoicing of /z/ with a [z̥] result could be explained by a combination of the glottal abduction of /s/ with normal pulmonic effort associated with /z/. It is also conceivable that erroneous articulatory plans that blend articulatory parameters for the same articulator could be created. Partial voicing of /s/ could result from a combination of the articulatory gestures for /s/ with the timing of oral and laryngeal gestures for /z/ in which the fricative constriction is overlapped with the onset of voicing of the following vowel.
Activation and competition may also explain why "geminate" /s/ appeared to be unaffected by /z/. Elements of articulatory planning, such as the laryngeal abduction, may be more strongly encoded when onset and coda /s/ productions come together, as each segment would independently reinforce the activation of this gesture. As a result of this additional activation, the correct voiceless articulation of /s/ will compete more successfully with /z/.
4. Lexical eﬀects
In the previous section, we argued that there are observable segmental effects in the phonetics of speech errors. In our corpus, two of the tongue twisters were designed so that the elicited errors would result in words and two of the tongue twisters were designed so that the elicited errors would result in nonwords. We, therefore, also have the opportunity to examine whether the productions were influenced by the presence of a word error outcome. In the speech error literature, it has been reported that word error outcomes are more likely than nonword outcomes in both naturally occurring error corpora and in experiments that elicit speech errors (Motley & Baars, 1975; Dell & Reich, 1980; Stemberger, 1983).
Table III shows the tokens divided into groups by percent voicing, as in Table I, but with the tokens further divided by the lexical status of the error outcome of the tongue twister. In the case of /s/, it appears that all types of error in voicing were more common in the case of lexical outcomes, and the two distributions deviate significantly from a common distribution (χ²(6) = 19.7, p < 0.01). This provides clear evidence that the lexicality of the outcome affects both gradient and categorical aspects of speech errors.
In the case of /z/, the pattern is less clear, though the distributions are significantly different (χ²(6) = 13.9, p < 0.05). It appears that the number of tokens with 0–30% voicing is somewhat higher in the lexical case than in the nonlexical case, and the pattern is reversed for voicing from 30% up to 100%. As mentioned in Section 3, it may be the case that the distribution of percent voicing for /z/ reflects a mixture of normal devoicing and error-induced devoicing. The number of cases with highly reduced voicing in the lexical case may reflect a greater amount of error-induced devoicing on those tokens. The fact that there are fewer lexical cases than nonlexical cases with smaller amounts of devoicing is more difficult to explain.
One possibility is that normally devoiced productions of /z/ are somehow more susceptible to additional error-induced devoicing than cases where /z/ is 100% voiced. If error-induced devoicing is stronger in the lexical case, then the number of tokens with only a small amount of devoicing would be diﬀerentially reduced.
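The two chi-square statistics reported for Table III can be checked against their stated significance levels without access to the table itself. The sketch below uses the closed-form upper-tail probability of the chi-square distribution for even degrees of freedom. Two points are assumptions rather than statements from the text: that the 6 degrees of freedom arise from a table with 2 lexical-status groups and 7 percent-voicing bins, and the hypothetical function names.

```python
from math import exp, factorial


def contingency_df(n_rows, n_cols):
    """Degrees of freedom for a test of homogeneity on an r x c table."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for chi-square with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)**k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))


# Assumption: 2 groups (lexical vs. nonlexical outcome) x 7 voicing bins.
DF = contingency_df(2, 7)

p_s = chi2_sf_even_df(19.7, DF)  # statistic reported for /s/
p_z = chi2_sf_even_df(13.9, DF)  # statistic reported for /z/

print(f"df = {DF}")
print(f"/s/: p = {p_s:.4f}")
print(f"/z/: p = {p_z:.4f}")
```

Under these assumptions the /s/ statistic falls below the 0.01 level, while the /z/ statistic falls below 0.05 but not below 0.01, in line with the weaker pattern described for /z/.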
Given the clear lexical eﬀects on /s/ production and some indication of lexical eﬀects on /z/ production, we conclude that productions in which there is a competing lexical outcome produce greater numbers of errors. The eﬀect of a competing lexical item on speech errors is found in an increase in both gradient errors and categorical errors. Together with the evidence from Section 3 that categorical errors in voicing are more common than expected based on the pattern of gradient errors, we conclude that the segment and word are both units of organization in the speech production process that aﬀect the phonetic details of articulation.
The lexical eﬀect we found in speech production further supports a connectionist approach to phonological encoding in speech production. If the articulatory plans for segments compete with one another during encoding, then competing segments that are reinforced by corresponding word nodes will be enhanced. For competing word outcomes, more errors will occur. When there is no word competitor, the correct segment will encounter less competition, and fewer errors will occur.
Assuming that errors can involve both blending of articulatory plans and wholesale substitution, an increase in both gradient and categorical errors would be predicted in cases where there is a lexical competitor.
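The prediction in this paragraph can be illustrated with a deliberately simple toy calculation. It is not the model of Dell (1986) and not a model proposed in this study. It assumes that the competitor's share of total activation (a Luce-style choice ratio) serves both as the chance of wholesale substitution and as the weight of the competitor in a blended articulatory plan, and that a word node adds a fixed boost to the competing segment. All function names and activation values are hypothetical.

```python
def competitor_share(target_act, competitor_act, lexical_boost=0.0):
    """Share of total activation held by the competing segment."""
    boosted = competitor_act + lexical_boost
    return boosted / (target_act + boosted)


def blended_voicing(target_voicing, competitor_voicing, share):
    """Activation-weighted mix of the percent voicing of two plans."""
    return (1.0 - share) * target_voicing + share * competitor_voicing


# Hypothetical activation levels, chosen only for illustration.
TARGET_ACT = 1.0
COMPETITOR_ACT = 0.25
LEXICAL_BOOST = 0.25  # extra support when the error outcome is a word

share_nonword = competitor_share(TARGET_ACT, COMPETITOR_ACT)
share_word = competitor_share(TARGET_ACT, COMPETITOR_ACT, LEXICAL_BOOST)

# Intended /s/ (0% voicing) competing with /z/ (100% voicing).
blend_nonword = blended_voicing(0.0, 100.0, share_nonword)
blend_word = blended_voicing(0.0, 100.0, share_word)

print(f"nonword outcome: share = {share_nonword:.2f}, blend = {blend_nonword:.1f}%")
print(f"word outcome:    share = {share_word:.2f}, blend = {blend_word:.1f}%")
```

In this toy setting the lexical boost raises the competitor's share, so both the chance of substitution (a categorical error) and the degree of blending (a gradient error) increase together, which is the qualitative pattern the paragraph predicts.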
5. Discussion and conclusion
This study presented an acoustic and perceptual analysis of onset /s/ and /z/ speech errors by nine talkers. In support of the claims of Mowrey & MacKay (1990), we found that gradient, noncontrastive errors can occur, and that such errors are actually common. In addition, we found that categorical errors also occur at rates that are higher than would be expected if the only source of errors was from noncontrastive variation that happened to extend into another phonetic category.
Finally, we demonstrated a lexical eﬀect on both gradient and categorical errors.
These patterns provide evidence for a set of higher level units that organize phonetic gestures at the level of the segment and word, agreeing with some of the observations of traditional speech error analyses based on transcribed data.
While some of the theoretical conclusions of transcription-based speech error analyses are supported, others must be rejected. Assertions that speech errors result in grammatically acceptable utterances are not supported (e.g., Fromkin, 1971). The detailed acoustics of some of the productions in our corpus, specifically those resulting in intermediate voicing errors, are distinctly phonetically anomalous. For example, errors where 10% of an intended /s/ is voiced by overlapping with the following vowel are clearly anomalous, and by no means similar to cases of normally devoiced /z/. In addition, the concatenation errors are instances of phonotactically ungrammatical onset sequences. While self-monitoring and editing certainly take place in speech production (Levelt, 1989), it is not the case that the majority of speech errors in our corpus are best analyzed as mis-selection of phonological segments that have been phonetically encoded after the error occurred.