Journal of Phonetics (2002) 30, 139–162
doi:10.1006/jpho.2002.0176
The phonetics of phonological speech errors: An acoustic analysis of slips of the tongue


Stefan A. Frisch

Department of Communication Sciences and Disorders, University of South Florida, 4202 E Fowler Ave, PCD1017, Tampa, FL 33620, U.S.A.

Richard Wright

Department of Linguistics, University of Washington, Box 354340, Seattle, WA 98195-4340, U.S.A.

Received 22nd November 2000 and accepted 7th February 2002

Acoustic analysis was used to examine whether speech errors involve lexical, segmental, or sub-featural errors in speech production. Nine participants produced tongue twisters that induced errors between /s/ and /z/ word onsets in contexts where the error outcomes were either words (e.g., sit to zit) or nonwords (e.g., suck to *zuck). Three measurements of the /s/-/z/ contrast were made: (1) percent voicing, (2) duration of frication, and (3) amplitude of frication. The tokens were also transcribed under careful listening conditions. Gradient and categorical errors were found for all acoustic dimensions. The errors might or might not be detected by careful listening, depending on the extent to which there were errors along all three dimensions. These data support previous articulatory studies that found speech errors at a sub-featural level. However, cases where /s/ and /z/ are realized with a categorical change in voicing are more common than would be expected if categorical changes in voicing were merely extreme examples of gradient voicing errors. Also, both gradient and categorical error rates were higher when the error outcomes were words. Thus, our study also provides evidence for the psychological reality of phonological segments and words as units in the speech production process.

© 2002 Elsevier Science Ltd. All rights reserved.
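As an illustration only, the three acoustic measurements named above (percent voicing, duration of frication, and amplitude of frication) can be sketched in Python. This is not the measurement procedure used in the study: the autocorrelation-based voicing heuristic, the frame and hop sizes, the pitch range, the 0.5 threshold, the use of raw RMS level in dB, and the function names `frame_is_voiced` and `measure_fricative` are all assumptions made for the sketch. The input is assumed to be a frication interval that has already been segmented by hand.

```python
import numpy as np


def frame_is_voiced(frame, sample_rate, f0_min=75.0, f0_max=300.0, threshold=0.5):
    """Heuristic (an assumption, not the authors' method): call a frame voiced
    if its normalized autocorrelation has a strong peak at a lag that
    corresponds to a plausible fundamental frequency."""
    frame = frame - frame.mean()
    energy = np.dot(frame, frame)
    if energy == 0.0:
        return False
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    if lag_max < lag_min:
        return False
    peaks = [
        np.dot(frame[:-lag], frame[lag:]) / energy
        for lag in range(lag_min, lag_max + 1)
    ]
    return max(peaks) >= threshold


def measure_fricative(segment, sample_rate, frame_ms=40.0, hop_ms=10.0):
    """Return percent voicing, duration (ms), and RMS amplitude (dB) for one
    hand-segmented frication interval."""
    segment = np.asarray(segment, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    # Frame start positions; a segment shorter than one frame yields one frame.
    starts = range(0, max(len(segment) - frame_len, 0) + 1, hop)
    voiced = [frame_is_voiced(segment[s:s + frame_len], sample_rate) for s in starts]
    rms = np.sqrt(np.mean(segment ** 2))
    return {
        "percent_voicing": 100.0 * sum(voiced) / len(voiced),
        "duration_ms": 1000.0 * len(segment) / sample_rate,
        "amplitude_db": 20.0 * np.log10(rms) if rms > 0 else float("-inf"),
    }


# Synthetic stand-ins for real tokens (hypothetical data, 200 ms at 16 kHz).
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate // 5) / sample_rate
z_like = np.sin(2 * np.pi * 120 * t) + 0.1 * rng.standard_normal(t.size)
s_like = 0.1 * rng.standard_normal(t.size)

print(measure_fricative(z_like, sample_rate))
print(measure_fricative(s_like, sample_rate))
```

On a measure of this kind, a gradient error would appear as an intermediate value (for example, an intended /s/ with voicing through part of the frication), whereas a categorical error would fall within the range typical of the other segment.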

Address correspondence to: S. A. Frisch. E-mail: frisch@chuma1.cas.usf.edu

1. Introduction

Speech errors have traditionally been used to provide evidence for models of speech production that utilize the constructs of linguistic theory as psychologically real components of linguistic performance (e.g., Levelt, 1989). While it is indisputable that speech errors do occur, few unambiguous conclusions about the mechanisms of speech production can be drawn from speech error data. In addition, several researchers have questioned the validity of phonological speech error data that has been recorded using phonetic transcription (Laver, 1980; Mowrey & MacKay, 1990; Boucher, 1994; Ferber, 1995). In this paper, we undertake an acoustic analysis of speech produced by nine talkers in a speech error elicitation experiment. Acoustic analysis circumvents the problems of perceptual bias introduced by phonetic transcription. Our analysis provides evidence for the psychological reality of phonological segments in speech production as a statistical tendency, supporting transcriptional analyses. However, we also find evidence for speech errors at a sub-featural or gradient phonetic level that have not previously been attested. Our data support a model of speech production where individual gestures are organized into gestural constellations at the level of the segment (Saltzman & Munhall, 1989; Byrd, 1996). Segments are further organized into words. We find that the segment and word levels influence the implementation of gestures in both erroneous and error-free productions.

1.1. Background

Phonological speech errors (also called sub-lexical errors) have been an important source of evidence for the psychological reality of phonological features and segments. In many speech errors, it appears that portions of the intended utterance are produced in an unintended order. It is claimed in the speech error literature that the misordered portions correspond to linguistic units such as onsets, codas, phonemes, segments, and features. The errors in (1) are given by Fromkin (1971) as support for the psychological reality of segments, distinct from words or syllable onsets.

(1) a. frish gotto “fish grotto”
    b. blake fruid “brake fluid”
    c. spicky point “sticky point”

In (1a), the /r/ in grotto is presumably misordered, and appears as part of the preceding word instead. In (1b) the /l/ and /r/ are exchanged, each appearing in the other’s place. In (1c), the /p/ (or alternatively, the [+labial] feature) is anticipated, but also repeated in its proper place. As with (1c), it is often the case that any particular error can be interpreted in more than one way. Another error from Fromkin (1971), glear plue sky for clear blue sky, is claimed to involve an exchange of the voicing feature. This demonstrates that errors involving linguistic features are possible and thus that features are psychologically real units of processing as well.

Mowrey & MacKay (1990), using electromyographic (EMG) recordings of tongue twister production, conclude “that errors which have been consigned to the phonemic, segmental, or feature levels could be reinterpreted as errors at the motor output level” (p. 1311). In the remainder of this section, we review the transcriptional methods of speech error analysis and the results of the instrumental study of Mowrey & MacKay (1990).


1.2. Transcriptional approach

Traditional approaches to speech error analysis use phonetic transcription to encode speech errors at the time they are heard. In “naturally occurring” speech error corpora, errors that are observed in everyday speech are written down opportunistically. In the early corpora (e.g., Fromkin, 1971; Shattuck, 1975) the error recorders were usually participants in the communicative event in which the error occurred.

Stemberger (1983) collected naturally occurring errors only as an observer in an attempt to reduce the potential for perceptual bias. In some cases, recordings of naturally occurring speech are used, and suspected errors are listened to repeatedly to ensure accurate transcription (e.g., Garnham, Shillcock, Brown, Mill & Cutler, 1982). Transcription is also normally used to encode errors in speech error elicitation experiments (e.g., Baars, Motley & MacKay, 1975; Dell & Reich, 1980), though usually the utterances themselves are recorded on tape or computer and listened to repeatedly.

In all cases where transcription is used, the noting of a speech error necessarily coincides with the hearer noticing an anomalous percept. Thus, in transcriptional approaches, a speech error is defined to be an utterance that produces an anomalous percept that would be recognized as anomalous by the speaker (Dell, 1986). Mowrey & MacKay (1990, p. 1299) note that imperceptible speech errors may also exist and claim that “such production anomalies are errors if speech output differs from the speaker’s intended output, however subtle the anomaly”. Their claim raises the question of how articulatorily detailed the speaker’s intentions are, which we discuss below.

Transcribed speech error evidence has been used to argue in favor of the psychological reality of many phonological units, including the feature, segment, phoneme, cluster, syllable, and word. Among sub-lexical errors it has been claimed that errors occur primarily at the level of the phoneme or feature (Wickelgren, 1965) and that erroneous utterances are phonetically and phonotactically grammatical (Wells, 1951; Fromkin, 1971). In other words, it is claimed that speech errors occur by misordering abstract phonological units and the result is a phonetically normal segment and possible word according to the grammar of the language. Phonetic errors are often explicitly argued against (e.g., Fromkin, 1971) and it is claimed that when abstract units move to different locations, they phonetically accommodate to their new environment. It should be noted that there is some disagreement on these conclusions among experimenters using the same collection techniques. For example, Stemberger (1983), based on his own corpus of naturally occurring errors, claimed that phonologically ungrammatical utterances do occur, though infrequently.

The use of transcription to encode speech errors has received widespread criticism and been the subject of some empirical research (Laver, 1980; Garnham et al., 1982; Shattuck-Hufnagel, 1983; Mowrey & MacKay, 1990; Boucher, 1994; Ferber, 1995).

There are two primary criticisms. First, the use of phonetic transcription cannot capture sub-contrastive or gradient errors, below the level of a segment or feature, since the transcription system is inherently segmental. If gradient errors do occur, careful transcription of repeatedly heard recordings of speech errors would probably discover some of them. However, speech errors heard in conversation are usually only broadly transcribed and the listener’s full attention is not on phonetic detail.

1975) may only represent a portion of the actual speech errors produced in natural dialogue, and any model based on transcription evidence is therefore unable to answer questions about the phonetic details of speech errors. Errors collected in speech error inducing experiments (e.g., Baars et al., 1975; Dell, 1986; Shattuck-Hufnagel, 1992) might be more revealing of phonetic detail, since the errors are recorded and can be reviewed many times over. However, the design of these experiments is usually to produce a specific error. Thus, the experimenter’s transcription task is a forced-choice decision (is it an error or not?) rather than an unconstrained phonetic transcription task.

The second criticism of error collection using transcription is that the transcript is subject to the perceptual biases of the listener. It is well known from the literature on speech perception that speech is perceived in the context of the language system of the listener (see Wright, Frisch & Pisoni, 1999, for a recent review). For example, in the phenomenon of categorical perception, phonetically anomalous speech sounds that are acoustically intermediate between two categories are perceived by naive listeners as members of one category or the other, rather than a blend (see Liberman, 1997, for several articles). In another phenomenon, known as phonemic restoration, speech samples that have had segments replaced by noise or a cough are perceived as intact. Listeners, even when informed that there is a missing segment, are unable to accurately report which segment is missing or where in the word the disruption occurred (Warren, 1970; Samuel, 1981). Research on the detectability of mispronunciations of segments in running speech has found that the likelihood of detecting an error depends on the error’s place within the word and sentence, and the predictability of the word in its sentential context (Cole, 1973; Marslen-Wilson & Welsh, 1978). In summary, speech error percepts are subject to systematic biases and it is unclear whether many of the patterns observed in transcriptional speech error analyses are informative of linguistic biases in the speech production process or merely a reflection of the hearer’s perceptual system.

1.3. Instrumental data

Mowrey & MacKay (1990) present an electromyographic (EMG) study of sub-lexical speech errors elicited using tongue twisters. They used EMG recordings of the orbicularis oris muscle (lower lip) and lingual transversus/verticalis muscle (tongue blade) in combination with audio recordings to determine whether noncontrastive errors occur. They acted as their own subjects. The tongue twisters they used, shown in (2), crucially involved segments with lower lip (/ʃ/ in 2a) or tongue blade (/l/ in 2b, c) articulations that were in proximity to segments that did not contain such articulations (/s/ and /r/ or nothing, respectively).

(2) a. She sells sea shells by the seashore
    b. Bob flew by Bligh bay
    c. Fresh fried flesh of fowl

Mowrey and MacKay interpreted unexpected muscular activity as evidence for an error in speech production, and so in their study the definition of an error is crucially different from a perceptual definition based on transcription. They found a number of instances of inappropriate muscular activity where there was no perceptible anomaly. In addition, they found several cases of inappropriate muscular activity where there was a potentially intermediate percept that was not a clear member of either category. They conclude based on this evidence that gradient errors do occur, and they further claim that such errors are quite frequent. In one set of 150 recordings of Bob flew by Bligh bay they report 48 tokens containing errors involving intrusive transversus/verticalis muscle activation, the majority of which involved an amount of activation intermediate between none and that which is appropriate for a normal [l] production. Their proposal, from an articulatory perspective, is that sub-lexical errors occur on a continuum of gestural activity, and are neither segmental nor grammatical under any reasonable definition of these terms. Preliminary data from a recent study of one speaker using magnetometry supports the claim that gradient activation results in partial or incomplete gestures (Pouplier, Chen, Goldstein & Byrd, 1999).

These articulatory studies of gestural activity during tongue twister production clearly reveal that there can be gradient levels of gestural activation that have imperceptible consequences. Mowrey and MacKay conclude that such evidence demonstrates that neither the speaker nor the listener may be aware of the true nature of the errors made or whether indeed an error has been made at all … [which] contradicts any model claiming that “low-level” phonetic processes are necessarily overseen by a higher-order segmental or featural planning unit (p. 1311).

