general mathematical variables x, y, found in equations such as y=mx+b, such a superficial resemblance is not sufficient justification for claiming that this same concept is being used in phonological theory.

The concept of “feature variable” must be linguistically motivated and defined. While assimilatory processes of the general (SPE) form [+X] → [αFi,βFj,γFk] / __ [αFi,βFj,γFk] demonstrate the need for some such concept, they do not automatically justify the choice of the particular formal mechanism. Explicit comparison of alternatives is required. The most important and difficult first step is scrutinizing the structure of the claims, stating the theories explicitly, using well-justified theoretical concepts.

A feature-value variable in SPE is a random symbol (drawn from an unbounded vocabulary) which refers to a disjunction of values that features may have, viz. {+,-}. The important formal claims entailed by SPE variable notation are that:

(2) Rules refer to feature values via the values that exist in representations, + and -, or via a variable.

The vocabulary of distinct variables is unbounded.

The theory says nothing about the relationship between the feature and the variable associated with the feature. The attributes and values which are “features” are only accidentally related, and the particular pairings of value and attribute in rules are a distinctive property of individual rules. Thus [...αcont, βvoice...]... [...αcont, βvoice...] does not say the same thing as [...αcont, αvoice...]... [...βcont, βvoice...].

Since a particular variable can be assigned to any token of any feature and there is no bound on the number of feature-tokens in a rule, an unbounded set of variables is necessary. This theory will be referred to as Value-Variable theory.

An alternative theory, to be referred to as “Identical-Value theory”, is that the relevant phonological concept is “the value of feature X”, which presumes a tight bond between value and attribute. Ideas along these lines are found in the work of McCawley and Reiss. The first clause of (2) is also assumed as a statement of the form of rules in this theory. The value of a variable is automatically computed from the fact that it is specified on a given feature, and the comparison is between all variably-specified instances of that feature. Since values are not independent of attributes in this theory, the formal vocabulary only requires a single symbol, written here as “=”. A rule containing the condition [=Fm]i...[=Fm]j matches a string...Si...Sj... if and only if Si and Sj are both [+Fm] or both [-Fm]. Thus the requirement that a pair of triggering segments have the same place features is expressed in Identical-Value theory as “/...[=ant,=cor,=back] [=ant,=cor,=back]”, meaning “the value of [anterior] in segment 1 is the same as the value of [anterior] in segment 2, and the value of [coronal] in segment 1 is the same as the value of [coronal] in segment 2...”.
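The matching condition just stated for “=” can be sketched in code. This is only an illustrative sketch under assumptions that are not in the text: segments are encoded as dictionaries from feature names to "+" or "-", rule conditions as dictionaries from feature names to "+", "-" or "=", and the function name `matches_identical_value` and the sample feature values for t, n and k are hypothetical.

```python
# Hypothetical encoding (not from the source): a segment maps a feature name
# to "+" or "-"; a rule condition maps a feature name to "+", "-", or "="
# (the single symbol of Identical-Value theory).

def matches_identical_value(conditions, segments):
    """Return True if every segment satisfies its condition and, for each
    feature, all segments whose condition marks it "=" carry the same value."""
    if len(conditions) != len(segments):
        return False
    shared = {}  # feature -> value found on the first "="-marked instance
    for cond, seg in zip(conditions, segments):
        for feature, spec in cond.items():
            value = seg.get(feature)
            if value is None:
                return False  # this sketch requires fully specified segments
            if spec == "=":
                # The comparison is keyed by the feature itself, so no
                # separate variable vocabulary is needed.
                if shared.setdefault(feature, value) != value:
                    return False
            elif spec != value:
                return False
    return True


# Illustrative feature values, assumed for the example.
same_place = {"ant": "=", "cor": "=", "back": "="}
t = {"ant": "+", "cor": "+", "back": "-"}
n = {"ant": "+", "cor": "+", "back": "-", "nasal": "+"}
k = {"ant": "-", "cor": "-", "back": "+"}

print(matches_identical_value([same_place, same_place], [t, n]))  # True
print(matches_identical_value([same_place, same_place], [t, k]))  # False
```

In this encoding the agreement check is indexed by the feature name alone, which is one way of picturing the claim that values are not independent of attributes.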

These theories can be compared in terms of conceptual simplicity. Identical-Value theory has a single variable, and the “variable” is not an autonomous thing: it is an additional kind of specification relationship, “is the same”, to be added to “is plus” and “is minus”. Value-Variable theory has an unbounded collection of variables which must be treated as things separate from feature attributes. Ceteris paribus, a theory with a single added vocabulary item is to be selected over a theory with a larger (especially unbounded) added vocabulary. An empirical argument for Identical-Value theory derives from the fact that Value-Variable theory makes a broader – and unjustified – claim which Identical-Value theory does not make. Strong empirical evidence for feature variables is limited to a well-defined class of references, of the following general form (SPE notation), where each variable is associated with a single feature.

(3) X → [αF1,βF2,γF3] / ___ [αF1,βF2,γF3]

Value-Variable theory makes an additional claim, that rules may also include a requirement that instances of different features have the same value, for example that the roundness of one segment must be the same as the voicedness of another. A rule of the form

(4) [αround,βback] → [γhi,χtense] / [δround,ɛback,γhi] ___ [αvoice,δnas,βson] [ɛcont,χant]

is well-formed in Value-Variable theory. Every claim made by Identical-Value theory is also made by Value-Variable theory, and the converse is not true. We have now identified the difference between the concepts making up two theories of variables. Which sets of concepts best correspond to reality?

The SPE notation differs syntactically from general numeric variables. A bare variable is meaningful in a numeric equation but not in a phonological rule. Variables can be multiplied and added in a numeric equation, but [αβFi] is undefined in the SPE theory of notation.

I take for granted an interpretation of the notations, because a detailed development of variable interpretation presupposes a theory of string-to-rule matching and then says what is special about variables. Since it does not appear that one theory entails a substantially more complex interpretation algorithm, such discussion is orthogonal to the purpose of comparing the complexity and justification of two theories.

Other uses of feature variables are discussed in 6.3.

If the additional claim of Value-Variable theory were factually justified, the concept embodied in Value-Variable theory would be superior to that of Identical-Value theory – Value-Variable theory would be necessitated. There being no evidence for detachment of values from attributes, Value-Variable theory must be rejected in the face of the alternative theory, which is conceptually simpler, and which also does not make this additional unjustified claim. Any argument for Value-Variable theory would therefore have to focus on the empirical differences between the theories – showing for example that grammars do in fact impose conditions on rules such as “takes the same value of nasal as the trigger has for round”.
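The additional claim at issue can be made concrete with a small sketch of Value-Variable matching, in which a variable is bound to a value independently of the feature it appears on. The encoding is an assumption made for illustration only: segments are dictionaries from feature names to "+" or "-", conditions map feature names to "+", "-" or a variable name such as "alpha", and the function name `matches_value_variable` is hypothetical.

```python
# Hypothetical encoding (not from the source): a condition maps a feature
# name to "+", "-", or an arbitrary variable name such as "alpha".

def matches_value_variable(conditions, segments):
    """Return True if every segment satisfies its condition and every
    variable takes one value across all the features it is attached to."""
    if len(conditions) != len(segments):
        return False
    bindings = {}  # variable name -> value, whatever feature it occurs on
    for cond, seg in zip(conditions, segments):
        for feature, spec in cond.items():
            value = seg.get(feature)
            if value is None:
                return False  # this sketch requires the feature to be specified
            if spec in ("+", "-"):
                if spec != value:
                    return False
            elif bindings.setdefault(spec, value) != value:
                # The binding is keyed by the variable, not by the feature,
                # so agreement across different features is expressible.
                return False
    return True


# "Takes the same value of nasal as the trigger has for round."
cross_feature = [{"round": "alpha"}, {"nasal": "alpha"}]

print(matches_value_variable(cross_feature, [{"round": "+"}, {"nasal": "+"}]))  # True
print(matches_value_variable(cross_feature, [{"round": "+"}, {"nasal": "-"}]))  # False
```

A condition of the `cross_feature` kind is exactly what Identical-Value theory cannot state, since its single symbol only compares instances of the same feature. Evidence that grammars impose such conditions is what an argument for Value-Variable theory would need to supply.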

This discussion reveals the proper role for concerns about overgeneration. The right concern is not whether one concept interacts with other concepts to yield unobserved languages (“intervocalic devoicing”). Indeed, the ability of concepts to interact so as to describe things that have not yet been observed is a positive attribute of science – it is the power to predict. The proper concern is whether the correct concept was identified in the first place, or whether an unjustified claim was made. Worry over overgeneration is never valid in isolation. Applied to competing concepts, proper concern with overgeneration is about going beyond necessity in positing concepts. The theoretical concept “feature variable” is not necessary, in the face of the alternative “identical value”.

Unfortunately, Occam’s Razor, which is wielded frequently in linguistic argumentation, is often construed the wrong way. Often, Occam’s Razor is interpreted to refer to the extension of a science, that is, to say that the logically preferred theory is the one that claims that there are fewer entities in the world.

According to that logic, a representational theory allowing 3,159 distinct segments to be described is held to be superior to a theory allowing 3,160 segments (as long as there aren’t more than 3,159 known segments). But Occam’s Razor is not a metaphysical claim about the nature of the universe, that there are few entities; it is a normative statement about the proper form of theories of the universe. A theory is a system of concepts, not a collection of things-in-the-universe, and Occam’s Razor is a statement about systems of concepts. The wording of Aristotle (Posterior Analytics) reveals the original intent behind Occam’s Razor: “We may assume the superiority ceteris paribus of the demonstration which derives from fewer postulates or hypotheses” (emphasis added), that is, the fewer theoretical concepts, the better.

Likewise, Aquinas holds that “If a thing can be done adequately by means of one, it is superfluous to do it by means of several; for we observe that nature does not employ two instruments where one suffices”, restated by Occam as “It is futile to do with more things that which can be done with fewer”, again emphasizing the centrality of the “instrument” – theoretical concepts – and not the things that theoretical concepts are about. The Newtonian statement (Principia Mathematica) mentioned in fn. 12 – “We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances” – focuses on admitting causes (explanatory concepts), not effects (entities in the world). There is no justification for elevating “economy of existents” to the status of a methodological principle. The tack taken in Hayes (1985, 1987) of complicating the concepts relevant to foot construction in order to preclude (presumed) non-existent languages is thus contrary to the methodology of Formal Phonology.

Phonological epistemology has not progressed to the stage that numerical measurement of simplicity can be undertaken. In numerically-quantifiable physical sciences, formal evaluation of the simplicity of a theory is more meaningful, since the applicable concepts have been made so explicit that they can be represented as a single symbol in an equation.

One of the points of this essay is that phonological epistemology must progress so that we can better identify the individual logical claims embodied in a metatheoretical conclusion about grammars.

A further question that should be raised is whether a theory is overall consistent with what is known about human cognition. See section 5 for discussion.

The principle is simply named after a prominent Aristotelian scholastic philosopher, William of Ockham, who distilled a millennium of thinking on the topic.

Put simply, when evaluating two grammatical principles of (apparently) equivalent conceptual complexity, the logically preferred theory is the one which most closely describes what is known to be true, because it does not make an unjustified claim.

5. Relevant evidence

Evaluating a theoretical concept is conceptually simple, since it amounts to determining whether the concept describes the relevant facts, and in comparison to alternative concepts gives the simplest description of “what is”. One aspect of theory-evaluation is simple: if a concept identifies known grammatical facts and its competitor denies the facts, the denying competitor is wrong. Any theory of grammatical structure which denies the linguistic fact “counter-feeding” is simply wrong (though whether or not the specific mechanism of rule ordering is required to describe that fact depends on what non-ordered alternatives there are). A theory of phonological grammars must be held accountable for what grammars do: they map inputs to outputs, thereby generating the strings that are the language, in the extensional sense.

In positing an argument for one theory over another, the relevance of supposed evidence must be evaluated. A common mode of argumentation in contemporary phonology involves looking outside of grammatical competence to find “confirming” evidence regarding grammatical competence, which often involves performing a behavioral test with speakers of a language, and conjecturing that the results of such a test provide evidence for a specific grammatical theory. Since FP is part of Generative Grammar, it is a mentalist theory which makes claims about how the mind operates – FP computations are claimed to be mentally real and intensional, not Platonic extensional abstractions unbound by the nature of human cognition. As a reputedly real aspect of the mind, it is not unimaginable that psychological tests could bear on the task of finding the correct theory of grammars. We must therefore review the nature of the phonological enterprise to determine what kinds of evidence would be valid for judging theories.

The phonological component maps from input to output (not speech), so facts about those representations and mappings are relevant. The object of study is not “the ability to use sound symbolically” or “the cognitive capacity of humans”, therefore facts from those domains do not gain automatic validity for answering questions about grammars. The question that has to be asked about potential evidence is whether it does indeed answer questions about phonological representations or input-output mappings; or does it answer a question about some other faculty, which interacts with phonology only indirectly?

An experiment which determines that oral air-pressure rises more rapidly during production of a voiced velar stop than it does during production of a voiced bilabial stop is not relevant to understanding the formal nature of phonological representations or mappings. No phonological rule implies anything about the rate of air-pressure buildup, and no formal theory of representations implies anything about the rate of air-pressure buildup in segments, thus such an experiment does not provide relevant evidence regarding the theory of grammar, not even “supporting” evidence. The experiment might provide an indirect rationale for a place-asymmetrical stop devoicing rule in some language, where /g/ devoices and /b/ does not, but that rationale is outside the theory of phonological computation and representation. What mediates between air-pressure rise and extant grammatical rules is historical change – see Hale (2007) for extensive discussion. The rules in a language’s grammar are induced on the basis of surface mental representations, which themselves are based on physical sound. The nature of the physical sound that results from the output of the grammar in producing a form such as /abaga/ is thus sensitive to whatever influences the rate of air-pressure rise, and may produce a body output of the type abaka, which may be transduced by the language learner as [abaga] or [abaka], depending on whether or not the physically predicted effect on vocal fold vibration is phonologized as categorial devoicing. Since grammar does not refer to rate of air-pressure rise, experiments about air-pressure rise tell us nothing about grammar.

