Representation vs. explanation?

I have often wondered whether detailed representational formalism is somehow in conflict with genuine explanation in linguistics. I have been tangentially involved in the cottage industry that is applying the Tolerance Principle (Yang 2005, 2016) to linguistic phenomena, most notably morphological defectivity. In our paper on the subject (Gorman & Yang 2019), we are admittedly somewhat nonchalant about the representations in question, a nonchalance which is, frankly, characteristic of this microgenre.

In my opinion, however, our treatment of Polish defectivity is representationally elegant. (See here for a summary of the data.) In this language, fused case/number suffixes show suppletion based on the gender—in the masculine, animacy—of the stem, and there is lexically conditioned suppletion between -a and -u, the two allomorphs of the gen.sg. for masculine inanimate nouns. To derive defectivity, all we need to show is that Tolerance predicts that, in the masculine inanimate, there is no default suffix to realize the gen.sg. If there are two realization rules in competition, we can implement this by making both of them lexically conditioned, and leaving nouns which are defective in the gen.sg. off both lexical “lists”. We can even imagine, in theories with late insertion, that the grammatical crash is the result of uninterpretable gen.sg. features which are, in defective nouns, still present at LF.1
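The arithmetic behind this prediction is easy to state: under the Tolerance Principle, a rule over N items is predicted productive only if its exceptions number at most N/ln N. A minimal sketch (the counts below are invented for illustration, not the actual Polish figures):

```python
import math

def tolerance_threshold(n: int) -> float:
    """Yang's Tolerance Principle threshold: a rule over n items is
    productive only if its exceptions number at most n / ln n."""
    if n < 2:
        raise ValueError("need at least two items")
    return n / math.log(n)

def is_productive(n_items: int, n_exceptions: int) -> bool:
    """True if a rule with this many exceptions is predicted productive."""
    return n_exceptions <= tolerance_threshold(n_items)

# Toy illustration: with 100 masculine inanimate nouns, a putative
# default gen.sg. suffix tolerates at most 100 / ln 100 ≈ 21.7 exceptions.
print(is_productive(100, 20))  # True: the rule survives as a default
print(is_productive(100, 50))  # False: no default; gaps are predicted
```

If neither -a nor -u clears the threshold, neither is the default, which is just the "no default suffix" situation described above.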

It is useful to contrast this with our less elegant treatment of Spanish defectivity in the same paper. (See here for a summary of the data.) There we assume that there is some kind of grammatical competition for verbal stems between the rules that might be summarized as “diphthongize a stem vowel when stressed” and “do not change”. We group the two types of diphthongization (o to ue [we] and e to ie [je]) as a single change, even though it is not trivial to make these into a single change.2 This much at least has a venerable precedent, but what does it mean to treat diphthongization as a rule in the first place? The same tradition tends to treat the propensity to diphthongize as a phonological property of the stem (perhaps via underspecification or prespecification, à la Harris 1985) or a morphophonological one (a lexical diacritic à la Harris 1969, or competition between pseudo-suppletive stems à la Bermúdez-Otero 2013), and the phonological content of a stem is presumably stored in the lexicon, not generated by any sort of rule.3 Rather, our Tolerance analysis seems to imply we have thrown in our lot with Albright and colleagues (Albright et al. 2001, Albright 2003) and Bybee & Pardo (1981), who analyze diphthongization as a purely phonological rule depending solely on the surface shape of the stem. This is despite the fact that we are bitterly critical of these authors for other reasons,4 and I would have preferred—aesthetically at least—to adopt an analysis where diphthongization is a latent property of particular stems.

At this point, I could say, perhaps, that the data—combined with our theoretical conception of the stem inventory portion of the lexicon as a non-generative system—is trying to tell me something about Spanish diphthongization, namely that Albright, Bybee, and colleagues are onto something, representationally speaking. But, compared with our analysis of Polish, it is not clear how these surface-oriented theories of diphthongization might generate grammatical crash. Abstracting from the details, Albright (2003) imagines that there are a series of competing rules for diphthongization, whose “strength” derives from the number of exemplars they cover. In his theory, the “best” rule can fail to apply if its strength is too low, but he does not propose any particular threshold and as we show in our paper, his notion of strength is poorly correlated with the actual gaps. Is it possible our analysis is onto something if Albright, Bybee, and colleagues are wrong about the representational basis for Spanish diphthongization?

Endnotes

  1. This case may still be a problem for Optimality Theory-style approaches to morphology, since Gen must produce some surface form.
  2. I don’t have the citation in front of me right now, but I believe J. Harris originally proposed that the two forms of diphthongization can be united insofar as both of them can be modeled as insertion of e triggering glide formation of the preceding mid vowel.
  3. For the same reason, I don’t understand what morpheme structure constraints are supposed to do exactly. Imagine, fancifully, that you had a mini-stroke and the lesion it caused damaged your grammar’s morpheme structure rule #3. How would anyone know? Presumably, you don’t have any lexical entries which violate MSC #3, and adults generally do not make up new lexical entries for the heck of it.
  4. These have to do with what we perceive as the poor quality of their experimental evidence, to be fair, not their analyses.

References

Albright, A., Andrade, A., and Hayes, B. 2001. Segmental environments of Spanish diphthongization. UCLA Working Papers in Linguistics 7: 117-151.
Albright, A. 2003. A quantitative study of Spanish paradigm gaps. In Proceedings of the 22nd West Coast Conference on Formal Linguistics, pages 1-14.
Bermúdez-Otero, R. 2013. The Spanish lexicon stores stems with theme vowels, not roots with inflectional class features. Probus 25: 3-103.
Bybee, J. L. and Pardo, E. 1981. On lexical and morphological conditioning of alternations: a nonce-probe experiment with Spanish verbs. Linguistics 19: 937-968.
Gorman, K. and Yang, C. 2019. When nobody wins. In F. Rainer, F. Gardani, H. C. Luschützky and W. U. Dressler (ed.), Competition in Inflection and Word Formation, pages 169-193. Springer.
Harris, J. W. 1969. Spanish Phonology. MIT Press.
Harris, J. W. 1985. Spanish diphthongisation and stress: a paradox resolved. Phonology 2: 31-45.

Another quote from Ludlow

Indeed, when we look at other sciences, in nearly every case, the best theory is arguably not the one that reduces the number of components from four to three, but rather the theory that allows for the simplest calculations and greatest ease of use. This flies in the face of the standard stories we are told about the history of science. […] This way of viewing simplicity requires a shift in our thinking. It requires that we see simplicity criteria as having not so much to do with the natural properties of the world, as they have to do with the limits of us as investigators, and with the kinds of theories that simplify the arduous task of scientific theorizing for us. This is not to say that we cannot be scientific realists; we may very well suppose that our scientific theories approximate the actual structure of reality. It is to say, however, that barring some argument that “reality” is simple, or eschews machinery, etc., we cannot suppose that there is a genuine notion of simplicity apart from the notion of “simple for us to use.” […] Even if, for metaphysical reasons, we supposed that reality must be fundamentally simple, every science (with the possible exception of physics) is so far from closing the book on its domain it would be silly to think that simplicity (in the absolute sense) must govern our theories on the way to completion. Whitehead (1955, 163) underlined just such a point.

Nature appears as a complex system whose factors are dimly discerned by us. But, as I ask you, Is not this the very truth? Should we not distrust the jaunty assurance with which every age prides itself that it at last has hit upon the ultimate concepts in which all that happens can be formulated. The aim of science is to seek the simplest explanations of complex facts. We are apt to fall into the error of thinking that the facts are simple because simplicity is the goal of our quest. The guiding motto in the life of every natural philosopher should be, Seek simplicity and distrust it.

(Ludlow 2011:158-160)

References

Ludlow, P. 2011. The Philosophy of Generative Grammar. Oxford University Press.
Whitehead, A. N. 1955. The Concept of Nature. Cambridge University Press.

Entrenched facts

Berko’s (1958) wug-test is a standard part of the phonologist’s toolkit. If you’re not sure whether a pattern is productive, why not ask whether speakers extend it to nonce words? It makes sense; it has good face validity. However, I increasingly see linguists who think that the results of wug-tests actually trump contradictory evidence coming from traditional phonological analysis applied to real words. I respectfully disagree.

Consider for example a proposal by Sanders (2003, 2006). He demonstrates that an alternation in Polish (somewhat imprecisely called o-raising) is not applied to nonce words. From this he takes o-raising to be handled via stem suppletion. He asks, and answers, the very question you may have on your mind. (Note that his H here is the OT constraint hierarchy; you may want to read it as “grammar”.)

Is phonology obsolete?! No! We still need a phonological H to explain how nonce forms conform to phonotactics. We still need a phonological H to explain sound change. And we may still need H to do more with morphology than simply allow extant (memorized) morphemes to trump nonce forms. (Sanders 2006:10)1

I read a sort of nihilism into this quotation. However, I submit that the fact that 50 million people speak Polish—and “raise” and “lower” their ó’s with a high degree of consistency across contexts, lexemes, and so on—is a more entrenched fact than the results of a small nonce-word elicitation task. I am not saying that Sanders’s results are wrong, or even misleading, just that his theory has escalated the importance of these results to the point where it has almost nothing to say about the very interesting fact that the genitive singular of lód [lut] ‘ice’ is lodu [lɔdu] and not *[ludu], and that tens of millions of people agree.

Endnotes

  1. Sanders’s 2006 manuscript is a handout, but apparently it’s a summary of his 2003 dissertation (Sanders 2003), stripped of some phonetic-interface details not germane to the question at hand. I just mention this so that it doesn’t look like I’m picking on a rando. Those familiar with my work will probably guess that I disagree with just about everything in this quotation, but kudos to Sanders for saying something interesting enough to disagree with.

References

Berko, J. 1958. The child’s learning of English morphology. Word 14: 150-177.
Sanders, N. 2003. Opacity and sound change in the Polish lexicon. Doctoral dissertation, University of California, Santa Cruz.
Sanders, N. 2006. Strong lexicon optimization. Ms., Williams College and University of Massachusetts, Amherst.

Why binarity is probably right

Consider the following passage, about phonological features:

I have not seen any convincing justification for the doctrine that all features must be underlyingly binary rather than ternary, quaternary, etc. The proponents of the doctrine often realize it needs defending, but the calibre of the defense is not unfairly represented by the subordinate clause devoted to the subject in SPE (297): ‘for the natural way of indicating whether or not an item belongs to a particular category is by means of binary features.’ The restriction to two underlying specifications creates problems and solves none. (Sommerstein 1977: 109)

Similarly, I recently had a conversation with someone who insisted that certain English multi-object constructions in syntax are better handled by assuming the possibility of ternary branching.

I disagree with Sommerstein, though: a logical defense of the assumption of binarity—both for the specification of phonological feature polarity and for the arity of syntactic trees—is so obvious that it fits on a single page. Roughly: (1) less than two is not enough, and (2) two is enough.

Less than two is not enough. This much should be obvious: theories in which features only have one value, or syntactic constituents cannot dominate more than one element, have no expressive power whatsoever.1,2

Two is enough. Every time we might desire to use a ternary feature polarity, or a ternary-branching non-terminal, there exists a weakly equivalent specification which uses binary polarity or binary branching, respectively, and more features or non-terminals. It is then up to the analyst to determine whether or not they are happy with the natural classes and/or constituents obtained, but this possibility is always available. One opposed to this strategy has a duty to say why the hypothesized features or non-terminals are wrong.
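The re-encoding step can be made concrete with a toy example. The two binary feature names below are hypothetical, chosen purely for illustration:

```python
# A ternary polarity {-1, 0, +1} re-encoded with two binary features,
# here (hypothetically) "specified" and "plus". The fourth combination
# (False, True) is simply unused.
ENCODE = {-1: (True, False), 0: (False, False), +1: (True, True)}

def decode(specified: bool, plus: bool) -> int:
    """Recover the ternary value from the two binary values."""
    if not specified:
        return 0
    return +1 if plus else -1

# The round trip is lossless, so the binary encoding is weakly
# equivalent: any class definable by the ternary value is definable
# by a conjunction of the two binary values.
assert all(decode(*ENCODE[v]) == v for v in (-1, 0, +1))
```

The cost is exactly what the text says: more features, and the analyst must decide whether the resulting natural classes are ones they can live with.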

Endnotes

  1. It is important to note in this regard that privative approaches to feature theory (as developed by Trubetzkoy and disciples) are themselves special cases of the binary hypothesis which happen to treat absence as a non-referable. For instance, if we treat the set of nasals as a natural class (specified [Nasal]) but deny the existence of the (admittedly rather diverse) natural class [−Nasal]—and if we further insist that rules be defined in terms of natural classes, and deny the possibility of disjunctive specification—we are still working in a binary setting; we have just added the stipulation that negated features cannot be referred to by rules.
  2. I put aside the issue of cumulativity of stress—a common critique in the early days—since nobody believes this is done by feature in 2023.

References

Sommerstein, A. 1977. Modern Phonology. Edward Arnold.

Use the minus sign for feature specifications

LaTeX has a dizzying number of options for different types of horizontal dash. The following are available:

  • A single - is a short dash appropriate for hyphenated compounds (like encoder-decoder).
  • A single dash in math mode, $-$, is a longer minus sign.
  • A double -- is a longer “en-dash” appropriate for numerical ranges (like 3–5).
  • A triple --- is a long “em-dash” appropriate for interjections (like this—no, I mean like that).

My plea to linguists is to actually use math mode and the minus sign when writing binary features. If you want to turn this into a simple macro, you can place the following in your preamble:

\newcommand{\feature}[2]{\ensuremath{#1}\textsc{#2}}

and then write \feature{-}{Back} for nicely formatted feature specifications.

Note that this issue has an exact parallel in Word and other WYSIWYG setups: there the issue is as simple as selecting the Unicode minus sign (U+2212) from the inventory of special characters (or just googling “Unicode minus sign” and copying and pasting what you find). 

A note on pure allophony

I have previously discussed the notion of pure allophony, contrasting it with the facts of alternations. What follows is a lightly edited section from my recent NAPhC 12 talk, which in part hinges on this notion.


While Halle (1959) famously dispenses with the structuralist distinction between phonemics and morphophonemics, some later generativists reject pure allophony outright. Let the phonemic inventory of some grammar G be P and the set of surface phones generated by G from P be S. If some phoneme p ∈ P always corresponds—in some sense to be made precise—to some phone s ∈ S, and if s ∉ P, then s is a pure allophone of p. For example, if /s/ is a phoneme and [ʃ] is not, but all [ʃ]s correspond to /s/s, then [ʃ] is a pure allophone of /s/. According to some descriptions, this is the case for Korean, as [ʃ] is a (pure) allophone of /s/ when followed by [i].
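The definition is easy to operationalize. Here is a toy sketch, assuming we are handed the surface-underlying correspondences directly (which is, of course, the hard part); the segment names are illustrative:

```python
def pure_allophones(phonemes, correspondences):
    """correspondences maps each surface phone s to the set of phonemes
    it realizes; s is a pure allophone of p iff s is not itself a
    phoneme and every token of s corresponds to the single phoneme p."""
    result = {}
    for s, sources in correspondences.items():
        if s not in phonemes and len(sources) == 1:
            result[s] = next(iter(sources))
    return result

# Korean-style toy data: [ʃ] only ever realizes /s/, and is not itself
# in the phonemic inventory.
korean = pure_allophones(
    {"s", "i", "p"},
    {"s": {"s"}, "ʃ": {"s"}, "i": {"i"}})
print(korean)  # {'ʃ': 's'}
```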

One might argue that alternations are more entrenched facts than pure allophony, simply because it is always possible to construct a grammar free of pure allophony. For instance, if one wants to do away with pure allophony one can derive the Korean word [ʃi] ‘poem’ from /ʃi/ rather than from /si/. One early attempt to rule out pure allophony—and thus to motivate the choice of /ʃi/ over /si/ for this problem—is the alternation condition (Kiparsky 1968). As Kenstowicz & Kisseberth (1979:215) state it, this condition holds that “the UR of a morpheme may not contain a phoneme /x/ that is always realized phonetically as identical to the realization of some other phoneme /y/.” [Note here that /x, y/ are to be interpreted as variables rather than as the voiceless velar fricative or the front high round vowel.–KBG] Another recent version of this idea—often attributed to Dell (1973) or Stampe (1973)—is the notion of lexicon optimization (Prince & Smolensky 1993:192).

A correspondent to this list wonders why, in a grammar G such that G(a) = G(b) for potential input elements /a, b/, a nonalternating observed element [a] is not (sometimes, always, freely) lexically /b/. The correct answer is surely “why bother?”—i.e. to set up /b/ for [a] when /a/ will do […] The basic idea reappears as “lexicon optimization” in recent discussions. (Alan Prince, electronic discussion; cited in Hale & Reiss 2008:246)

Should grammars with pure allophony be permitted? The question is not, as is sometimes supposed, a purely philosophical one (see Hale & Reiss 2008:16-22): both linguists and infants acquiring language require a satisfactory answer. In my opinion, the burden of proof lies with those who would deny pure allophony. They must explain how the language acquisition device (LAD) either directly induces grammars that satisfy the alternation condition, or optimizes all pure allophony out of them after the fact. “Why bother” could go either way: why posit either complication to the LAD when pure allophony will do? The linguist faces a similar problem to the infant. To wit, I began this project assuming Latin glide formation was purely allophonic, and only later uncovered—subtle and rare—evidence for vowel-glide alternations. Thus in this study, I make no apology for—and draw no further attention to—the fact that some data are purely allophonic. This important question will have to be settled by other means.

References

Dell, F. 1973. Les règles et les sons. Hermann.
Hale, M. and Reiss, C. 2008. The Phonological Enterprise. Oxford University Press.
Halle, M. 1959. The Sound Pattern of Russian. Mouton.
Kenstowicz, M. and Kisseberth, C. 1979. Generative Phonology: Description and Theory. Academic Press.
Kiparsky, P. 1968. How Abstract is Phonology? Indiana University Linguistics Club.
Prince, A. and Smolensky, P. 1993. Optimality Theory: Constraint interaction in generative grammar. Technical Report TR-2, Rutgers University Center For Cognitive Science and Technical Report CU-CS-533-91, University of Colorado, Boulder Department of Computer Science.
Stampe, D. 1973. A Dissertation on Natural Phonology. Garland.

Defectivity in Amharic

[This is part of a series of defectivity case studies.]

According to Sande (2015), only Amharic verb stems that contain a geminate can form a frequentative. Since not all imperfect aspect verbs have geminates, some lack frequentatives and speakers must resort to periphrasis. If I understand the data correctly, it appears that the frequentative is a /Ca-/ reduplicant template which docks to the immediate left of the first geminate; the C (consonant) slot takes its value from said geminate. For instance, for the perfect verb [ˈsäb.bärä] ‘he broke’, the frequentative is [sä.ˈbab.bärä] ‘he broke repeatedly’. But there is no corresponding frequentative for the imperfective verb [ˈjə.säb(ə)r] ‘he breaks’ since there is no geminate to dock the reduplicant against; Sande marks as ungrammatical *[jə.sä.ˈbab(ə)r] and presumably other options are out too.
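If this description is on the right track, the docking logic can be sketched as follows. The transcriptions are simplified to plain segment strings (no stress or syllabification), and the vowel inventory used for the geminate check is a rough assumption:

```python
def frequentative(stem):
    """Sketch of the /Ca-/ reduplicant: dock a copy of the first
    geminate consonant plus [a] immediately to its left; return None
    (a gap, filled by periphrasis) if the stem has no geminate."""
    vowels = "aeiouäə"  # assumed, simplified vowel inventory
    for i in range(len(stem) - 1):
        if stem[i] == stem[i + 1] and stem[i] not in vowels:
            # Insert the Ca reduplicant before the geminate.
            return stem[:i] + stem[i] + "a" + stem[i:]
    return None  # no geminate: the frequentative is defective

print(frequentative("säbbärä"))  # 'säbabbärä' ('he broke repeatedly')
print(frequentative("jəsäbər"))  # None: no docking site
```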

(h/t: Heather Newell)

References

Sande, H. 2015. Amharic infixing reduplication: support for a stratal approach to morphophonology. Talk presented at NELS 46.

More than one rule

[Leaving this as a note to myself to circle back.]

I’m just going to say it: some “rules” are probably two or three rules, because the idea that rules are defined by natural classes (and thus free of disjunctions) is more entrenched than our intuitions about whether or not a process in some language is really one rule, and we should be Galilean about this. Here are some phonological “rules” that are probably two or three different rules.

  • Indo-Iranian, Balto-Slavic, and Albanian “ruki” (environment: preceding {w, j, k, r}): it is not clear to me if any of these languages actually need this as a synchronic rule at all.
  • Breton voiced stop lenition (change: /b/ to [v], /d/ to [z], /g/ to [x]): the devoicing of /g/ must be a separate rule. Hat tip: Richard Sproat. I believe there’s a parallel set of processes in German.
  • Lamba palatalization (change: /k/ to [tʃ], /s/ to [ʃ]): two rules, possibly with a Duke-of-York thing. Hat tip: Charles Reiss.
  • Mid-Atlantic (e.g., Philadelphia) English ae-tensing (environment: following tautosyllabic, same-stem {m, n, f, θ, s, ʃ}): let’s assume this is allophony; then the anterior nasal and voiceless fricative cases should be separate rules. It is possible the incipient restructuring of this as having a simple [+nasal] context provides evidence for the multi-rule analysis.
  • Latin glide formation (environment: complex). Front and back glides are formed from high short monophthongs in different but partially overlapping contexts.
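To illustrate with the Breton case above: the two-rule decomposition can be sketched as ordered mappings, where spirantization is one natural change and the devoicing of the resulting dorsal fricative is a separate rule feeding off its output (segments and ordering here are schematic):

```python
# Rule 1: voiced stop -> voiced fricative (a single natural change).
SPIRANTIZE = {"b": "v", "d": "z", "g": "ɣ"}
# Rule 2: a separate devoicing rule for the dorsal fricative only.
DEVOICE = {"ɣ": "x"}

def lenite(seg):
    """Apply the two rules in order; each is disjunction-free."""
    seg = SPIRANTIZE.get(seg, seg)
    return DEVOICE.get(seg, seg)

print([lenite(s) for s in "bdg"])  # ['v', 'z', 'x']
```

A single "rule" mapping /g/ directly to [x] would have to bundle spirantization and devoicing into one disjunctive change, which is exactly what the natural-class view of rules disallows.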

Feature maximization and phonotactics

[This is a quick writing exercise for in-progress work with Charles Reiss. Sorry if it doesn’t make sense out of context.]

An anonymous reviewer asks:

I wonder how the author(s) would reconcile this learning model with the evidence that both children and adults seem to aggressively generalize phonotactic restrictions from limited data (e.g. just [p]) to larger, unobserved natural classes (e.g. [p f b v]). See e.g. the discussion in Linzen & Gallagher (2017). If those results are credible, they seem much more consistent with learning minimal feature specifications for natural classes than learning maximal ones.

First, note that Linzen & Gallagher’s study is a study of phonotactic learning, whereas our proposal concerns induction of phonological rules. We have been, independently but complementarily, quite critical of the naïve assumptions inherent in prior work on this topic (e.g., Gorman 2013, ch. 2; Reiss 2017, §6); we have both argued that knowledge of phonotactic generalizations may require much less grammatical knowledge than is generally believed.

Secondly, we note that Linzen & Gallagher’s subjects are (presumably; they were recruited on Mechanical Turk and were paid $0.65 USD for their efforts) adults briefly exposed to an artificial language. While we recognize that adult “artificial language learning” studies are common practice in psycholinguistics, it is not clear what such studies contribute to our understanding of phonotactic acquisition (whatever the phonotactic acquirenda turn out to be) by children robustly exposed to realistic languages in situ.

Third, the reviewer is incorrect; the result reported by Linzen & Gallagher (henceforth L&G) is not consistent with minimal generalization. Let us grant—for sake of argument—that our proposal about rule induction in children is relevant to their work on rapid phonotactic learning in adults. One hypothesis they entertain is that their participants will construct “minimal classes”:

For example, when acquiring the phonotactics of English, learners may first learn that both [b] and [g] are valid onsets for English syllables before they can generalize to other voiced stops (e.g., [d]). This generalization will be restricted to the minimal class that contained the attested onsets (i.e., voiced stops), at least until a voiceless stop onset is encountered.

If by a “minimal class” L&G are referring to a natural class which is consistent with the data and has an extension with the fewest members, then presumably they would endorse our proposal of feature maximization, since the class that satisfies this definition is the most fully specified empirically adequate class. However, it is an open question whether or not such a class would actually contain [d]. For instance, if one assumes that major place features are bivalent, then the intersection of the features associated with [b, g] will contain the specification [−coronal], which rules out [d].
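The point about [b, g] and [−coronal] can be verified mechanically with a toy bivalent feature table (the specifications below are illustrative, not a worked-out feature theory):

```python
# Toy bivalent feature table (illustrative specifications only).
FEATURES = {
    "b": {"voice": True,  "labial": True,  "coronal": False, "dorsal": False},
    "d": {"voice": True,  "labial": False, "coronal": True,  "dorsal": False},
    "g": {"voice": True,  "labial": False, "coronal": False, "dorsal": True},
    "p": {"voice": False, "labial": True,  "coronal": False, "dorsal": False},
}

def shared_specification(segments):
    """Every feature/value pair common to all the segments: the
    maximal specification consistent with the data."""
    specs = [FEATURES[s] for s in segments]
    return {f: v for f, v in specs[0].items()
            if all(sp[f] == v for sp in specs[1:])}

def extension(spec):
    """All segments consistent with a specification."""
    return {s for s, fs in FEATURES.items()
            if all(fs[f] == v for f, v in spec.items())}

spec = shared_specification(["b", "g"])
# [b, g] share [+voice, -coronal]; with bivalent place features the
# resulting class excludes [d], contra generalization to all voiced stops.
print(spec)
print(extension(spec))  # the extension excludes 'd'
```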

Interestingly, the matter is similarly unclear if we interpret “minimal class” intensionally, in terms of the number of features, rather than in terms of the number of phonemes the class picks out. The (featurewise-)minimal specification for a single phone (as in the reviewer’s example) is the empty set, which would (it is generally assumed) pick out any segment. Then, we would expect any generalization which held of [p], as in the reviewer’s example, to generalize not just to other labial obstruents (as the reviewer suggests), but to any segment at all. Minimal feature specification cannot yield a generalization from [p] to any proper subset of segments, contra the anonymous reviewer and L&G. An adequate minimal specification which picks out [p] will pick out just [p]. L&G suggest that maximum entropy models of phonotactic knowledge may have this property, but do not provide a demonstration of this for any particular implementation of these models.

We thank the anonymous reviewer for drawing our attention to this study and the opportunity their comment has given us to clarify the scope of our proposal and to draw attention to a defect in L&G’s argumentation.

References

Gorman, K. 2013. Generative phonotactics. Doctoral dissertation, University of Pennsylvania.
Linzen, T., and Gallagher, G. 2017. Rapid generalization in phonotactic learning. Laboratory Phonology: Journal of the Association for Laboratory Phonology 8(1): 1-32.
Reiss, C. 2017. Substance free phonology. In S.J. Hannahs and A. Bosch (ed.), The Routledge Handbook of Phonological Theory, pages 425-452. Routledge.