Another quote from Ludlow

Indeed, when we look at other sciences, in nearly every case, the best theory is arguably not the one that reduces the number of components from four to three, but rather the theory that allows for the simplest calculations and greatest ease of use. This flies in the face of the standard stories we are told about the history of science. […] This way of viewing simplicity requires a shift in our thinking. It requires that we see simplicity criteria as having not so much to do with the natural properties of the world, as they have to do with the limits of us as investigators, and with the kinds of theories that simplify the arduous task of scientific theorizing for us. This is not to say that we cannot be scientific realists; we may very well suppose that our scientific theories approximate the actual structure of reality. It is to say, however, that barring some argument that “reality” is simple, or eschews machinery, etc., we cannot suppose that there is a genuine notion of simplicity apart from the notion of “simple for us to use.” […] Even if, for metaphysical reasons, we supposed that reality must be fundamentally simple, every science (with the possible exception of physics) is so far from closing the book on its domain it would be silly to think that simplicity (in the absolute sense) must govern our theories on the way to completion. Whitehead (1955, 163) underlined just such a point.

Nature appears as a complex system whose factors are dimly discerned by us. But, as I ask you, Is not this the very truth? Should we not distrust the jaunty assurance with which every age prides itself that it at last has hit upon the ultimate concepts in which all that happens can be formulated. The aim of science is to seek the simplest explanations of complex facts. We are apt to fall into the error of thinking that the facts are simple because simplicity is the goal of our quest. The guiding motto in the life of every natural philosopher should be, Seek simplicity and distrust it.

(Ludlow 2011:158-160)

References

Ludlow, P. 2011. The Philosophy of Generative Grammar. Oxford University Press.
Whitehead, A. N. 1955. The Concept of Nature. Cambridge University Press.

Entrenched facts

Berko’s (1958) wug-test is a standard part of the phonologist’s toolkit. If you’re not sure whether a pattern is productive, why not ask whether speakers extend it to nonce words? It makes sense; it has good face validity. However, I increasingly see linguists who think that the results of wug-tests actually trump contradictory evidence from traditional phonological analysis applied to real words. I respectfully disagree.

Consider, for example, a proposal by Sanders (2003, 2006). He demonstrates that an alternation in Polish (somewhat imprecisely called o-raising) is not applied to nonce words. From this he concludes that o-raising is handled via stem suppletion. He asks, and answers, the very question you may have on your mind. (Note that H here is the OT constraint hierarchy; you may want to read it as “grammar”.)

Is phonology obsolete?! No! We still need a phonological H to explain how nonce forms conform to phonotactics. We still need a phonological H to explain sound change. And we may still need H to do more with morphology than simply allow extant (memorized) morphemes to trump nonce forms. (Sanders 2006:10)1

I read a sort of nihilism into this quotation. However, I submit that the fact that 50 million people just speak Polish—and “raise” and “lower” their ó’s with a high degree of consistency across contexts, lexemes, and so on—is a more entrenched fact than the results of a small nonce-word elicitation task. I am not saying that Sanders’s results are wrong, or even misleading, just that his theory has escalated the importance of these results to the point where it has almost nothing to say about the very interesting fact that the genitive singular of lód [lut] ‘ice’ is lodu [lɔdu] and not *[ludu], and that tens of millions of people agree.

Endnotes

  1. Sanders’s 2006 manuscript is a handout, but apparently it is a summary of his 2003 dissertation (Sanders 2003), stripped of some phonetic-interface details not germane to the question at hand. I mention this only so that it doesn’t look like I’m picking on a rando. Those familiar with my work will probably guess that I disagree with just about everything in this quotation, but kudos to Sanders for saying something interesting enough to disagree with.

References

Berko, J. 1958. The child’s learning of English morphology. Word 14: 150-177.
Sanders, N. 2003. Opacity and sound change in the Polish lexicon. Doctoral dissertation, University of California, Santa Cruz.
Sanders, N. 2006. Strong lexicon optimization. Ms., Williams College and University of Massachusetts, Amherst.

Why binarity is probably right

Consider the following passage, about phonological features:

I have not seen any convincing justification for the doctrine that all features must be underlyingly binary rather than ternary, quaternary, etc. The proponents of the doctrine often realize it needs defending, but the calibre of the defense is not unfairly represented by the subordinate clause devoted to the subject in SPE (297): ‘for the natural way of indicating whether or not an item belongs to a particular category is by means of binary features.’ The restriction to two underlying specifications creates problems and solves none. (Sommerstein 1977: 109)

Similarly, in a recent conversation, someone insisted to me that certain English multi-object constructions in syntax are better handled by assuming the possibility of ternary branching.

I disagree with Sommerstein, though: a logical defense of the assumption of binarity—both for the specification of phonological feature polarity and for the arity of syntactic trees—is so obvious that it fits on a single page. Roughly: (1) less than two is not enough, and (2) two is enough.

Less than two is not enough. This much should be obvious: theories in which features have only one value, or in which syntactic constituents cannot dominate more than one element, have no expressive power whatsoever.1,2

Two is enough. Every time we might desire to use a ternary feature polarity, or a ternary-branching non-terminal, there exists a weakly equivalent specification which uses binary polarity or binary branching, respectively, and more features or non-terminals. It is then up to the analyst to determine whether they are happy with the natural classes and/or constituents obtained, but this possibility is always available. Anyone opposed to this strategy has a duty to say why the hypothesized features or non-terminals are wrong.
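To make the reduction concrete, here is a toy sketch in Python. The feature names and values are invented for illustration: a hypothetical ternary height feature is re-encoded with two binary features, losing no distinctions, at the cost of one unused value combination.

```python
from itertools import product

# A hypothetical ternary feature "height" with values high/mid/low,
# re-encoded with two binary features [high] and [low].
TERNARY_TO_BINARY = {
    "high": {"high": "+", "low": "-"},
    "mid":  {"high": "-", "low": "-"},
    "low":  {"high": "-", "low": "+"},
}

# The encoding is injective: distinct ternary values receive distinct
# binary specifications, so no distinctions are lost.
binary_specs = [tuple(sorted(spec.items())) for spec in TERNARY_TO_BINARY.values()]
assert len(set(binary_specs)) == len(binary_specs)

# The cost: one value combination, (+high, +low), is unused and must be
# excluded by stipulation (e.g., by a markedness statement).
unused = [combo for combo in product("+-", repeat=2)
          if dict(zip(("high", "low"), combo)) not in TERNARY_TO_BINARY.values()]
print(unused)  # [('+', '+')]
```

Note that each binary feature still picks out a natural class ([+high] = {high}, [−low] = {high, mid}), which is the sense in which the move multiplies features rather than expressive power.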

Endnotes

  1. It is important to note in this regard that privative approaches to feature theory (as developed by Trubetzkoy and his disciples) are themselves special cases of the binary hypothesis which happen to treat absence as a non-referable. For instance, if we treat the set of nasals as a natural class (specified [Nasal]) but deny the existence of the (admittedly rather diverse) natural class [−Nasal]—and if we further insist that rules be defined in terms of natural classes, and deny the possibility of disjunctive specification—we are still working in a binary setting; we have simply added a stipulation that negated features cannot be referred to by rules.
  2. I put aside the issue of cumulativity of stress—a common critique in the early days—since nobody believes this is done by feature in 2023.

References

Sommerstein, A. 1977. Modern Phonology. Edward Arnold.

Use the minus sign for feature specifications

LaTeX has a dizzying number of options for different types of horizontal dash. The following are available:

  • A single - is a short dash appropriate for hyphenated compounds (like encoder-decoder).
  • A single dash in math mode, $-$, is a longer minus sign.
  • A double -- is a longer “en-dash” appropriate for numerical ranges (like 3–5).
  • A triple --- is a long “em-dash” appropriate for interjections (like this—no, I mean like that).

My plea to linguists is to actually use math mode and the minus sign when writing binary features. If you want to turn this into a simple macro, you can place the following in your preamble:

\newcommand{\feature}[2]{\ensuremath{#1}\textsc{#2}}

and then write \feature{-}{Back} for nicely formatted feature specifications.

Note that this issue has an exact parallel in Word and other WYSIWYG setups: there the fix is as simple as selecting the Unicode minus sign (U+2212) from the inventory of special characters (or just googling “Unicode minus sign” and copying and pasting what you find).

A note on pure allophony

I have previously discussed the notion of pure allophony, contrasting it with the facts of alternations. What follows is a lightly edited section from my recent NAPhC 12 talk, which in part hinges on this notion.


While Halle (1959) famously dispenses with the structuralist distinction between phonemics and morphophonemics, some later generativists reject pure allophony outright. Let the phonemic inventory of some grammar G be P and the set of surface phones generated by G from P be S. If some phoneme p ∈ P always corresponds—in some sense to be made precise—to some phone s ∈ S, and if s ∉ P, then s is a pure allophone of p. For example, if /s/ is a phoneme and [ʃ] is not, but all [ʃ]s correspond to /s/s, then [ʃ] is a pure allophone of /s/. According to some descriptions, this is the case for Korean, as [ʃ] is a (pure) allophone of /s/ when followed by [i].
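The definition can be rendered in a few lines of Python. The inventory and the phoneme-to-phone mapping below are invented, loosely following the Korean description, and the “always corresponds” proviso is simply taken for granted:

```python
# P: a toy phonemic inventory; REALIZATIONS: phoneme -> surface phones.
PHONEMES = {"s", "i", "p", "a"}
REALIZATIONS = {
    "s": {"s", "ʃ"},  # [ʃ] when followed by [i], per the Korean description
    "i": {"i"},
    "p": {"p"},
    "a": {"a"},
}

def pure_allophones():
    """Surface phones (members of S) that are not themselves phonemes."""
    surface = set().union(*REALIZATIONS.values())  # S
    return {s for s in surface if s not in PHONEMES}

print(pure_allophones())  # {'ʃ'}
```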

One might argue that alternations are more entrenched facts than pure allophony, simply because it is always possible to construct a grammar free of pure allophony. For instance, if one wants to do away with pure allophony, one can derive the Korean word [ʃi] ‘poem’ from /ʃi/ rather than from /si/. One early attempt to rule out pure allophony—and thus to motivate the choice of /ʃi/ over /si/ for this problem—is the alternation condition (Kiparsky 1968). As Kenstowicz & Kisseberth (1979:215) state it, this condition holds that “the UR of a morpheme may not contain a phoneme /x/ that is always realized phonetically as identical to the realization of some other phoneme /y/.” [Note here that /x, y/ are to be interpreted as variables rather than as the voiceless velar fricative or the front high round vowel.–KBG] A later version of this idea—often attributed to Dell (1973) or Stampe (1973)—is the notion of lexicon optimization (Prince & Smolensky 1993:192).

A correspondent to this list wonders why, in a grammar G such that G(a) = G(b) for potential input elements /a, b/, a nonalternating observed element [a] is not (sometimes, always, freely) lexically /b/. The correct answer is surely “why bother?”—i.e. to set up /b/ for [a] when /a/ will do […] The basic idea reappears as “lexicon optimization” in recent discussions. (Alan Prince, electronic discussion; cited in Hale & Reiss 2008:246)

Should grammars with pure allophony be permitted? The question is not, as is sometimes supposed, a purely philosophical one (see Hale & Reiss 2008:16-22): both linguists and infants acquiring language require a satisfactory answer. In my opinion, the burden of proof lies with those who would deny pure allophony. They must explain how the language acquisition device (LAD) either directly induces grammars that satisfy the alternation condition, or optimizes all pure allophony out of them after the fact. “Why bother” could go either way: why posit either complication to the LAD when pure allophony will do? The linguist faces a problem similar to the infant’s. To wit, I began this project assuming Latin glide formation was purely allophonic, and only later uncovered subtle and rare evidence for vowel-glide alternations. Thus in this study, I make no apology for—and draw no further attention to—the fact that some data are purely allophonic. This important question will have to be settled by other means.

References

Dell, F. 1973. Les règles et les sons. Hermann.
Hale, M. and Reiss, C. 2008. The Phonological Enterprise. Oxford University Press.
Halle, M. 1959. The Sound Pattern of Russian. Mouton.
Kenstowicz, M. and Kisseberth, C. 1979. Generative Phonology: Description and Theory. Academic Press.
Kiparsky, P. 1968. How Abstract is Phonology? Indiana University Linguistics Club.
Prince, A. and Smolensky, P. 1993. Optimality Theory: Constraint interaction in generative grammar. Technical Report TR-2, Rutgers University Center For Cognitive Science and Technical Report CU-CS-533-91, University of Colorado, Boulder Department of Computer Science.
Stampe, D. 1973. A Dissertation on Natural Phonology. Garland.

Defectivity in Amharic

[This is part of a series of defectivity case studies.]

According to Sande (2015), only Amharic verb stems that contain a geminate can form a frequentative. Since not all imperfective verb stems contain geminates, some verbs lack frequentatives and speakers must resort to periphrasis. If I understand the data correctly, the frequentative is a /Ca-/ reduplicant template which docks to the immediate left of the first geminate; the C (consonant) slot takes its value from said geminate. For instance, for the perfect verb [ˈsäb.bärä] ‘he broke’, the frequentative is [sä.ˈbab.bärä] ‘he broke repeatedly’. But there is no corresponding frequentative for the imperfective verb [ˈjə.säb(ə)r] ‘he breaks’, since there is no geminate to dock the reduplicant against; Sande marks *[jə.sä.ˈbab(ə)r] as ungrammatical, and presumably other options are out too.
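A toy rendering of this docking analysis in Python. It works over plain transliterations, ignores syllabification, stress, and epenthesis, and the function name is mine:

```python
def frequentative(stem):
    """Dock a Ca- reduplicant immediately to the left of the stem's
    first geminate; return None if there is no geminate to dock to."""
    for i in range(len(stem) - 1):
        if stem[i] == stem[i + 1]:  # first geminate CC
            # The C slot copies the geminate consonant; the template is Ca-.
            return stem[:i] + stem[i] + "a" + stem[i:]
    return None  # no geminate: the frequentative is unavailable (defectivity)

print(frequentative("säbbärä"))  # säbabbärä 'he broke repeatedly'
print(frequentative("jəsäbr"))   # None: speakers resort to periphrasis
```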

(h/t: Heather Newell)

References

Sande, H. 2015. Amharic infixing reduplication: support for a stratal approach to morphophonology. Talk presented at NELS 46.

More than one rule

[Leaving this as a note to myself to circle back.]

I’m just going to say it: some “rules” are probably two or three rules, because the idea that rules are defined by natural classes (and thus free of disjunctions) is more entrenched than our intuitions about whether a process in some language is really a single rule, and we should be Galilean about this. Here are some phonological “rules” that are probably two or three different rules.

  • “Ruki” in the Indo-Iranian and Balto-Slavic families and in Albanian (environment: preceding {w, j, k, r}): it is not clear to me whether any of these languages actually needs this as a synchronic rule at all.
  • Breton voiced stop lenition (change: /b/ to [v], /d/ to [z], /g/ to [x]): the devoicing of /g/ must be a separate rule. Hat tip: Richard Sproat. I believe there’s a parallel set of processes in German.
  • Lamba palatalization (change: /k/ to [tʃ], /s/ to [ʃ]): two rules, possibly with a Duke-of-York thing. Hat tip: Charles Reiss.
  • Mid-Atlantic (e.g., Philadelphia) English ae-tensing (environment: following tautosyllabic, same-stem {m, n, f, θ, s, ʃ}): let’s assume this is allophony; then the anterior nasal and voiceless fricative cases should be separate rules. It is possible that the incipient restructuring of this process as having a simple [+nasal] context provides evidence for the multi-rule analysis.
  • Latin glide formation (environment: complex). Front and back glides are formed from high short monophthongs in different but partially overlapping contexts.

Feature maximization and phonotactics

[This is a quick writing exercise for in-progress work with Charles Reiss. Sorry if it doesn’t make sense out of context.]

An anonymous reviewer asks:

I wonder how the author(s) would reconcile this learning model with the evidence that both children and adults seem to aggressively generalize phonotactic restrictions from limited data (e.g. just [p]) to larger, unobserved natural classes (e.g. [p f b v]). See e.g. the discussion in Linzen & Gallagher (2017). If those results are credible, they seem much more consistent with learning minimal feature specifications for natural classes than learning maximal ones.

First, note that Linzen & Gallagher’s study is a study of phonotactic learning, whereas our proposal concerns induction of phonological rules. We have been, independently but complementarily, quite critical of the naïve assumptions inherent in prior work on this topic (e.g., Gorman 2013, ch. 2; Reiss 2017, §6); we have both argued that knowledge of phonotactic generalizations may require much less grammatical knowledge than is generally believed.

Secondly, we note that Linzen & Gallagher’s subjects are (presumably; they were recruited on Mechanical Turk and paid $0.65 USD for their efforts) adults briefly exposed to an artificial language. While we recognize that adult “artificial language learning” studies are common practice in psycholinguistics, it is not clear what such studies contribute to our understanding of phonotactic acquisition (whatever the phonotactic acquirenda turn out to be) by children robustly exposed to realistic languages in situ.

Third, the reviewer is incorrect; the result reported by Linzen & Gallagher (henceforth L&G) is not consistent with minimal generalization. Let us grant—for sake of argument—that our proposal about rule induction in children is relevant to their work on rapid phonotactic learning in adults. One hypothesis they entertain is that their participants will construct “minimal classes”:

For example, when acquiring the phonotactics of English, learners may first learn that both [b] and [g] are valid onsets for English syllables before they can generalize to other voiced stops (e.g., [d]). This generalization will be restricted to the minimal class that contained the attested onsets (i.e., voiced stops), at least until a voiceless stop onset is encountered.

If by a “minimal class” L&G are referring to a natural class which is consistent with the data and has an extension with the fewest members, then presumably they would endorse our proposal of feature maximization, since the class that satisfies this definition is the most fully specified empirically adequate class. However, it is an open question whether or not such a class would actually contain [d]. For instance, if one assumes that major place features are bivalent, then the intersection of the features associated with [b, g] will contain the specification [−coronal], which rules out [d].

Interestingly, the matter is similarly unclear if we interpret “minimal class” intensionally, in terms of the number of features rather than the number of phonemes the class picks out. The (featurewise-)minimal specification for a single phone (as in the reviewer’s example) is the empty set, which would (it is generally assumed) pick out any segment. We would then expect any generalization which held of [p], as in the reviewer’s example, to generalize not just to other labial obstruents (as the reviewer suggests) but to any segment at all. Minimal feature specification cannot yield a generalization from [p] to any proper subset of segments, contra the anonymous reviewer and L&G. An adequate minimal specification which picks out [p] will pick out just [p]. L&G suggest that maximum entropy models of phonotactic knowledge may have this property, but do not provide a demonstration for any particular implementation of these models.
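The contrast between the two readings can be made concrete with a toy fragment; the segments and bivalent feature values below are invented purely for illustration:

```python
# Toy bivalent specifications for four stops.
SEGMENTS = {
    "p": {"voice": "-", "labial": "+", "coronal": "-", "dorsal": "-"},
    "b": {"voice": "+", "labial": "+", "coronal": "-", "dorsal": "-"},
    "d": {"voice": "+", "labial": "-", "coronal": "+", "dorsal": "-"},
    "g": {"voice": "+", "labial": "-", "coronal": "-", "dorsal": "+"},
}

def shared_spec(observed):
    """Maximal specification: every feature-value pair shared by all members."""
    specs = [set(SEGMENTS[s].items()) for s in observed]
    return dict(set.intersection(*specs))

def extension(spec):
    """All segments consistent with a (possibly partial) specification."""
    return {s for s, fs in SEGMENTS.items() if spec.items() <= fs.items()}

# The maximal spec for observed [b, g] includes [-coronal], excluding [d]:
print(sorted(extension(shared_spec(["b", "g"]))))  # ['b', 'g']
# The featurewise-minimal spec is the empty set, which picks out everything:
print(extension({}) == set(SEGMENTS))  # True
```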

We thank the anonymous reviewer for drawing our attention to this study and the opportunity their comment has given us to clarify the scope of our proposal and to draw attention to a defect in L&G’s argumentation.

References

Gorman, K. 2013. Generative phonotactics. Doctoral dissertation, University of Pennsylvania.
Linzen, T., and Gallagher, G. 2017. Rapid generalization in phonotactic learning. Laboratory Phonology: Journal of the Association for Laboratory Phonology 8(1): 1-32.
Reiss, C. 2017. Substance free phonology. In S.J. Hannahs and A. Bosch (ed.), The Routledge Handbook of Phonological Theory, pages 425-452. Routledge.

Codon math

It is well known that there are twenty “proteinogenic” amino acids—those capable of creating proteins—in eukaryotes (i.e., lifeforms with nucleated cells). When biologists first began to realize that DNA synthesizes RNA, which synthesizes amino acids, it was not yet known how many DNA bases (the vocabulary being A, T, C, and G) were required to code an amino acid. It turns out the answer is three: each codon is a base triple, each corresponding to an amino acid. However, one might have deduced that answer ahead of time using some basic algebra, as did Soviet-American polymath George Gamow. Given that one needs at least 20 amino acids (and admitting that some redundancy is not impossible), it should be clear that pairs of bases will not suffice to uniquely identify the different aminos: 4² = 16, which is less than 20 (+ some epsilon). However, triples will more than suffice: 4³ = 64. This holds assuming that the codons are interpreted consistently, independently of their context (as Gamow correctly deduced), and whether or not the triplets are interpreted as overlapping (Gamow incorrectly guessed that they overlapped, so that a six-base sequence would contain four triplet codons; in fact it contains no more than two).
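Gamow’s deduction amounts to a one-line calculation; a quick sanity check in Python:

```python
import math

N_BASES = 4    # the vocabulary: A, T, C, G
N_AMINOS = 20  # proteinogenic amino acids

assert N_BASES ** 2 < N_AMINOS   # pairs give 16 codes: not enough
assert N_BASES ** 3 >= N_AMINOS  # triples give 64: enough, with redundancy

# Smallest codon length k such that 4 ** k >= 20:
k = math.ceil(math.log(N_AMINOS, N_BASES))
print(k)  # 3

# A six-base sequence, on Gamow's overlapping reading vs. the actual one:
print(6 - 3 + 1)  # 4 overlapping triplets
print(6 // 3)     # 2 non-overlapping codons
```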

All of this is a long way of linking back to the idea of counting entities in phonology. It seems to me we can ask just how many features might be necessary to mark all the distinctions needed. At the same time, Matamoros & Reiss (2016), for instance, following broader work by Gallistel & King (2009), take it as desirable that a cognitive theory involve a small number of initial entities which give rise to a combinatoric explosion that, at the etic level, is “essentially infinite”. Surely similar thinking can be applied throughout linguistics.

References

Gallistel, C. R. and King, A. P. 2009. Memory and the Computational Brain: Why Cognitive Science Will Transform Neuroscience. Wiley-Blackwell.
Matamoros, C. and Reiss, C. 2016. Symbol taxonomy in biophonology. In A. M. Di Sciullo (ed.), Biolinguistic Investigations on the Language Faculty, pages 41-54. John Benjamins Publishing Company.