Deriving the major rule/minor rule distinction

The ability to target underspecified lexemes’ specifications for a rule feature, with feature-filling implemented by unification (e.g., Bale et al. 2014), ought to enable us to derive the traditional distinction (e.g., Lakoff 1970) between major rules (those for which non-application is exceptional) and minor rules (those for which application is exceptional). On this view the distinction is purely descriptive, derived from later feature-filling rules that insert unmarked rule features upon lexical insertion.

Let us suppose we have a rule R, and that every formative is unified with {+R} upon lexical insertion. Then, unification will fail only with formatives specified [−R], and these formatives will exhibit exceptional non-application. This describes the parade example of exceptions to a major rule: the failure of trisyllabic shortening in obesity (assuming obese is [−trisyllabic shortening]; see Chomsky & Halle 1968: §4.2.2).

Let us suppose instead that every formative is unified with {−R} upon lexical insertion. Then, unification will fail only with those formatives specified [+R], and these formatives will exhibit exceptional application, assuming they otherwise satisfy the phonological description of rule R. This describes minor rules.
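This derivation can be made concrete with a few lines of code. Below is a minimal sketch of my own (the dictionary encoding of rule features, and the feature names, are illustrative assumptions, not Bale et al.’s formalism):

```python
# A toy sketch of feature-filling unification over rule features.
# Formatives are dictionaries mapping rule-feature names to Boolean
# specifications; this encoding is illustrative, not Bale et al.'s.

def unify(formative, default):
    """Unify a formative's rule features with a default specification.

    Unspecified features are filled in with the default value
    (feature-filling); a conflicting specification causes unification
    to fail, which we signal by returning None.
    """
    result = dict(formative)
    for feature, value in default.items():
        if feature not in result:
            result[feature] = value  # feature-filling
        elif result[feature] != value:
            return None  # unification failure
    return result

# Major rule: every formative is unified with {+R}; only formatives
# lexically specified [-R] fail, exhibiting exceptional non-application.
assert unify({"R": False}, {"R": True}) is None   # e.g., obese
assert unify({}, {"R": True}) == {"R": True}      # ordinary undergoer

# Minor rule: every formative is unified with {-R}; only formatives
# lexically specified [+R] fail, exhibiting exceptional application.
assert unify({"R": True}, {"R": False}) is None
assert unify({}, {"R": False}) == {"R": False}
```

Note that in both cases the formatives that fail unification are exactly the lexically prespecified exceptions; the major/minor distinction reduces to the choice of default.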

This (admittedly quite sketchy at present) idea seems to address Zonneveld’s (1978: 160f.) concern that Lakoff and contemporaries did not posit any way to encode whether a rule was major or minor, except “transderivationally”, via inspection of successful derivations. It also places the major/minor distinction—correctly, I think—within the scope of a theory of productivity. More on this later.

References

Bale, A., Papillon, M., and Reiss, C. 2014. Targeting underspecified segments: a formal analysis of feature-changing and feature-filling rules. Lingua 148: 240-253.
Chomsky, N. and Halle, M. 1968. The Sound Pattern of English. Harper & Row.
Lakoff, G. 1970. Irregularity in Syntax. Holt, Rinehart and Winston.
Zonneveld, W. 1978. A Formal Theory of Exceptions in Generative Phonology. Peter de Ridder.

Linguistics’ contribution to speech & language processing

How does linguistics contribute to speech & language processing? While there exist some “linguist eliminationists”, who wish to process speech audio or text “from scratch” without intermediate linguistic representations, it is generally recognized that linguistic representations are the end goal of many processing “tasks”. Of course some tasks involve poorly-defined, or ill-posed, end-state representations—the detection of hate speech and named entities, neither of which is particularly well-defined, linguistically or otherwise, comes to mind—but such tasks are driven by apparent business value to be extracted rather than by any serious goal of understanding speech or text.

The standard example for this kind of argument is syntax. It might be the case that syntactic representations are not as useful for textual understanding as was anticipated, and that useful features for downstream machine learning can apparently be induced using far simpler approaches, like the masked language modeling task used for pre-training in many neural models. But it’s not as if a terrorist cell of rogue linguists locked NLP researchers in their offices until they developed the field of natural language parsing. NLP researchers decided, of their own volition, to spend the last thirty years building models which could recover natural language syntax, and ultimately got pretty good at it, to the point where, I suspect, unresolved ambiguities mostly hinge on world knowledge that is rarely if ever made explicit.

Let us consider another example, less widely discussed: the phoneme. The phoneme was discovered in the late 19th century by Baudouin de Courtenay and Kruszewski. It has been around a very long time. In the century and a half since it emerged from the Polish academy, Poland itself has been a congress, a kingdom, a military dictatorship, and a republic (three times), and annexed by the Russian empire, the German Reich, and the Soviet Union. The phoneme is probably here to stay. The phoneme is, by any reasonable account, one of the most successful scientific abstractions in the history of science.

It is no surprise, then, that the phoneme plays a major role in speech technologies. Not only did the first speech recognizers and synthesizers make explicit use of phonemic representations (as well as notions like allophones); so did the next five decades’ worth of recognizers and synthesizers. Conventional recognizers and synthesizers require large pronunciation lexicons mapping between orthographic and phonemic form, and as they get closer to speech, convert these “context-independent” representations of phonemic sequences into “context-dependent” representations which can account for allophony and local coarticulation, exactly as any linguist would expect. It is only in the last few years that it has even become possible to build a reasonably effective recognizer or synthesizer which doesn’t have an explicit phonemic level of representation. Such models instead use clever tricks and enormous amounts of data to induce implicit phonemic representations. We have every reason to suspect these implicit representations are quite similar to the explicit ones linguists would posit. For one, these implicit representations are keyed to orthographic characters, and as I wrote a month ago, “the linguistic analysis underlying a writing system may be quite naïve but may also encode sophisticated phonemic and/or morphemic insights.” If anything, that’s too weak: in most writing systems I’m aware of, the writing system is either a precise phonemic analysis (possibly omitting a few details of low functional load, or using digraphs to get around limitations of the alphabet of choice) or a precise morphophonemic analysis (ditto). For Sapir (1925 et seq.) this was key evidence for the existence of phonemes! So whether or not implicit “phonemes” are better than explicit ones, speech technologists have converged on the same rational, mentalistic notions discovered by Polish linguists a century and a half ago.

So it is surprising to me that even those schooled in the art of speech processing view the contribution of linguistics to the field in a somewhat negative light. For instance, Paul Taylor, the founder of the TTS firm Phonetic Arts, published a Cambridge University Press textbook on TTS methods in 2009, and while it’s by now quite out of date, there’s no more-recent work of comparable breadth. Taylor spends the first five hundred (!) pages or so talking about linguistic phenomena like phonemes, allophones, prosodic phrases, and pitch accents—at the time, the state of the art in synthesis made use of explicit phonological representations—so it is genuinely a shock to me that Taylor chose to close the book with a chapter (Taylor 2009: ch. 18) about the irrelevance of linguistics. Here are a few choice quotes, with my commentary.

It is widely acknowledged that researchers in the field of speech technology and linguistics do not in general work together. (p. 533)

It may be “acknowledged”, but I don’t think it has ever been true. The number of linguists and linguistically-trained engineers working on FAANG speech products every day is huge. (Modern corporate “AI” is to a great degree just other people, mostly contractors in the Global South.) Taylor continues:

The first stated reason for this gap is the “aeroplanes don’t flap their wings” argument. The implication of this statement is that, even if we had a complete knowledge of how human language worked, it would not help us greatly because we are trying to develop these processes in machines, which have a fundamentally different architecture. (p. 533)

I do not expect that linguistics will provide deep insights about how to build TTS systems, but it clearly identified the relevant representational units for building such systems many decades ahead of time, just as mechanics provided the basis for mechanical engineering. This was true of Kempelen’s speaking machine (which predates phonemic theory, and so had to discover something like it) and Dudley’s voder, as well as speech synthesizers in the digital age. So I guess I kind of think that speech synthesizers do flap their wings: parametric, unit-selection, hybrid, and neural synthesizers are all big fat phoneme-realization machines. As is standard practice in the physical sciences, the simple elementary particles of phonological theory—phonemes, and perhaps features—were discovered quite early on, but the study of their ontology has taken up the intervening decades. And unlike the physical sciences, we cognitive scientists must some day also understand their epistemology (what Chomsky calls “Plato’s problem”) and ultimately their evolutionary history (“Darwin’s problem”) too. Taylor, as an engineer, need not worry himself about these further studies, but I think he is being wildly uncharitable about the nature of what he’s studying, and about the business value of having a well-defined hypothesis space of representations for his team to engineer within.

Taylor’s argument wouldn’t be complete without a caricature of the generative enterprise:

The most-famous camp of all is the Chomskian [sic] camp, started of course by Noam Chomsky, which advocates a very particular approach. Here data are not used in any explicit sense, quantitative experiments are not performed and little stress is put on explicit description of the theories advocated. (p. 534)

This is nonsense. Linguistic examples are data, in some cases better data than results from corpora or behavioral studies, as the work of Sprouse and colleagues has shown. No era of generativism was actively hostile to behavioral results; as early as the ’60s, generativist-aligned psycholinguists were experimentally testing the derivational theory of complexity and studying morphological decomposition in the lab. And I simply have never found that generativist theorizing lacks for formal explicitness; in phonology, for instance, the major alternatives to generativist thinking are exemplar theory—which isn’t even explicit enough to be wrong—and a sort of neo-connectionism—which ought not to work at all given extensive proof-theoretic studies of formal learnability and the formal properties of stochastic gradient descent and backpropagation. Taylor goes on to suggest that the “curse of dimensionality” and issues of generalizability prevent application of linguistic theory. Once again, though, the things we’re trying to represent are linguistic notions: machine learning using “features” or “phonemes”, explicit or implicit, is still linguistics.

Taylor concludes with some future predictions about how he hopes TTS research will evolve. His first is that textual analysis techniques from NLP will become increasingly important. Here the future has been kind to him: they are, but as the work of Sproat and colleagues has shown, we remain quite dependent on linguistic expertise—of a rather different and less abstract sort than the notion of the phoneme—to develop these systems.

References

Sapir, E. 1925. Sound patterns in language. Language 1:37-51.
Taylor, P. 2009. Text-to-Speech Synthesis. Cambridge University Press.

Defectivity in Turkish; part 1: monosyllables

[This is part of a small but growing series of defectivity case studies.]

While there are some languages—like Greek or Russian—in which there are dozens or even hundreds of defective lexemes, in most cases defectivity is markedly constrained, conditioned both by morphological class or status and by lexical identity. This is somewhat in conflict with models which view defectivity as essentially “absolute phonotactic ungrammaticality” (e.g., Orgun & Sprouse 1999; henceforth OS), since the generalizations about which items are or are not defective are not primarily phonotactic. A good demonstration of the morphological-lexical nature of defectivity comes from Turkish.

As first reported (to my knowledge) by Itô & Hankamer (1989; henceforth IH), Turkish has just a small number of monosyllabic stems. In verbs, one forms the “simple” (active) imperative using the bare stem: e.g., ye ‘eat!’. However, one cannot form a passive imperative of monosyllabic verbs. For instance, for EAT, we would expect *yen (with -n being the expected allomorph of the passive imperative), but this is apparently ill-formed under the appropriate interpretation, with no obvious alternative.1 I say it this way because yen exists as the simple imperative ‘conquer!’. As IH note, this shows there is nothing phonotactically wrong with the ill-formed passive imperatives. Another example they give is kon ‘alight! (like a bird)’. Apparently, we would expect it to have a passive imperative homophonous with the simple imperative, but it is ill-formed under this interpretation. However, I find these two examples less than convincing, since one could imagine that homophony with another type of imperative might be implicated in these judgments.

Something similar characterizes certain monosyllabic noun stems. Turkish has apparently borrowed the seven solfège syllables do, re, mi, etc. Of these, six are CV monosyllables, which we would expect to select the /-m/ allomorph of the 1sg. poss. suffix. However, these 1sg. poss. forms are apparently ill-formed; e.g., *do-m ‘my do’, *re-m ‘my re’, *mi-m ‘my mi’, and so on. However, one can use these stems with the other declensional suffixes which produce polysyllabic outputs; e.g., 1pl. poss. do-muz ‘our do’. The same facts are true for the names of the letters of the (Latin, post-1928) alphabet: e.g., de ‘the letter d’, but *de-m ‘my letter d’, and so on. OS report however that the one CVC solfège syllable, sol, has a well-formed 1sg. poss.; this selects the /-Im/ allomorph (where /I/ is a high vocalic archiphoneme subject to stem-controlled vowel harmony), which gives us the licit 1sg. poss. solüm [solʲym] ‘my sol’.2 The same facts hold of the 2sg. poss. ‘your __’, which for CV monosyllables would be realized as /-n/; e.g., *do-n ‘your do’.

From the above facts IH and OS conclude there is an exceptionless constraint in Turkish such that monosyllabic derived forms produced by the grammar are ill-formed, with no possible “repair”. However, Selin Alkan (p.c.) draws my attention to at least one CV nominal stem which is well-formed in the 1sg. and 2sg. poss.: su ‘water’. For this stem, a [j] glide is inserted between the stem and the suffix, and the stem selects the -VC allomorphs of the possessive; e.g., su-y-um ‘my water’, su-y-un ‘your water’. This is surprising insofar as OS take pains (p. 195f.) to specifically rule out repair by epenthesis in 1sg. poss. forms!

It would be nice to conclude that the only affected lexemes are transparent borrowings, but this does not seem to accord with the evidence from monosyllabic verbs. But the evidence from native stems is really quite weak, and the generalizations are clearly morphological (i.e., the restriction of the constraint to derived environments) and lexical (i.e., the fact that su has an “escape hatch”), something that has largely been ignored in previous attempts to describe defectivity in Turkish.

To move forward on this topic, it would be nice to know the following. How many, if any, verbs behave like ye or kon, and are any unexpectedly well-formed in the passive imperative? Are there any other forms in the verbal paradigm that show “monosyllabism” gaps? Similarly, how many (if any) defective nouns are there beyond those already mentioned, and how many behave like su?

[h/t: Selin Alkan]

Endnotes

  1. IH note (p. 61, fn. 1) that passive imperatives “are somewhat odd in normal circumstances”. Therefore, they asked their informants to imagine they were directors giving instructions to actors, which apparently helped to render these examples more felicitous.
  2. It seems plausible that -m and -n here are purely-phonological allomorphs of /-Im, -In/ respectively, but I am not sure.

References

Itô, J. and Hankamer, J. 1989. Notes on monosyllabism in Turkish. In J. Itô and J. Runner (ed.), Phonology at Santa Cruz, pages 61-69. Linguistics Research Center, University of California, Santa Cruz.
Orgun, C. O. and Sprouse, R. L. 1999. From MPARSE to CONTROL: deriving ungrammaticality. Phonology 16:191-224.

“Python” is a proper name

In just the last few days I’ve seen a half dozen instances of the phrase python package or python script in published academic work. It’s disappointing to me that this got by the reviewers, action editors, and copy editors, since Python is obviously a proper name and should be in titlecase. (The fact that the interpreter command is python is irrelevant.)

“…phonology is logically (and causally) prior to phonetics.”

Two important consequences follow from this. First, that phonology is logically (and causally) prior to phonetics as here defined. Second, phonology is also epistemologically prior to phonetics. Judgments about phonetic events are invariably made in terms of perceptual phonology. (Hammarberg 1976:356)

In this post I’d like to briefly review a view of the relationship between phonetics and phonology as related by Hammarberg (1976) and Appelbaum (1996), the former being primarily concerned with production and the latter with perception.

Phonetics, being concerned with the material and physical, has tended to align itself with the physical sciences (and physics in particular), and with the empiricist tradition in science.1,2 In contrast, much of what has been called the cognitive revolution in the cognitive sciences, and in linguistics in particular, is explicitly anti-empiricist. As Hammarberg and Appelbaum argue, the empiricist biases of phonetics make it ill-suited to explain fundamental facts about speech.

It is generally understood that spoken language is produced not as a discrete sequence but rather as a series of overlapping gestures and acoustic signatures. Anyone who has looked closely at the acoustics of speech will already recognize that it is impossible to say exactly where, in a word like cat, the [æ]-ness ends and the [t]-ness begins. In a word like soon, the fricative portion shows signs of rounding not found in words like scene. From an acoustic record alone, one cannot determine empirically how many segments are present. And one cannot produce natural-sounding synthesized speech via simple concatenation of segments. It is not just that [æ, t, s] and other segments are coarticulated with nearby segments, however: it is also the case that there are simply no invariant acoustic-phonetic properties that uniquely characterize [t]. A [t] spoken by a child, by a man with a mouth full of chili, by a woman missing her front teeth, and so on may have radically different acoustic properties, yet we as scientists understand them to be in some sense identical phenomena.

This is a basic principle of scientific discovery: one must assume that “the vast multitude of phenomena he encounters may be accounted for in terms of the interactions of a fairly small number of basic entities, standard elementary individuals. His task thus becomes one of identifying the basic entities and describing the interactions in virtue of which the encountered phenomena are generated. From this emerge our…notions of the identity and nonidentity of phenomena.” (Hammarberg, p. 354) The linguistic notion of segment is perhaps the most important of these basic entities. It is an entity recognized both by those early lay-linguists, the Iron Age scribes who gave us the alphabet, as well as one of the most venerable notions in the history of modern linguistics. Yet, segments do not have a physical reality of their own; they do not exist in the physical world, but only in the human mind. They are “internally generated, the creature of some kind of perceptual-cognitive process.”

It is generally uncontroversial to speak of the output of the phonological component as the input to the phonetic component. From this it follows that phonology is cognitively and epistemically prior to phonetics. Coarticulation, for instance, results from the process which maps segments—which, remember, exist only in the minds of speakers—onto articulatory and acoustic events. But one cannot talk about coarticulation without segments, since it is the spreading of articulatory-acoustic properties between segments that defines coarticulation. One must know that /s/ exists, and has inherent properties not normally associated with—or compatible with—lip rounding, to even observe the anticipatory lip rounding in words like soon.

The existence of coarticulation is often understood teleologically, in the sense that it is taken to be in part mechanical, automatic, inertial. This too is a mistake, according to Hammarberg: apparent teleological explanations of human behavior should be recast, as is the tradition in Western philosophy, as the result of intentional, causal behavior. The existence of anticipatory articulation shows us that the influence that the /u/ in soon has on the realization of the preceding /s/ arose some time before instructions to the articulators were generated, and the level at which this influence occurs should therefore be identified with the mental rather than the physiological. Hammarberg goes on to argue that coarticulatory processes are akin to ordinary allophony and should reside in the scope of phonological theory. This argument is strengthened insofar as coarticulation has a language-specific character, as is sometimes claimed.

Appelbaum, while not citing Hammarberg’s original paper, extends this critique to the theory of speech perception. It is an assumption of the so-called motor theory that there are invariant properties which identify “phonetic gestures”. Since the motor theorists do not present any evidence that such invariants so much as exist, we must instead abstract away to mental entities which have all the properties of—and which Appelbaum identifies with—what we are calling segments, or perhaps lower-level entities like phonological features. Under this approach, then, there is no content to the motor theory of speech perception beyond the obvious point that phonetic experience, somehow, turns into purely mental representations. Again, the empiricist biases of phonetics have led us astray.

The above discussion may influence the way we think about the role of phonetics in linguistics education. Phonetics is generally viewed as an autonomous subdiscipline, and modern acoustic and articulatory analysis is certainly complex enough to justify serious graduate instruction, but the view sketched above would seem to suggest that phonetic tools exist primarily as a way of gathering phonological information rather than as the instruments of an autonomous discipline. I am not sure I am ready to conclude that, but it certainly is provocative!

Endnotes

  1. Empiricism refers to a theory of epistemology and should not be confused with the empirical method in science (the use of sense-based observation). Many prominent thinkers reject empiricism in favor of rationalism, but support the use of empirical methods. No one is seriously arguing against the use of the senses.
  2. This will be shown to be yet another example of physics envy as the source of sloppy thinking in linguistics.

References

Appelbaum, I. 1996. The lack of invariance problem and the goal of speech perception. In Proceeding of Fourth International Conference on Spoken Language Processing, 1541-1544.
Hammarberg, R. 1976. The metaphysics of coarticulation. Journal of Phonetics 4: 353-363.

Noam on phonotactics

(Emphasis mine.)

Take the question of sound structure. Here too the person who has acquired knowledge of a language has quite specific knowledge about the facts that transcend his or her experience, for example, about which nonexistent words are possible words and which are not. Consider the forms strid and bnid. Speakers of English have not heard either of these forms, but they know that strid is a possible word, perhaps the name of some exotic fruit they have not seen before, but bnid, though pronounceable, is not a possible word of the language. Speakers of Arabic, in contrast, know that bnid is a possible word and strid is not; speakers of Spanish know that neither strid nor bnid is a possible word of their language. The facts can be explained in terms of rules of sound structure that the language learner comes to know in the course of acquiring the language.

Acquisition of the rules of sound structure, in turn, depends on fixed principles governing possible sound systems for human languages, the elements of which they are constituted, the manner of their combination and the modifications that they may undergo in various contexts. These principles are common to English, Arabic, Spanish, and all other human languages and are used unconsciously by a person acquiring any of these languages…

Suppose one were to argue that the knowledge of possible words is derived “by analogy.” The explanation is empty until an account is given of this notion. If we attempt to develop a concept of “analogy” that will account for these facts, we will discover that we are building into this notion the rules and principles of sound structure. (Chomsky 1988:26)
References

Chomsky, N. 1988. Language and Problems of Knowledge: the Managua Lectures. MIT Press.

Defectivity in Tagalog

[This is part of a small but growing series of defectivity case studies. Here I am well out of my linguistic comfort zone, working with a language I know very little about, so please take my comments cum grano salis.]

The behavior of the Tagalog actor focus (AF) infix (and occasionally, prefix) -um- has received an enormous amount of attention since the days of prosodic morphology. Schachter & Otanes (1972; henceforth SO), cited in Orgun & Sprouse (1999), claim that “-um- does not occur with bases beginning with /m/ or /w/” (p. 292). Presumably this statement means such bases are defective with respect to their actor focus form; that is certainly how Orgun & Sprouse—and most of the subsequent literature—have interpreted it. I am aware of at least one complication, however. First off, many verbs instead use the prefix mag- to mark actor focus; SO could simply be making a distributional statement about two allomorphs of the actor focus marker. As I understand it, whether a verb takes -um-, mag-, or both is conditioned by verb semantics, by whether the verb is derived or a bare root, by whether or not the verb is borrowed, and so on, and there is probably some regional, register, and individual variation too.1 And there are other focus markers beyond -um- and mag-.

Orgun & Sprouse (henceforth OS) provide just a few examples (p. 206). According to them, there is no AF form *mumahal ‘to become expensive’ (< mahal). It’s not really clear what we ought to conclude from *mumahal. First, do all adjectives have a corresponding AF verb form? Secondly, one might ask whether magmahal is the AF form of this adjective. According to Wiktionary, it is, so this is probably just an instance of ordinary morphological blocking. Third, this is obviously a loanword, which might have something to do with its choice of AF affix and/or whether it participates in the AF system at all. Similarly, OS give the example *mumura ‘to become cheap’ (< mura), but Wiktionary says magmura exists and has the relevant reading. If this is correct, OS may have confused defectivity and blocking.

OS provide two other types of examples of what they call defectivity.

First, OS claim that /Cw…/-initial stems borrowed from English can form AF forms of the shape /C-um-w…/ but not /Cw-um…/. Thus the AF infinitive sumwer (< Eng. swear) is well-formed, but *swumer is not. It is not clear this generalization is correct, since Ross (1996) elicits the AF infinitive [twumɪtɘɾ] (< Eng. twitter; p. 15) from “a native speaker of Tagalog in her thirties who had recently come to Canada from Manila…” who was “asked to ‘borrow’ hypothetical English loanwords…” (p. 2).2 OS do not discuss /Cm…/-initial borrowings, and they give us no reason to suspect that /Cw…/- and /Cm…/-initial loanword stems would behave differently, but Ross also elicits [smumajl] (< Eng. smile; p. 15).

Secondly, OS claim that /m, w/-initial stems borrowed from English do not form AF verbs in -um-. I have not been able to find any of their examples in a Tagalog dictionary, so these may just be poorly assimilated loanwords. 

OS note that there is no general restriction on homomorphemic /…mum…/ sequences in Tagalog, and they note that reduplication may also produce /…mum…/. Even if their description is correct, it is a mystery why this restriction holds only of a specific AF affix. But I suspect that OS have either misunderstood SO, or perhaps misgeneralized from SO’s admittedly vague comment.

Before I conclude, I should note that the empirical situation for Tagalog linguistics is dire. The language has many tens of millions of speakers and has long been of interest to linguists, and there are extensive grammatical resources on Tagalog in English and Spanish. Yet any time I interact with Tagalog examples in the literature, I find data inconsistencies, analytical laziness, or both. As a student put it to me: “As a Filipina it feels disrespectful and offensive, and as a linguist it feels super shady and raises so many philosophy of science red flags.” There may be some relevant results in Zuraw 2007, which elicits a corpus of the AF forms of Tagalog loanwords, including forms in /Sm-…/, but I am unable to reconcile those findings with Ross 1996, despite the fact that Ross and Zuraw are the same person.

Endnotes

  1. For roots that take both affixes, the two AF forms may or may not be synonymous. For example, pumunta and magpunta are roughly equivalent AF forms of ‘to go’. However, bumili means ‘to buy’ whereas magbili means ‘to sell’.
  2. Note that it was the ’90s, mannnnnn, so this is about songbirds; it has nothing to do with microblogging.

References

Orgun, C. O. and Sprouse, R. L.  1999. From MPARSE to CONTROL: deriving ungrammaticality. Phonology 16:191-224.
Ross, K. 1996.  Floating phonotactics: variability in reduplication and infixation in Tagalog loanwords. Master’s thesis, University of California, Los Angeles.
Schachter, P. and Otanes, F. 1972. Tagalog Reference Grammar. University of California Press.
Zuraw, K. 2007. The role of phonetic knowledge in phonological patterning: corpus and survey evidence from Tagalog infixation. Language 83: 277-316.

Magic and productivity: Spanish metaphony

In Gorman & Yang 2019 (henceforth GY), we provide an analysis of metaphonic patterns in Spanish. This is just one of four or five case studies and it is a bit too brief to go into some interesting representational issues. In this post I’ll try to fill some of the missing details as I understand them, with the caveat that Charles does not necessarily endorse any of my proposals here.

The tolerance principle approach to productivity is somewhat unusual in that it is not tied to any particular theory of rules or representations, so long as such theories provide a way to encode competing rules applying in order of decreasing specificity (Pāṇini’s principle or the elsewhere principle). Yet any particular tolerance analysis requires us to commit to a specific formal analysis of the phenomenon—the relevant rules and the representations over which they operate—so that we know what to count. The way in which I apply the tolerance principle also presumes that productivity (e.g., as witnessed by child overregularization errors) or its lack (as witnessed by inflectional gaps) is a first-class empirical observation and that any explanatorily adequate tolerance analysis ought to account for it. What this means to me is that the facts of productivity can adjudicate between different formal analyses, as the following example shows.

The facts are these. A large percentage of Spanish verbs, all of which have a surface mid vowel (e or o) in the infinitive, exhibit alternations targeting the nucleus of the final syllable of the stem. In all three conjugations, one can find verbs in which this surface mid vowel diphthongizes to ie [je] or ue [we], respectively.1 Furthermore, in the third conjugation, there is a class of verbs in which the e in the final syllable of certain forms alternates with an i.2

The issue, of course, is that there are verbs which are almost identical to the diphthongizing or e~i-alternating stems but which do not undergo these alternations (GY:178f.). One can of course deny that magic is operating here, but this does not seem workable.3 We need therefore to identify the type of magic: the rules and representations involved.

There is some reason to think that conjugation class is relevant to these verb stem alternations. First, Mayol (2007) analyzes verb stem errors in a sample of six children acquiring Spanish, a corpus of roughly 2,000 verb tokens. Nearly all errors in this corpus involve underapplication of diphthongization to diphthongizing verbs in the first and second conjugations; errors in the third conjugation are extremely rare. Second, the e~i alternations are limited to the third conjugation. As Harris (1969:111) points out, the e form surfaces only when the stem is followed by an i in the first syllable of the desinence. This suggests that the alternation is a lowering rather than a raising one, and explains why this pattern is confined to the third (-i-) conjugation. Finally, there are about a dozen Spanish verbs, all of the third conjugation, which are defective in exactly those inflectional forms—those in which there is either stress on the stem or a desinential /i/ in the following syllable—which would reveal whether the stem is a diphthongizing or a lowering one. These three facts seem to be telling us that these alternations are sensitive to conjugation class.

Jim Harris has long argued for an abstract phoneme analysis of Spanish diphthongization. In Harris 1969, diphthongization reflects abstract phonemes, present underlyingly, denoted /E, O/; no featural decomposition is provided, but one could imagine that they are underspecified for some features related to height. Harris (1985) instead supposes that the vowels which undergo diphthongization under stress bear two skeletal "x" slots, one linked and one unlinked, as follows.

o
|
x x

This distinguishes them from ordinary non-alternating mid vowels (which have only one "x") and non-alternating diphthongs (which are prelinked to two "x"s). Harris argues this also provides an explanation of why stress conditions this alternation.

One interesting property of Harris' account, one which I do not believe has been remarked on before, is that it seems to rule out the idea that diphthongization vs. non-diphthongization is "governed by the grammar": it is purely a fact of lexical representation, and surface forms follow directly from applying the rules to the abstract phonemic forms. To put it more fancifully, there is no "daemon" inside the phonemic storage unit of the lexicon deciding where the diphthongs or lowering vowels go; such facts are of interest for "evolutionary" theorizing, but are accidents of diachrony.

However, I believe the facts of productivity and the conditioning effects of conjugation support an alternative—and arguably more traditional—analysis, in which diphthongization and lowering are governed by abstract diacritics at the root level, in the form of rule features of the sort proposed by Kisseberth (1970) and Lakoff (1970).

I propose that verbs with a mid vowel in the final syllable of their stem which do not undergo diphthongization, like pegar 'to stick to' (e.g., pego 'I stick to'), are marked [−diph], and those which do undergo diphthongization, like negar 'to deny' (niego 'I deny'), are marked [+diph]; both are assumed to have an /e/ in underlying form. Similarly, I propose that verbs which undergo lowering, like pedir 'to ask for' (e.g., pido 'I ask for'), are specified [+lowering], and non-lowering verbs, like vivir 'to live' (vivo 'I live'), are specified [−lowering]; both have an underlying /i/. Then, the rule of lowering is

Lowering: i -> e / __ C_0 i

or, in prose, an /i/ lowers to /e/ when followed by zero or more consonants and an /i/. I assume a convention of rule application such that rule R can apply only to those /i/s which are part of a root marked [+R]; it is as if there is an implicit [+R] specification on the rule's target. Therefore, the rule of lowering does not apply to vivir. This rule feature convention is assumed to apply to all phonological rules, including diphthongization.
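To make the convention concrete, here is a toy Python sketch. The underlying forms and the flat segmental representation are deliberately simplified, and the lexicon contains only the two roots under discussion; the point is just that the rule is gated by the root's rule feature:

```python
import re

# Toy lexicon: rule-feature specifications on roots.
# ([+lowering] = True, [-lowering] = False.) URs are simplified strings.
LEXICON = {
    "pid": {"lowering": True},    # pedir 'to ask for'
    "viv": {"lowering": False},   # vivir 'to live'
}

def lowering(root: str, desinence: str) -> str:
    """Apply i -> e / __ C0 i, but only to roots marked [+lowering]."""
    word = root + desinence
    if not LEXICON[root]["lowering"]:
        return word  # the implicit [+R] condition on the target fails
    # Lower the first /i/ followed by zero or more consonants and an /i/.
    return re.sub(r"i(?=[^aeiou]*i)", "e", word, count=1)

print(lowering("pid", "ir"))  # pedir
print(lowering("pid", "o"))   # pido (no following /i/, so no lowering)
print(lowering("viv", "ir"))  # vivir ([-lowering], so the rule is blocked)
```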

I furthermore propose that [diph] and [lowering] rule features are inserted during the derivation according to GY’s tolerance analysis. For first (-a-) and second (-e-) conjugation verbs, [−diph] is the default and [+diph] is lexically conditioned.

[] -> [+diph] / __ {√neg-, ...}
[] -> [-diph] / __

For third (-i-) conjugation verbs, I assume that there is no default specification for either rule feature.

[] -> [+lowering] / __ {√ped-, ...}
[] -> [-lowering] / __ {√viv-, ...}
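The insertion scheme above can be sketched as follows; the root lists are hypothetical stand-ins for the lexically listed classes. In the first and second conjugations [−diph] is filled in as the elsewhere case, whereas in the third conjugation a root on neither list simply remains underspecified:

```python
# Hypothetical root lists standing in for the lexically listed classes.
PLUS_DIPH = {"neg"}          # listed diphthongizing roots (1st/2nd conj.)
PLUS_LOWERING = {"pid"}      # listed [+lowering] roots (3rd conj.)
MINUS_LOWERING = {"viv"}     # listed [-lowering] roots (3rd conj.)

def insert_rule_features(root: str, conjugation: int) -> dict:
    """Fill in [diph]/[lowering] specifications upon lexical insertion."""
    features = {}
    if conjugation in (1, 2):
        # [+diph] is lexically conditioned; [-diph] is the elsewhere default.
        features["diph"] = root in PLUS_DIPH
    elif conjugation == 3:
        # No default: both values are lexically listed; an unlisted root
        # remains underspecified for [lowering].
        if root in PLUS_LOWERING:
            features["lowering"] = True
        elif root in MINUS_LOWERING:
            features["lowering"] = False
    return features

print(insert_rule_features("neg", 1))  # {'diph': True}
print(insert_rule_features("peg", 1))  # {'diph': False}
print(insert_rule_features("pid", 3))  # {'lowering': True}
print(insert_rule_features("abr", 3))  # {} — remains underspecified
```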

I have not yet provided formal machinery to limit these generalizations to the particular conjugations, but I wish to stay agnostic about morphological theory and so I assume that any adequate model of the morphophonological interface ought to be able to encode conjugation class-specific generalizations like the above.

I leave open the question as to how roots which fail to satisfy the phonological conditions for lowering (like those which do not contain a final-syllable /i/) or diphthongization (like those which do not contain a final-syllable mid vowel) are specified with respect to the [diph] and [lowering] features. I am inclined to say that they remain underspecified for these features throughout the derivation. However, all that is essential here is that such roots are not in scope for the tolerance computation.

Let us suppose that we wish to encode, synchronically, phonological "trends" in the lexicon with respect to the distribution of diphthongizing and/or lowering verbs, such as Bybee & Pardo's (1981) claim that e~ie diphthongization is facilitated when the mid vowel is followed by the trill rr. Such observations could be encoded at the point at which rule features are inserted, if desired. It is unclear how a similar effect might be achieved under the abstract phoneme analysis. I remain agnostic on this question, which may ultimately bear on the past tense debate.

In future work (if blogging can be called “work”), it would be interesting to expand the proposal to other cases of morpholexical behavior studied by Kisseberth (1970), Lakoff (1970), and Zonneveld (1978), among others. Yet my proposal does not entail that we draw similar conclusions for all superficially similar case studies. For instance, I am unaware at present of evidence contradicting Rubach’s (2016) arguments that the Polish yers are abstract phonemes.

Endnotes

  1. Let us assume, as does Harris, that the appearance of the [e] in both diphthongs is the result of a default insertion rule applying after diphthongization converts the nucleus to the corresponding glide.
  2. This of course does not exhaust the set of verbal alternations, as there are highly-irregular consonantal and vocalic alternations in a handful of other verbs.
  3. Albright et al. (2001) and Bybee & Pardo (1981) are sometimes understood to have found solid evidence for a "non-magical" analysis, in which the local context in which a stem mid vowel is found is the sole determinant. This is a massive overinterpretation. Bybee & Pardo identify some local contexts which seem to favor diphthongization, and the results of a small nonce word cloze task are consistent with these findings. Albright et al. use a simple computational model to discover some contexts which seem to favor diphthongization, and find that subjects' ratings of nonce words (on a seven-point Likert scale) are correlated with the model's predictions for diphthongization. Schütze (2005) gives a withering critique of the general nonce word rating approach. Even setting that aside, neither study links performance on nonce word tasks to adult knowledge of, or child acquisition of, actual Spanish words.

References

Albright, A., Andrade, A., and Hayes, B. 2001. Segmental environments of Spanish diphthongization. UCLA Working Papers in Linguistics 7: 117-151.
Baković, E., Heinz, J., and Rawski, J. In press. Phonological abstractness in the mental lexicon. In The Oxford Handbook of the Mental Lexicon.
Bale, A. and Reiss, C. 2018. Phonology: A Formal Introduction. MIT Press.
Bybee, J. and Pardo, E. 1981. Morphological and lexical conditioning of rules: experimental evidence from Spanish. Linguistics 19: 937-968.
Gorman, K. and Yang, C. 2019. When nobody wins. In F. Rainer, F. Gardani, H. C. Luschützky, and W. U. Dressler (eds.), Competition in Inflection and Word Formation, 169-193. Springer.
Harris, J. 1969. Spanish Phonology. MIT Press.
Harris, J. 1985. Spanish diphthongisation and stress: a paradox resolved. Phonology Yearbook 2: 31-45.
Kisseberth, C. W. 1970. The treatment of exceptions. Papers in Linguistics 2: 44-58.
Lakoff, G. 1970. Irregularity in Syntax. Holt, Rinehart and Winston.
Mayol, L. 2007. Acquisition of irregular patterns in Spanish verbal morphology. In Proceedings of the Twelfth ESSLLI Student Session, 1-11.
Schütze, C. 2005. Thinking about what we are asking speakers to do. In S. Kepser and M. Reis (eds.), Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, 457-485. Mouton de Gruyter.
Zonneveld, W. 1978. A Formal Theory of Exceptions in Generative Phonology. Peter de Ridder.