Pynini 2020: State of the Sandwich

I have been meaning to describe some of the work I have been doing on Pynini, our weighted finite-state grammar development platform. For one, while I have been the primary contributor throughout the history of the project (Richard Sproat wrote the excellent path iteration library), we are now also getting many contributions from Lawrence Wolf-Sonkin (a rewrite of the symbol table wrapper, type hints) and lots of usability feedback and bug reports from the Google linguists.

We are currently on Pynini release 2.1.1. Here are some new features/improvements from the last few releases:

  • 2.0.9: Adds an efficient multi-argument union (a quick sketch follows this list).
  • 2.0.9: Pynini (and the rest of OpenGrm) are available on Conda via Conda-Forge. This means that for most users, there is no longer any need to compile Pynini by hand; instead Pynini is compiled (for a variety of platforms) in the cloud, using a continuous integration framework.
  • 2.1.0: Rewrites the string compiler so that symbol tables are no longer attached to compiled FSTs, eliminating the need for expensive symbol table merging and relabeling options.
  • 2.1.0: Rewrites the FST and symbol table class hierarchies to better reflect the organization of lower-level APIs.
  • 2.1.1: Adds PEP 484/PEP 561-compatible type stubs.
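
Here is a minimal sketch of the multi-argument union, assuming the 2.1-era Python API (acceptor, the @ composition operator, and the string method); the package is installable from conda-forge as pynini:

```python
import pynini

# Multi-argument union, added in 2.0.9; previously this required
# chaining binary unions together.
vowel = pynini.union("a", "e", "i", "o", "u")

# Compose a string against the union and read back the single match.
assert (pynini.acceptor("e") @ vowel).string() == "e"
```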

We also have removed or renamed quite a few features:

  • stringify is renamed string.
  • text is renamed print (cf. the command-line tool fstprint).
  • The defaults struct is removed, though it may be reintroduced as a context manager at some point.
  • The * infix operator, previously used for composition, is removed; use @ instead.
  • transducer’s arguments input_token_type and output_token_type are merged as token_type. (A quick crib for these renames follows this list.)
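
As a crib, here is a sketch of the old spellings next to their 2.1 replacements (the token type values shown are illustrative):

```python
import pynini

# Old spelling                    ->  new spelling
#   fst.stringify()               ->  fst.string()
#   fst.text()                    ->  fst.print()  (cf. fstprint)
#   a * b                         ->  a @ b        (composition)
#   pynini.transducer(i, o,
#       input_token_type="utf8",
#       output_token_type="utf8") ->  pynini.transducer(i, o, token_type="utf8")

rule = pynini.transducer("ch", "x", token_type="utf8")  # Maps "ch" to "x".
```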

Finally, we have broken Python 2.7 compatibility as of 2.1.0; pywrapfst, the lower-level API, still has some degree of Python 2.7 compatibility, but this is probably the last release to maintain that property.

Idealizations gone wild

Generative grammar and information theory are products of the US post-war defense science funding boom, and it is no surprise that the former attempted to incorporate insights from the latter. Many early ideas in generative phonology—segment structure and morpheme structure rules and constraints (Stanley 1967), the notion of the evaluation metric (Aspects, §6), early debates on opacities, conspiracies, and the alternation condition—are clearly influenced by information theory. It is interesting to note that as early as 1975, Morris Halle regarded his substantial efforts in this area to have been a failure.

In the 1950’s I spent considerable time and energy on attempts to apply concepts of information theory to phonology. In retrospect, these efforts appear to me to have come to naught. For instance, my elaborate computations of the information content in bits of the different phonemes of Russian (Cherry, Halle & Jakobson 1953) have been, as far as I know, of absolutely no use to anyone working on problems in linguistics. And today the same negative conclusion appears to me to be warranted about all my other efforts to make use of information theory in linguistics. (Halle 1975: 532)

Thus the mania for information theory in early generative grammar was exactly the sort of bandwagon effect that Claude Shannon, the inventor of information theory, had warned about decades earlier.

In the first place, workers in other fields should realize that the basic results of the subject are aimed in a very specific direction, a direction that is not necessarily relevant to such fields as psychology, economics, and other social sciences. (Shannon 1956)

Today, however, information theory is not exactly in disrepute in linguistics. First off, perplexity, a metric derived from information theory, is used as an intrinsic evaluation metric in certain natural language processing tasks, particularly language modeling.1 Secondly, there have been attempts to revive information-theoretic notions as an explanatory factor in the study of phonology (e.g., Goldsmith & Riggle 2012) and human morphological processing (e.g., Moscoso del Prado Martı́n et al. 2004). And recently, Mollica & Piantadosi (2019; henceforth M&P) dare to use information theory to measure the size of the grammar of English.
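
For concreteness, perplexity is just the exponentiated average negative log-probability a model assigns to held-out data; a minimal sketch, independent of any particular toolkit:

```python
from math import log2

def perplexity(probs: list[float]) -> float:
    """Perplexity of a model that assigned probability probs[i] to
    the i-th token of a held-out sample."""
    avg_neg_log2 = -sum(log2(p) for p in probs) / len(probs)
    return 2 ** avg_neg_log2

# A model assigning uniform probability 1/100 to every token has
# perplexity 100, regardless of sample size.
assert round(perplexity([0.01] * 50)) == 100
```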

M&P’s program is fundamentally one of idealization. Now, I don’t have any problem per se with idealization. Idealization is an important part of the epistemic process in science, one without which there can be no scientific observation at all. Critics of idealizations (and of idealization itself) are usually concerned with the things an idealization abstracts away from; for instance, critics of Chomsky’s famous “ideal speaker-listener” (Aspects, p. 3f.) note correctly that it ignores bilingual interference, working memory limitations, and random errors. But idealizations are not merely the infinitude of variables they choose to ignore (and when the object of study is an enormously complex polysensory, multifactorial system like the human capacity for language, one is simply not going to be able to study the entire system all at once); they are just as much defined by the factors they foreground, the affordances they create, and the constraints they impose on scientific inquiry.

In this case, an information-theoretic characterization of grammars constrains us to conceive of our knowledge of language in terms of probability distributions. This is a step I am often uncomfortable with. It is, for example, certainly possible to conceive of speakers’ lexical knowledge as a sort of probability distribution over lexical items, but I am not sure that P(word) has much grammatical work to do except act as a model of the readily apparent observation that more frequent words can be recalled and recognized more rapidly than rare words. To be sure, studies like the aforementioned one by Moscoso del Prado Martı́n et al. attempt to connect information-theoretic characterizations of the lexicon to behavioral results, but these studies are correlational and provide little in the way of mechanistic-causal explanation.

However, for the sake of argument, let us assume that the probabilistic characterization of grammatical knowledge is coherent. Why then should it be undertaken? M&P claim that the measurements they will allow—grammar sizes, measured in bits—bear on a familiar debate. As they frame it:

…is the amount of information about language that is learned substantial (empiricism) or minimal (nativism)?

I don’t accept the terms of this debate. While I consider myself a nativist, I have formed no opinions about how many bits it takes to represent the grammar of English, which is by all accounts a rather complex object. The tradeoff between what is to be learned and what is innate is something that has been given extensive consideration in the nativist literature. Nativists recognize that the less there is to be learned, the more that has to have evolved in the rather short amount of time (in evolutionary terms) since we humans split off from our language-lacking primate cousins. But this tradeoff is strictly qualitative; were it possible to satisfactorily measure both evolutionary plausibility and grammar size, they would still be incommensurate quantities.

M&P proceed by computing the number of bits for various linguistic subsystems. They compute the information associated with phonemes (really, the acoustic cues to various features), the phonemic representations of wordforms, lexical semantics (mappings from words to meanings, here represented as a vector space, as is the fashion), word frequency, and finally syntax. For each of these they provide lower and upper bounds, though the upper bounds are in some cases constructed simply by adding an ad hoc factor-of-two error to the lower bound. Finally, they sum these quantities, giving an estimate of roughly 1.5 megabytes. This M&P consider to be substantial. It is not at all clear why they feel this is the case, or how small a grammar would have to be to count as “minimal”.

There is a lot to complain about in the details of M&P’s operationalizations. First, I am not certain that the systems they have identified are well-defined modules that would be recognizable to working linguists; for instance their phonemes module has next to nothing to do with my conception of phonological grammar. Secondly, it seems to me that by summing the bits needed to characterize each module, they are assuming a sort of “feed-forward”, non-interactive relationship between these components, and it is not clear that this is correct; for example, there are well-understood lexico-semantic constraints on verbs’ argument structure.

While I do not wish to go too far afield, it may be useful to consider in more detail their operationalization of syntax. For this module, they use a corpus of textbook example sentences, then compute the number of possible unlabeled binary branching trees that would cover each example. (For a sentence of n words, this quantity is the (n−1)th Catalan number.) To turn this into a probability, they assume that one correct parse has been sampled from a uniform distribution over all possible binary trees for the given sentence. First, this assumption of uniformity is completely unmotivated. Secondly, since they assume there is exactly one possible bracketing, and do not provide labels to non-terminals, they have no way of representing the ambiguity of sentences like Call John an ambulance. (Thanks to Brooke Larson for suggesting this example.) Anyone familiar with syntax will have no problem finding gaping faults with this operationalization.2
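
To make the arithmetic concrete, here is a sketch of the bits-per-parse computation under M&P’s uniformity assumption (the function names are mine, not theirs):

```python
from math import comb, log2

def catalan(n: int) -> int:
    """The nth Catalan number: the number of distinct unlabeled
    binary branching trees over n + 1 leaves."""
    return comb(2 * n, n) // (n + 1)

def parse_bits(num_words: int) -> float:
    """Bits needed to pick the one "correct" parse from a uniform
    distribution over all binary bracketings of a sentence."""
    return log2(catalan(num_words - 1))

print(parse_bits(10))  # Roughly 12.25 bits for a ten-word sentence.
```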

M&P justify all this hastiness by comparing their work to the informal estimation approach known as a Fermi problem (they call them “Fermi calculations”). In the original framing, the quantity being estimated is the product of many terms, so assuming errors in estimation of each term are independent, the final estimate’s error is expected to grow logarithmically as the number of terms increases (roughly, this is because the logarithm of a product is equal to the sum of the logarithms of its terms). But in M&P’s case, the quantity being estimated is a sum, so the error will grow much faster, i.e., linearly as a function of the number of terms. Perhaps, as one reviewer writes, “you have to start somewhere”. But do we? If something is not worth doing well—and I would submit that measuring grammars, in all their richness, by comparing them to the storage capacity of obsolete magnetic storage media is one such thing—it seems to me to be not worth doing at all.

Footnotes

  1. Though not without criticism; in speech recognition, probably the most important application of language modeling, it is well-known that decreases in perplexity don’t necessarily give rise to decreases in word error rate.
  2. Why do M&P choose such a degenerate version of syntax? Because syntactic theory is “experimentally under-determined”, so they want to be “as independent as possible from the specific syntactic formalism.”

References

Cherry, E. C., Halle, M., and Jakobson, R. 1953. Towards the logical description of languages in their phonemic aspect. Language 29(1): 34-46.
Chomsky, N. 1965. Aspects of the theory of syntax. Cambridge: MIT Press.
Goldsmith, J. and Riggle, J. 2012. Information theoretic approaches to phonology: the case of Finnish vowel harmony. Natural Language & Linguistic Theory 30(3): 859-896.
Halle, M. 1975. Confessio grammatici. Language 51(3): 525-535.
Mollica, F. and Piantadosi, S. P. 2019. Humans store about 1.5 megabytes of information during language acquisition. Royal Society Open Science 6: 181393.
Moscoso del Prado Martı́n, F., Kostić, A., and Baayen, R. H. 2004. Putting the bits together: an information theoretical perspective on morphological processing. Cognition 94(1): 1-18.
Shannon, C. E. 1956. The bandwagon. IRE Transactions on Information Theory 2(1): 3.
Stanley, R. 1967. Redundancy rules in phonology. Language 43(2): 393-436.

On the not-exactly-libfixes

In an earlier post I noted the existence of libfix-like elements where the newly liberated affix mirrors existing—though possibly semantically opaque—morphological boundaries. The example I gave was that of -giving, as in Spanksgiving and Friendsgiving. Clearly, this comes from Thanksgiving, which is etymologically (if not also synchronically) a compound of the plural noun thanks and the gerund/progressive giving. Some morphological innovation has evidently occurred: the element gives rise to new coinages, and the semantics of -giving is more circumscribed than that of the free stem giving, in that it necessarily refers to a harvest-time holiday, not merely to “giving”.

At the time I speculated that it was no accident that the morphological boundaries of the new libfix mimic those of the compound. Other examples I have since collected include mare (< nightmare; e.g., writemare, editmare); core (< hardcore; e.g., nerdcore, speedcore) and step (< two-step; e.g., breakstep, dubstep), both of which refer to musical genres (Zimmer & Carson 2012); gate (< Watergate; e.g., Climategate, Nipplegate, Troopergate) and stock (< Woodstock; e.g., Madstock, Calstock), both extracted from familiar toponyms; and position (< exposition; e.g., sexposition, craposition), for which the most likely source can be analyzed as a Latinate “level 1” prefix attached to a bound stem. So, what do we think? Are these libfixes too? Does it matter that recutting mirrors the etymological—or even synchronic—segmentation of the source word?

References

B. Zimmer and C. E. Carson. 2012. Among the new words. American Speech 87(3): 350-368.

tfw it’s not prescriptivism

I think it would be nice to have a term that allowed us to distinguish between politely asking that we preserve existing useful lexical distinctions (such as between terrorism ‘non-state violence against civilians intended to delegitimize the state’ and terms like atrocities or war crimes, or between selfie ‘photo self-portrait’ and photo portrait), and full-blown ideologically-driven prescriptivism. I do not have a proposal for what this term ought to be.

Libfix report for December 2019

A while ago I acquired a dictionary of English blends (Thurner 1993), and today I went through it looking for candidate libfixes I hadn’t yet recorded. Here are a few I found. From burlesque, we have lesque, used to form both boylesque and girlesque. The kumquat gives rise to quat. This is used in two (literal) hybrid fruits: citrangequat and limequat. From melancholy comes choly, used to build solemncholy ‘a solemn or serious mood’ and the unglossable lemoncholy. From safari there is fari, used to build seafari, surfari, and even snowfari. Documentary has given rise to mentary, as in mockumentary and rockumentary.

An interesting case is that of stache. While stache is a common clipping of mustache, it is commonly used as an affix as well, as in liquid-based beerstache and milkstache and the pejorative fuckstache and fuzzstache.

I also found a number of libfix-like elements that can plausibly be analyzed as affixes rather than cases of “liberation”. Some examples are eteer (blacketeer, stocketeer), legger (booklegger, meatlegger), and logue (duologue, pianologue, travelogue). I do not think these are properly defined as libfixes (they are a bit like -giving) but I could be wrong.

References

D. Thurner (1993). The Portmanteau Dictionary: Blend Words in the English Language, Including Trademarks and Brand Names. McFarland & Co.

A theory of error analysis

Manual error analyses can help to identify the strengths and weaknesses of computational systems, ultimately suggesting future improvements and guiding development. However, they are often treated as an afterthought or neglected altogether. In three of my recent papers, we have been slowly developing what might be called a theory of error analysis. The systems evaluated include:

  • number normalization (Gorman & Sproat 2016); e.g., mapping 97000 onto quatre-vingt-dix-sept mille,
  • inflection generation (Gorman et al. 2019); e.g., mapping pairs of citation form and inflectional specification like (aufbauen, V;IND;PRS;2) onto inflected forms like baust auf, and
  • grapheme-to-phoneme conversion (Lee et al. 2020); e.g., mapping orthographic forms like almohadilla onto phonemic or phonetic forms like /almoaˈdiʎa/ and [almoaˈðiʎa].

While these are rather different types of problems, the systems all have one thing in common: they generate linguistic representations. I discern three major classes of error such systems might make.

  • Target errors are only apparent errors; they arise when the gold data, the data to be predicted, is linguistically incorrect. This is particularly likely to arise with crowd-sourced data, though such errors are also present in professionally annotated resources.
  • Linguistic errors are caused by misapplication of independently attested linguistic behaviors to the wrong input representations.
    • In the case of number normalization, these include using the wrong agreement affixes in Russian numbers; e.g., *семьдесят миллион (nom.sg.) for семьдесят миллионов (gen.pl.) ‘seventy million’ (Gorman & Sproat 2016:516).
    • In inflection generation, these are what Gorman et al. 2019 call allomorphy errors; for instance, overapplying ablaut to the Dutch weak verb printen ‘to print’ to produce a preterite *pront instead of printte (Gorman et al. 2019:144).
    • In grapheme-to-phoneme conversion, these include failures to apply allophonic rules; e.g., in Korean, 익명 ‘anonymity’ is incorrectly transcribed as [ikmjʌ̹ŋ] instead of [iŋmjʌ̹ŋ], reflecting a failure to apply a rule of obstruent nasalization not indicated in the highly abstract hangul orthography (Lee et al. 2020).
  • Silly errors are those which cannot be analyzed as either target errors or linguistic errors. These have long been noted as a feature of neural network models (see, e.g., Pinker & Prince 1988 and Sproat 1992:216f. on *membled) and still occur with modern neural architectures.

I propose that this tripartite distinction is a natural starting point when building an error taxonomy for many other language technology tasks, namely those that can be understood as generating linguistic sequences.
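
For annotation purposes, the taxonomy can be operationalized as a simple decision procedure; the class names and flags below are mine, not anything proposed in the papers themselves:

```python
import enum

class ErrorClass(enum.Enum):
    """The tripartite error taxonomy sketched above."""
    TARGET = "target"          # The gold datum itself is wrong.
    LINGUISTIC = "linguistic"  # An attested process applied to the wrong input.
    SILLY = "silly"            # Neither of the above.

def classify(gold_is_correct: bool, reflects_attested_process: bool) -> ErrorClass:
    # Apply the definitions in order: target errors are diagnosed first,
    # then linguistic errors; whatever remains is silly.
    if not gold_is_correct:
        return ErrorClass.TARGET
    if reflects_attested_process:
        return ErrorClass.LINGUISTIC
    return ErrorClass.SILLY
```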

References

K. Gorman, A. D. McCarthy, R. Cotterell, E. Vylomova, M. Silfverberg, and M. Markowska (2019). Weird inflects but OK: making sense of morphological generation errors. In CoNLL, 140-151.
K. Gorman and R. Sproat (2016). Minimally supervised number normalization. Transactions of the Association for Computational Linguistics 4: 507-519.
J. L. Lee, L. F. E. Ashby, M. E. Garza, Y. Lee-Sikka, S. Miller, A. Wong, A. D. McCarthy, and K. Gorman (2020). Massively multilingual pronunciation mining with WikiPron.
S. Pinker and A. Prince (1988). On language and connectionism: analysis of a parallel distributed processing model of language acquisition. Cognition 28(1–2):73–193.
R. Sproat (1992). Morphology and computation. Cambridge: MIT Press.

Action, not ritual

It is achingly apparent that an overwhelming amount of research in speech and language technologies considers exactly one human language: English. This is done so unthinkingly that some researchers seem to see the use of English data (and only English) as obvious, so obvious as to require no comment. This is unfortunate in part because English is, typologically speaking, a bit of an outlier. For instance, it has uncommonly impoverished inflectional morphology, a particularly rigid word order, and a rather large vowel inventory. It is not hard to imagine how lessons learned designing for—or evaluating on—English data might not generalize to the rest of the world’s languages. In an influential paper, Bender (2009) encourages researchers to be more explicit about the languages studied, and this, framed as an imperative, has come to be called the Bender Rule.

This “rule”, and the aforementioned observations underlying it, have taken on an almost mythical interpretation. They can easily be seen as a ritual granting the authors a dispensation to continue their monolingual English research. But this is a mistake. English hegemony is not merely bad science, nor is it a mere scientific inconvenience—a threat to validity.

It is no accident of history that the scientific world is in some sense an English colony. Perhaps you live in a country that owes an enormous debt to a foreign bank, and the bankers are demanding cuts to social services or reductions in tariffs: then there’s an excellent chance the bankers’ first language is English and that your first language is something else. Or maybe, fleeing the chaos of austerity and intervention, you find yourself and your children in cages in a foreign land: chances are you are in Yankee hands. And it is no accident that the first large-scale treebank is a corpus of English rather than of Delaware or Nahuatl or Powhatan or even Spanish, nor that the entire boondoggle was paid for by the largest military apparatus the world has ever known.

Such material facts respond to just one thing: concrete action. Rituals, indulgences, and dispensations will not do. We must not confuse the act of perceiving and naming the hegemon with the far more challenging act of actually combating it. It is tempting to see the material conditions dualistically, as a sin we can never fully cleanse ourselves of. But they are the past, and a more equitable world is only to be found in the future, a future of our own creation. It is imperative that we—as a community of scientists—take steps to build the future we want.

References

Bender, Emily M. 2009. Linguistically naïve != language independent: why NLP needs linguistic typology. In EACL Workshop on the Interaction Between Linguistics and Computational Linguistics, pages 26-32.

Is formal phonology in trouble?

I recently attended the 50th meeting of the North East Linguistics Society (NELS), which is not so much a society as a prestigious generative linguistics conference. In recognition of the golden jubilee, Paul Kiparsky gave a keynote in which he managed to reconstruct nearly all of the NELS 1 schedule, complete with at least one handout, from a talk by Anthony Kroch and Howard Lasnik. Back then, apparently, handouts were just examples: no prose.

In his talk, Paul presented a graph showing that phonology accounts for an increasingly small share of the papers at NELS, and in fact the gap has widened over the last few decades. Paul proposed something of an explanation: that the introduction of Optimality Theory (OT) and its rejection of “derivational” explanations introduced a lasting schism between phonology and the other subareas, and that syntacticians and semanticists are simply uncomfortable with the non-derivational nature of modern phonological theorizing.

With all due respect, I do not find this explanation probable. As he admits, most OT theorizing (including his own) now actually rejects the earlier rejection of derivational explanations. And on the other hand, modern syntactic theories are a heady brew of derivational (phases, copy theory, etc.) and non-derivational (move α, uninterpretable feature matching, etc.) thinking. And finally, it’s not really clear why the aesthetic preferences of syntacticians (if that’s all they are) should produce the observed data, i.e., fewer phonology papers at NELS.

But I do agree that OT is the elephant in the room, responsible for an enormous amount of fragmentation in phonological theorizing.

I would liken Prince & Smolensky’s “founding document” (1993) to Martin Luther’s Ninety-five Theses. Scholars believe that Luther wished to start a scholarly theological debate rather than a popular revolution, and I suspect the founders of OT were similarly surprised by the enormous impact their proposal had on the field. Luther’s magnificent heresy may have failed to move the Church in the directions he wished, but he is the father of hundreds if not thousands of Protestant sects, each with their own new and vibrant “heresies”. The founders of OT, I think, are similarly unable to put the cat back into the bag (if they wish to at all).

In my opinion, OT’s early rejection of derivationalism has been an enormous empirical failure, and full-blown functionalist-externalist thinking—one of the first post-OT heresies (let’s liken it to Calvinism)—is ontologically incoherent. That said, I would encourage OT believers to try more theory comparison. The article on “Christian denominations” in Diderot and d’Alembert’s Encyclopédie begins with the obviously insincere suggestion that someone ought to study which of the various Protestant sects is most likely to lead to salvation. But I would sincerely love to find out which variant of OT is in fact most optimal.

[Thanks to Charles Reiss for discussion.]

Should Noam Chomsky retire?

Somebody said he should. I don’t want to put them on blast. I don’t know who they are, really. Their bio says they’re faculty at a public university in the States, so they probably know how things go around here about as well as I do. Why should he retire? They suggested that were he to retire his position at the University of Arizona, it would open up a tenure line for “ECRs”.1

Let me begin by saying I do not have a particularly strong emotional connection to Noam. Like many linguists, my academic family tree has many roots at MIT, where Noam taught until quite recently. I have met him in person once or twice, and I found him polite and unassuming. This came as a surprise to me: The Times once wrote that Noam is “arguably the most important intellectual alive today”, and important people are mostly assholes.

But I do have very strong intellectual commitments to Noam’s ideas. I think that the first chapter of his Aspects of the Theory of Syntax (1965) is the best statement of the problem of language acquisition. I believe that those who have taken issue with the Aspects idealization of the “ideal speaker-listener” betray a profound ignorance of the role that idealizations play in the history of science.

I think The Sound Pattern of English (SPE), which Noam cowrote with Morris Halle, is the most important work in the theory of phonology and morphology. I believe that the critics who took issue with the “abstract” and “decompositional” nature of SPE have largely been proven wrong.

I even admire the so-called “minimalist program” for syntactic theory Noam has outlined since the 1990s.

It is impossible to deny Noam’s influence on linguistics and cognitive science. We who study language are all pro- or anti-Chomskyians, for better or worse. (And I have much more respect for the “true haters” than the reflexive anti-Chomskyians.) I don’t think Noam should apologize for his critiques of “usage-based” linguistics. I don’t think Noam can fairly be called an “arm-chair” theorist. I think generative grammar has made untold contributions to even areas like language documentation and sociolinguistics, which might seem to be excluded by a strict reading of Aspects.

And, I admire Noam’s outspoken critique of US imperialism. While Noam may have some critics from the left, his detractors (including many scientists of language!) are loud defenders of the West’s blood-soaked imperial adventures.

As a colleague said: “I like Noam Chomsky. I think his theories are interesting, and he seems like a decent guy.” He is a great example of what one can, and ought to, do with tenure.

None of this really matters, though. I do not think he “deserves” a job any more than any other academic does. So, could Noam clear up a “tenure line” simply by retiring? The answer is probably not. Please allow me an anecdote, one that will be familiar to many of you. I teach in a rather large and robust graduate linguistics program at a publicly funded college in one of the richest cities in the world (“at the end of history”). Two of our senior faculty are retiring this year, and as of yet the administration has not approved our request to begin a search for a replacement for either of them. Declining to replace tenure lines after retirement is one of the primary mechanisms of casualization in the academy.

Even if you disagree with my assessment of Noam’s legacy, the availability of tenure lines is not directly conditioned on retirements (though perhaps it should be). Noam bears no moral burden for simply not retiring. If you’d like to fight back against the casualization of labor, take the fight to the administration (and to the state houses that set the budgets); don’t blame senior faculty for simply continuing to exist in the system.

PS: If you enjoyed this, you should read The Responsibility of Intellectuals.

1: I had to look up this acronym. It stands for “early-career researchers”, though I’m not quite sure when one’s “early career” starts or ends. I find that an unfortunate ambiguity.

Latin vowel-glide alternations

Post-war structuralist phonology greatly emphasized phonemics and largely ignored morphophonemics. But in 1959, Morris Halle’s Sound Pattern of Russian argued that the distinction between allophony and alternation has little cognitive importance, and that it in fact leads to an unnecessary duplication of effort. As a result of Halle’s forceful arguments, the contrast between phonemic and morphophonemic processes plays little role in modern phonological theory. I would like to go one step further and suggest that patterns of alternation are actually more principled facts than those of allophony. Simply put, a speaker must command the patterns of alternation in their language; but it is not at all clear whether they exploit allophony when constructing their lexical entries. This is highlighted most clearly by the notions of lexicon optimization, Stampean occultation, and richness of the base in Optimality Theory, though as Hale et al. (1998) note, similar points apply to rule-based theories.

In writing, the Romans did not distinguish between the high monophthongs [i, u, iː, uː] and the corresponding glides [j, w]. This naturally led structuralist linguists (e.g., Hall 1946) to suggest that the glides are allophones of the high monophthongs. There are some apparent problems with this suggestion, though not all of them are fatal. One point that has largely been ignored in this discussion is that Classical Latin has at least four types of plausible alternations between high monophthongs and the corresponding glides. In this squib I review these alternations.

Deverbal -u- derivatives

There are a large number of adjectival derivatives formed from verbal stems by the addition of -u- and the appropriate agreement suffixes: e.g., masculine nominative singular (masc. nom.sg.) -u-us, feminine nom.sg. -u-a, neuter nom.sg. -u-um, and so on. These derivatives have semantics similar to past participles (“having been Xed”) but in some cases have a secondary meaning “able to be Xed”. For example, the masc. nom.sg. form dīuiduus [diː.wi.du.us] means ‘divided’ (cf. dīuidō [diː.wi.doː] ‘I divide’) but also ‘divisible’. This is a fairly productive process, as the following examples show. (I have taken the liberty of leaving off certain further productive derivatives, such as intensified adjectives in per-.)

(1) assiduus ‘constant’, ambiguus ‘hither and thither’, annuus ‘annual’, arduus ‘elevated’, cernuus ‘bowed forward’, circumfluus ‘flowing around’ (refluus ‘ebbing’), cōnspicuus ‘visible’, contiguus ‘neighboring’, continuus ‘continuous’, dīuiduus ‘divided; divisible’ (indīuiduus ‘undivided; indivisible’), exiguus ‘strict’, fatuus ‘foolish’, incaeduus ‘uncut’, ingenuus ‘indigenous’, irriguus ‘irrigated’, mēnstruus ‘monthly’, mortuus ‘dead’ (dēmortuus ‘departed’, intermortuus ‘decayed’, praemortuus ‘prematurely dead’), mūtuus ‘borrowed’ (prōmūtuus ‘paid in advance’), nocuus ‘harmful’ (innocuus ‘harmless’), occiduus ‘westerly’, pāscuus ‘for pasturing’, perpetuus ‘perpetual’, perspicuus ‘transparent’, praecipuus ‘particular’, prōmiscuus ‘indiscriminate’, residuus ‘remaining’, riguus ‘irrigated’, strēnuus ‘brisk’, succiduus ‘sinking’, superuacuus ‘superfluous’, uacuus ‘empty’, uiduus ‘destitute’

In all of the above cases …uus is read [u.us]. However, when the stem ends in a liquid [l, r], …uus is read [wus], indicating that the adjectival affix is realized as [w].

(2)
a. caluus ‘bald’, fuluus ‘reddish-yellow, tawny’, giluus ‘pale yellow’, heluus ‘honey yellow’
b. aruus ‘arable’, curuus ‘bent’ (incuruus ‘bent’), furuus ‘dark, swarthy’, paruus ‘small’, prōteruus ‘violent’, toruus ‘savage’

It is interesting to note that the contexts where -u- is realized as [w] align with a well-known allophonic generalization (Devine & Stephens 1977: 61, 134f.): a u preceded by a (tautomorphemic) coda liquid or front glide, and followed by a vowel, is realized as [w], as in silua [sil.wa] ‘forest’ or ceruus [ker.wus] ‘deer’, but is realized as a vowel when the preceding consonant is a nasal, an obstruent, or part of a consonant cluster, as in lituus [li.tu.us] ‘trumpet’ or patruus [pa.tru.us] ‘paternal uncle’.
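
This generalization is exactly the kind of thing context-dependent rewrite rules are for, so here is a toy Pynini sketch of it (2.1-era API, byte-mode, lowercase ASCII only; morpheme boundaries and the front-glide context are ignored):

```python
import pynini

vowel = pynini.union("a", "e", "i", "o", "u")
liquid = pynini.union("l", "r")
# Closure over a toy alphabet; a serious model needs the full inventory.
sigma_star = pynini.union(*"abcdefghijklmnopqrstuvwxyz").closure()

# u -> w / liquid __ vowel, following Devine & Stephens' generalization.
glide_formation = pynini.cdrewrite(
    pynini.transducer("u", "w"), liquid, vowel, sigma_star)

def apply_rule(form: str) -> str:
    """Composes the input against the rule and reads off the output."""
    lattice = pynini.shortestpath(pynini.acceptor(form) @ glide_formation)
    return lattice.project(True).string()

print(apply_rule("silua"))   # -> "silwa", cf. [sil.wa]
print(apply_rule("lituus"))  # -> "lituus", unchanged: [li.tu.us]
```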

Two residual issues remain. First, when the verbal stem ends in qu [kw], the adjectival derivative is spelled …quus. By the normal rules of spelling this would be read as [kwus], which suggests that a zero allomorph of the adjectival suffix is selected here.

(3) aequus ‘equal’, antīquus ‘old’, fallāciloquus ‘falsely speaking’ (fātiloquus ‘prophetic’, flexiloquus ‘ambiguous’, grandiloquus ‘grandiloquent’, magniloquus ‘boastful’, uāniloquus ‘lying’, uersūtiloquus ‘slyly speaking’), inīquus ‘unjust’, longinquus ‘distant’, oblīquus ‘slanting, oblique’, pedisequus ‘following on foot’, propinquus ‘near’, reliquus ‘remaining’

This is consistent with the metrical evidence. For instance in the following verse, aequus must be read as bisyllabic.

(4)
hoc opus hic labor est paucī quōs aequus amāuit (Verg., Aen. 6.129)
[ok.ko.pu|sik.la.bo|rest.paw|kiː.kwoː|saj.kwu.sa|maː.wit]

Secondly, there are a number of deverbal derivatives in -u-us where the verb form also has a stem-final [w]. In this case we also observe [wus].

(5)
a. cauus [ka.wus] ‘hollowed; hollow’ (concauus ‘hollow’); cf. cauō [ka.woː] ‘I excavate’
b. flāuus [flaː.wus] ‘yellow, gold, blonde’ (sufflāuus ‘yellowish’); cf. flāueō [flaː.we.oː] ‘I am yellow’
c. (g)nāuus [naː.wus] ‘active’ (īgnāuus ‘lazy’); cf. nāuō [naː.woː] ‘I do s.t. enthusiastically’
d. nouus [no.wus] ‘new’; cf. nouō [no.woː] ‘I renew’
e. saluus [sal.wus] ‘safe; well’; cf. salueō [sal.we.oː] ‘I am well’
f. uīuus [wiː.wus] ‘living’ (rediuīuus ‘restored to life’); cf. uīuō [wiː.woː] ‘I live’

This may be another context where the adjectival suffix has a zero allomorph, though it is not clear whether we are looking at the same derivational process as above.

The foregoing discussion leads me to posit a deverbal adjective-forming suffix /-u-/ with two phonologically predictable allomorphs: [w] after liquids, and zero after [kw] and possibly after [w].

The “third stem”

Schoolchildren learning Latin memorize four forms (or principal parts) of each verb: the first person singular (1sg.) present active indicative (e.g., amō ‘I love’), the present infinitive (amāre ‘to love’), the 1sg. perfect active (amāuī ‘I loved’), and the perfect passive participle (amātus masc. nom.sg. ‘loved’). The first two principal parts effectively index the so-called “present stem” of the verb, and the third principal part gives the so-called “perfect stem”. The relationship between the present and perfect stem is often unpredictable. Some perfect stems lengthen a monophthong in the final syllable of the present stem (e.g., legō/lēgī ‘I choose/chose’); some perfect stems omit a post-vocalic nasal in the final syllable, with concomitant lengthening (uincō/uīcī ‘I win/won’); some are mutated by the addition of a -s- perfect suffix (dīcō/dīxī [diː.koː, diːk.siː] ‘I say/said’); others bear a CV-reduplication prefix, and so on. This has led some to suggest that the latter two stems are essentially “listed” or “stored” for all verbs. This is, for instance, the position of Lieber (1980:141f., 152f.), but it has been disputed by Aronoff (1994: chap. 2) and Steriade (2012), among others, who claim there are many productive regularities in both cases.

The majority of verbs have perfects that consist of the bare verb root, the theme vowel, a high back vocoid perfect suffix, and the appropriate person-number agreement suffixes (e.g., 1sg. -ī). The perfect suffix is preceded by the theme vowel, and since the agreement suffixes are all vowel-initial, it is always intervocalic. Allophonically, this is a context where [u] is never found but [w] is, and this is what we find here: amāuī [a.maː.wiː] ‘I loved’. This type of perfect is found in all conjugations, and in the overwhelming majority of 1st (-ā- theme vowel) and 4th conjugation (-ī-) verbs (Aronoff 1994:43f.).

(6)
a. cōnsōlāuī [koːn.soː.laː.wiː] ‘I consoled’, portāuī [por.taː.wiː] ‘I carried’
b. dēlēuī [deː.leː.wiː] ‘I destroyed’, plēuī [pleː.wiː] ‘I filled up’
c. cupīuī [ku.piː.wiː] ‘I desired’, petīuī [pe.tiː.wiː] ‘I sought’
d. audīuī [aw.diː.wiː] ‘I listened to’, mūnīuī [muː.niː.wiː] ‘I fortified’

However, there is an alternative formation in which the theme vowel is omitted, placing the perfect suffix immediately to the right of a consonant, and in this context it is instead realized as [u]. This type of perfect is also found in all conjugations but is most common in the 2nd (-ē-) conjugation.

(7)
a. domuī [do.mu.iː] ‘I tamed’, uetuī [we.tu.iː] ‘I forbade’
b. docuī [do.ku.iː] ‘I taught’, tenuī [te.nu.iː] ‘I held’
c. rapuī [ra.pu.iː] ‘I snatched’, texuī [tek.su.iː] ‘I wove’
d. aperuī [a.pe.ru.iː] ‘I opened’, saluī [sa.lu.iː] ‘I leapt’

Together the patterns in (6-7) account for the vast majority of perfects in all conjugations except the 3rd (itself a grab-bag of etymologically dissimilar verbs).

I propose that the default perfect suffix is /-u-/ and that it undergoes glide formation to [w] in (6), in intervocalic position, a generalization consistent with the allophonic facts. In (7), when adjacent to the verb root, glide formation is blocked. However, the examples in (7) cannot take a “free ride” on any allophonic generalization. As can be seen in (7d), the perfect suffix does not form [l.w, r.w] syllable contact clusters, unlike the adjectival suffix in (5). There is a surfeit of possible analyses for the failure of glide formation in this context: it might be an effect specific to the perfect suffix or to the category of verb, or the result of cyclicity or phase-based spellout. We leave the question open for now.
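
The intervocalic rule can be sketched with the same toy Pynini machinery as the earlier fragment (long vowels written as doubled letters for simplicity; the cyclicity question just raised is not modeled):

```python
import pynini

vowel = pynini.union("a", "e", "i", "o", "u")
sigma_star = pynini.union(*"abcdefghijklmnopqrstuvwxyz").closure()

# Intervocalic glide formation: u -> w / vowel __ vowel.
perfect_glide = pynini.cdrewrite(
    pynini.transducer("u", "w"), vowel, vowel, sigma_star)

def apply_rule(form: str) -> str:
    lattice = pynini.shortestpath(pynini.acceptor(form) @ perfect_glide)
    return lattice.project(True).string()

print(apply_rule("amaauii"))  # -> "amaawii", i.e., amāuī [a.maː.wiː]
print(apply_rule("saluii"))   # -> "saluii": no preceding vowel, cf. (7d)
```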

The “fourth stem”

The form of the perfect passive participle, the fourth principal part, is similarly problematic. For many verbs, the perfect passive participle is formed by adding to the verb root a -t- suffix and the appropriate agreement suffixes (e.g., in citation form, the masc. nom.sg. -us), once again sometimes accompanied by lengthening of the stem-final vowel and/or leftward voice assimilation (an exceptionless rule of Latin) triggered by the -t-, as in (8b).

(8)
a. docuī [do.ku.iː] ‘I taught’, doctus [dok.tus] masc. nom.sg. ‘taught’
b. tegō [te.goː] ‘I clothe’, tēctus [teːk.tus] masc. nom.sg. ‘clothed’

Two verb roots end in a consonant followed by a high back vocoid and form a -t- perfect passive participle: soluō [sol.woː] ‘I loosen; I explain’ and uoluō [wol.woː] ‘I roll’. This places the root-final high back vocoid, by hypothesis /u/, between two consonants, a context where glides are forbidden. The result is solūtus [so.luː.tus] and uolūtus [wo.luː.tus]. However, it should be noted that this particular pattern is limited to these two verbs and their derivatives, and that the long ū is unexpected unless it reflects stem vowel lengthening (cf. tēctus above).

Synizesis and diaeresis

Latin poetry exhibits variation in glide formation. (The following examples are all drawn from Lehmann 2005). Synizesis, the unexpected overapplication of glide formation in response to the meter, can be seen in the following verse.

(9)
tenuis ubī argilla et dūmōsīs calculus aruīs (Verg., G. 2.180)
[ten.wi.su|biːr.gil|let.duː|moː.siːs|kal.ku.lu|sar.wiːs]

In this verse, tenuis ‘thin’ occurs initially, which requires that the first syllable be heavy. The only way to accomplish this is to read it as the bisyllabic [ten.wis] rather than the expected trisyllabic [te.nu.is]. Similarly, in another verse (Verg., Aen. 8.599), abiēte, the ablative singular of abiēs ‘silver fir’, must be read as trisyllabic [ab.jeː.te] rather than the expected quadrisyllabic [a.bi.eː.te].

On the other hand, the poets also make use of diaeresis, or apparent underapplication of glide formation. For example, siluae, the genitive singular of silua ‘forest’, is in one verse (Hor., Carm. 1.23.4) read as trisyllabic [si.lu.aj] rather than as the expected bisyllabic [sil.waj]. The conditions governing synizesis and diaeresis are not yet well understood, but they constitute further evidence for the close grammatical relationship between [i ~ j] and [u ~ w] in Classical Latin.

Conclusion

We have seen four ways in which the Latin high vocoids alternate between vowels and glides. Together, these four patterns provide indirect evidence for the hypothesis that Latin glides are allophones of the corresponding high vowels, though there are some minor dissociations between patterns of allophony and alternations.

[Earlier writing about Latin glides: Latin glides and the case of “belua”]

References

Aronoff, Mark. 1994. Morphology by itself: stems and inflectional classes. Cambridge: MIT Press.
Devine, Andrew M., and Stephens, Laurence D. 1977. Two studies in Latin phonology. Saratoga: Anma Libri.
Hall, Robert A. 1946. Classical Latin noun inflection. Classical Philology 41(2): 84-90.
Hale, Mark, Kissock, Madelyn, and Reiss, Charles. 1998. Output-output correspondence in Optimality Theory. In Proceedings of WCCFL, pages 223-236.
Halle, Morris. 1959. The sound pattern of Russian. The Hague: Mouton.
Lehmann, Christian. 2005. La structure de la syllabe latine. In Touratier, Christian (ed.), Essais de phonologie latine, pages 157-206. Aix-en-Provence: Publications de l’Université de Provence.
Lieber, Rochelle. 1980. On the organization of the lexicon. Doctoral dissertation, MIT.
Steriade, Donca. 2012. The cycle without containment: Latin perfect stems. Ms., MIT.