Actually, chess has a tech tree

A few years ago, Elon posted (it’s a real tweet; just screenshotting for posterity) that Chess doesn’t have a “tech tree”:

see full tweet at https://x.com/elonmusk/status/1841521084559945980?lang=en

I disagree. First, there’s promotion of pawns. Then, there is castling, which moves the king from a center file to relative safety on the side, at the same time moving a rook into the center and a more active role. And of course we speak of developing one’s knights (who could be likened to indirect fire), bishops (who provide enfilade fire), and rooks by moving them from their starting positions into ones where they can more actively attack and defend the middle of the board. There are even various systems for scoring board position based on piece development. If this isn’t a tech tree, I don’t know what is.

The lexical/postlexical distinction in Logical Phonology

According to an old idea developed most carefully by Kiparsky (1982) in the framework of Lexical Phonology, there is a fundamental distinction between lexical and postlexical phonological computation, with the former necessarily applying before the latter. The following distinctions are proposed:

(1) Lexical processes must (or may) be cyclic, reapplying after every word-formation process; postlexical processes cannot be.
(2) Lexical processes are usually feature-filling (or structure-building), and must be so when applying in non-derived environments (i.e., on the first cycle); postlexical processes may be either feature-filling or feature-changing (or structure-changing).
(3) Lexical processes may have morphemic/lexical exceptions; postlexical processes never do.

I suggest the empirical effects of (2-3) follow more or less directly from the assumptions of Logical Phonology (see especially Gorman & Reiss in press a). This is perhaps not surprising—Logical Phonology is influenced by certain strands of Lexical Phonology. In Lexical Phonology, the lexical/postlexical distinction, its connection to cyclicity as in (1), and its connection to exceptionality as in (3) are all axiomatic (i.e., stipulated). Logical Phonology, in contrast, does not recognize the lexical/postlexical distinction, and it similarly treats the feature-filling/feature-changing distinction (and the related non-derived environment blocking; see Gorman and Reiss in press b and Reiss 2025), as in (2), as derived rather than axiomatic. Yet, Logical Phonology is largely capable of deriving the empirical effects of (2-3). The relevant assumptions, justified in various places in the Logical Phonology canon, are as follows:

(4) Underspecification: Underspecification is permitted.
(5) Epistemic boundedness: Elements which appear to be identical on the surface but behave differently with respect to the morphophonology (i.e., which constitute putative exceptions) are underlyingly distinct. Where appropriate, underspecification is deployed to encode underlying distinctions so posited.1
(6) Specificity: Suppose that segment /G/ is more richly specified than underspecified /F/, but they agree on all features for which they are both specified (i.e., /F/ ⊂ /G/). Then, it is impossible for a phonological rule to intensionally target (or be triggered by) /F/ without also targeting (or being triggered by, resp.) /G/.
(7) Theory of possible rules: Intrasegmental phonological processes derive from either unification or subtraction rules.2
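
The Specificity condition in (6) can be illustrated with a toy sketch (my own, with hypothetical feature specifications not drawn from any published analysis): if segments are modeled as sets of feature specifications and a rule targets any segment that contains its structural description, then every description that targets the underspecified /F/ necessarily also targets /G/.

```python
from itertools import combinations

# Hypothetical feature specifications: F lacks a value for "back"; G
# properly contains F, i.e., F ⊂ G.
F = frozenset({("syllabic", "+"), ("high", "+")})
G = frozenset({("syllabic", "+"), ("high", "+"), ("back", "-")})

def targets(description: frozenset, segment: frozenset) -> bool:
    """A rule targets a segment iff its structural description is a
    subset of that segment's feature specifications."""
    return description <= segment

def all_descriptions(segment: frozenset):
    """Enumerate every structural description that targets the segment."""
    pairs = sorted(segment)
    return (frozenset(c) for r in range(len(pairs) + 1)
            for c in combinations(pairs, r))

# There is no intensional description that singles out the underspecified
# /F/ to the exclusion of the more richly specified /G/.
assert F < G
assert all(targets(d, G) for d in all_descriptions(F))
```

The subset check makes the monotonicity of the condition transparent: any description satisfied by /F/ is a subset of /F/, hence a subset of /G/.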

Let’s get (1) out of the way first. It is not clear whether this claim can be maintained: various linguists, starting with Booij and Rubach (1987), claim there are non-cyclic rules which show other symptoms of being lexical rules, including properties (2-3). Such rules have been called postcyclic. A weaker version of this, then, is simply the claim that all cyclic rules precede all non-cyclic rules, a claim which no longer ties cyclicity to any notions specific to Lexical Phonology. Logical Phonology does not have much to say about this. Logical Phonology is fully compatible with cyclicity—indeed, cyclicity is used to excellent effect in some unpublished work by Daniar Kasenov and Charles Reiss—but has little new to say about the notion.

Logical Phonology has much more to say about (2-3). Logical Phonology proposes that feature-changing (i.e., structure-changing) processes reflect subtraction rules feeding unification rules; there are no feature-changing rules per se. Thus, for a unification rule R to mutate (i.e., non-vacuously target) some segment s, it must be the case that either:

(8) Conditions on non-vacuous unification:
a. s is underlyingly underspecified with respect to one or more feature specifications in the change (right-hand-side) portion of R, or
b. s is made to be underspecified w.r.t. those feature specifications via a subtraction rule earlier in the derivation.
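
As a toy illustration of (8) (my own sketch; the segments, feature values, and rules are hypothetical), a feature-changing "voicing" effect can be derived from a subtraction rule feeding a unification rule, with no feature-changing rule anywhere in the grammar:

```python
def unify(segment: dict, change: dict) -> dict:
    """Feature-filling unification: fill in specifications only where the
    segment is underspecified; a conflicting specification means the rule
    applies vacuously."""
    if any(segment.get(f, v) != v for f, v in change.items()):
        return dict(segment)  # conflict: vacuous application
    return {**segment, **change}

def subtract(segment: dict, features: set) -> dict:
    """Subtraction: remove the named feature specifications."""
    return {f: v for f, v in segment.items() if f not in features}

t = {"voice": "-", "continuant": "-"}  # a hypothetical, partial /t/

# Unification alone cannot mutate a segment already specified for [voice]:
# neither (8a) nor (8b) holds, so the rule applies vacuously...
assert unify(t, {"voice": "+"}) == t

# ...but a prior subtraction rule creates the underspecification that lets
# unification apply non-vacuously, as in (8b): a "feature-changing" process.
assert unify(subtract(t, {"voice"}), {"voice": "+"}) == \
    {"voice": "+", "continuant": "-"}
```
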

Because Logical Phonology permits one to mix subtraction and unification rules more or less freely, derivations are not necessarily monotonic in the sense of only adding feature specifications. However, one can discern a weaker form of monotonicity (which I’ll tentatively call monotonicity of “exceptionality”) in these derivations. With respect to features that are underlyingly underspecified, as in (8a), the only thing a Logical Phonology derivation can “do” to those underspecified “slots,” speaking informally, is fill them in. As a corollary of (6), no rule can refer to the absence of features on these segments. Once such segments are no longer underspecified (due to unification), there is no way for subsequent rules to refer to their underlyingly underspecified status. Consequently, the ability for rules to interact with the underlying underspecification of segments decreases monotonically as the derivation progresses.

Logical Phonology grammars make extensive use of underlying underspecification, and the fact that a segment was underlyingly underspecified becomes less “useful” (again speaking informally) as the derivation progresses. This alone seems sufficient to predict a weak tendency for earlier rules in the derivation to be feature-filling unification, whereas later rules mix subtraction and unification to derive feature-changing processes, as in (2). And, it also suffices to predict that derivationally-earlier processes are more likely to show the effects of “exceptionality”, simply because such putative exceptionality is often encoded by underlying underspecification that becomes increasingly difficult for rules to refer to, as in (3).

Above, I have stated this as a tendency, but I suspect there may be some unanticipated “escape hatches” from these predictions, via some complex series of rules I have not yet considered, on analogy to the wandering targets proscribed in Gorman & Reiss (2025). If that is the case, I’d plead much as we do in that paper: the rules needed to implement these escape hatches may be highly unlikely to make it through the diachronic filter, or may be structurally excluded by non-trivial protocols of the language acquisition device.

Endnotes

  1. This is surely too strong, but I’m making a point here.
  2. A third type of rule, segment rules, is formalized in Gorman and Reiss in press b.

References

Booij, Geert and Rubach, Jerzy. 1987. Postcyclic versus postlexical rules in Lexical Phonology. Linguistic Inquiry 18: 1-44.
Gorman, Kyle and Reiss, Charles. 2025. How not to acquire exchange rules in Logical Phonology. In Proceedings of the 2025 annual conference of the Canadian Linguistic Association.
Gorman, Kyle and Reiss, Charles. In press a. Metaphony in Substance-Free Logical Phonology. Phonology.
Gorman, Kyle and Reiss, Charles. In press b. Natural class reasoning in segment deletion rules. Paper presented at the 56th Annual Meeting of the North East Linguistic Society, to appear in the proceedings.
Kiparsky, Paul. 1982. Lexical Phonology and morphology. In I.-S. Yang (ed.), Linguistics in the Morning Calm, pages 3-91. Hanshin.
Reiss, Charles. 2025. Specificity and “non-derived environment blocking” in Logical Phonology. In Proceedings of the 2025 annual conference of the Canadian Linguistic Association.

You can just do things: walking to Philadelphia

So I walked to Philadelphia, from Brooklyn. It took three days in all. I was mostly curious whether it could be done, and it can.

I knew from prior experience that I can log 20-25 miles of walking in a day without any major problems, so I figured I could do a bit more, could probably do it three days in a row, and could probably do it with a backpack of under 20 pounds.

On the first and longest day, I started early and walked from home up Flatbush Ave., turning at Tillary to catch the pedestrian path over the Brooklyn Bridge. I picked up a bagel and then a ferry to Belford, NJ. I had read about people taking the only pedestrian bridge to New Jersey—the George Washington Bridge out of Washington Heights—but getting that far north and then coming back down south through the Palisades, Newark, and such adds at least another day to the process. The ferry terminal in Belford is a curious beast: it is in the middle of nowhere but was full of young professionals in nice clothing heading to work. I walked for a few hours through a mix of “nowhere” and some nice-looking suburbs until I hit a diner, where I paused for lunch. I then continued west through mostly exurban territory until I hit I-95 and the eastern edge of Cranbury, where there is a small cluster of motels. The last few hours were not easy on my morale: rain, which wasn’t really in the forecast, started and then became more intense until there was even some distant thunder. I had packed rain gear, but the weather still wore on me, and there were a few places where I was walking along the shoulder of a relatively busy road. I checked into the motel, showered, consumed water and electrolytes to relieve my spasming calves, and ate a hotel lobby Cobb salad, then went to bed. Day 1: 30 miles.

The next day, I stretched (and took a dip in the motel pool), surrounded the worst blister with moleskin, and headed out. I stopped in the very cute town of Cranbury for brunch at a diner. From there, it was a few miles of office parks, some farmland, and then nice suburbs where I had some shade. Eventually, I got to the outskirts of Hamilton, an urban area near Trenton, and then walked through the south side of Trenton itself, crossing over into Pennsylvania on the bridge with the “TRENTON MAKES THE WORLD TAKES” sign. At that point, my legs were pretty tired, but I still had a long way to go and was running out of daylight, having started later than planned. The next stretch was almost all on the Delaware & Lehigh towpath, which still exists even though the canal has been covered over with various causeways for roads and highways. Parts held water, others were just muddy banks, but the whole thing was a pretty nice path, mostly forested. At some point, there was no more light, but the path was clear and nobody else was around. Eventually, this dead-ended into the Bristol Pike, where I stopped at the second motel. I ended up just having some snacks and water before bed. Day 2: 27 miles.

I started off with a healthy breakfast (they exist) at Wawa, and then continued down the Pike until I hit the Philadelphia city limits. Philadelphia managed to annex its northeastern suburbs in the mid-19th century, and so it stretches many miles to the northeast of Center City, where the Pike becomes Frankford Ave. This part was the only portion that had any hills to speak of. After another Wawa meal (this one less healthy), I continued into Kensington and Port Richmond as the temperature and humidity rose. These neighborhoods have never been wealthy to my knowledge; they are undeclared drug amnesty zones, and there was a really sad amount of human suffering on display. It’s hard to look at. At this point, I was really in a lot of pain between my tight hips, blisters, and arches, and I needed drinking water to cope with the heat, but there was simply nowhere I felt comfortable stopping for several miles, until I finally hit the edge of Fishtown. Since I was still on schedule, I stopped to rest for about an hour at Atlantis, The Lost Bar, probably the northeasternmost bar I used to go to when I lived in Philly, and from there continued down Frankford Ave. until Girard. From there, I zigzagged a bit to Chinatown. There, I got the shaved noodles at Nan Zhou (highly recommended), took a photo at City Hall, and then finally arrived at 30th St. Station, where I caught the Keystone Line train back to New York. Day 3: 23 miles.

Pronouncing Mamdani

Throughout the primary and general elections for New York City mayor, Andrew Cuomo, among others, struggled with pronouncing the last name of the ultimate winner, Zohran Mamdani, with Cuomo repeatedly rendering it as what sounds like [mɑndɑni], with an unexpected [n] in the coda of the first syllable. While this error quite possibly reflects Cuomo’s apparent lack of interest in other people, there is an obvious phonological basis for it. In English, there is a process by which coda nasals take on the place of a following obstruent. This can be seen (e.g., Gorman 2013:75f.) in a few potential alternations: e.g., many theorists derive English [ŋ] from underlying /ng/, and the Latinate negative prefix in- as in i[n.d]ecent has an allomorph im- as in i[m.b]alance. It also holds overwhelmingly of monomorphemic words, as in pi[m.p]le, sta[n.z]a, or mo[ŋ.k]ey. Of course there are a few exceptions, like pli[m.s]oll and scri[m.ʃ]aw, but as I show in my dissertation, they are quite rare in my sample of 6,619 English monomorphemic words. There are just two examples of [m.d] that CELEX considers monomorphemic: du[m.d]um and hu[m.d]rum. Of course, CELEX is wrong on both counts: both are reduplicative, and the [m.d] cluster occurs at the boundary between base and reduplicant.

All of this is a long way of saying that Cuomo and others are likely phonotactically “adapting” Mamdani’s name to the native English pattern. Of course, we don’t normally do that with “non-Anglo” names in English; we tend to render them faithfully so long as they consist of segments present in the native inventory, modulo unusually complex consonant clusters.

References

Gorman, K. 2013. Generative phonotactics. Doctoral dissertation, University of Pennsylvania.

Natural class reasoning in segment deletion rules

I posted our handout from our NELS talk yesterday here. We illustrate two points: a corollary of Logical Phonology (LP) called delete the rich, pertaining to segment deletion rules, and how LP handles apparent cases of non-derived environment blocking. In doing so, we give a relatively detailed phonology of Hungarian h and also address the famous case of Turkish velar deletion.

I’ll post the MS for the proceedings to LingBuzz once it’s ready.

Metaphony in Logical Phonology

My paper with Charles Reiss on metaphony in Logical Phonology is now accepted to appear in a special issue of Phonology. As it happens, it includes problems I originally posed here on this blog (1 2 3). I have also updated the version on LingBuzz to include various changes recommended by the reviewers and editors.

The linking constraint and exhaustification

Hayes (1986) proposes the linking constraint, a convention for the interpretation of autosegmental rules. As stated, it holds that association lines should be “interpreted as exhaustive”. In the context of a rule, this means that the target and triggers are not permitted to have additional linkages not mentioned in the rule.

Later in the paper, Hayes makes it clear that this is to be interpreted with respect to whatever tiers are mentioned. For example, imagine a rule that manipulates the melodic/featural tier but is conditioned in part by the CV tier (Hayes adopts CV theory, but the “constraint” is just as applicable to approaches which use an X and/or moraic tier): such a rule does not apply to any substring of the melody whose elements bear associations to the CV tier not mentioned in the rule. Similarly, imagine a rule that targets elements on the CV tier but is conditioned in part by the melody: such a rule would not apply to any substring of the CV tier with associations to the melody not explicitly stated in the rule.

I would like to claim that this is all too informal. It should be possible to state the substring that matches the rule using something like first-order logic (FOL), and similarly to translate the change into a logical statement. However, it’s not immediately clear how to write the procedure that translates autosegmental diagrams into the appropriate FOL sentences. (I put aside the encoding of the change: I think this will be comparatively easy.) Autosegmental diagrams themselves are essentially fragments of undirected graphs, and translating these into FOL statements is straightforward enough: the description of the graph is defined by the logical conjunction of:

  • one-place predicates stating what type each element is (i.e., what tier it’s on),
  • two-place immediate-precedence predicates (when the rule refers to multiple elements on a given tier),
  • two-place (unordered) predicates indicating the association lines between tiers.
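
As a toy illustration of the graph-to-FOL translation (my own sketch; the predicate names `precedes` and `associated` are invented for exposition), here is the description of a diagram with two CV slots, the first linked to a single melodic element. The exhaustification step, whose general algorithm is the open question here, is deliberately not implemented:

```python
def describe(tiers: dict, precedence: list, associations: list) -> str:
    """Translate an autosegmental diagram (an undirected graph) into the
    conjunction of predicates described in the bullets above."""
    # One-place tier-membership predicates.
    conjuncts = [f"{tier}({x})" for x, tier in sorted(tiers.items())]
    # Two-place immediate-precedence predicates within a tier.
    conjuncts += [f"precedes({a},{b})" for a, b in precedence]
    # Two-place (unordered) association-line predicates between tiers.
    conjuncts += [f"associated({a},{b})" for a, b in associations]
    return " & ".join(conjuncts)

# A toy diagram: two C slots in sequence, the first linked to a melodic
# element m1.
formula = describe(
    tiers={"x1": "C", "x2": "C", "m1": "Melody"},
    precedence=[("x1", "x2")],
    associations=[("x1", "m1")],
)
# "Melody(m1) & C(x1) & C(x2) & precedes(x1,x2) & associated(x1,m1)"
```

Enforcing the linking constraint would mean conjoining further (negated) predicates ruling out associations to x1, x2, or m1 beyond those listed; what that looks like in general is exactly what remains unclear to me.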

To enforce the linking constraint, one needs to add additional predicates to this conjunction that rule out associations not mentioned. Conceptually, I think of this as an exhaustification function (with apologies for the abuse of terminology) which takes the graph description above and returns the predicates needed to rule out forbidden associations in the relevant subgraphs. While I think I know what these exhaustifying predicates need to be for toy examples, I don’t yet know what the general algorithm is.

I am also somewhat concerned that phonologists are (or were) applying the linking convention in a vibes-based fashion depending on the example in question, in which case no algorithm could properly describe analytical practice. Finally, I am interested in the feasibility of the opposite approach (a road not taken, as far as I know, in autosegmental theory) whereby undesired associations are ruled out simply by making the rule more explicit.

Does anyone know of any relevant work on this topic? Surely I am not the first person to be bothered by this.

References

Hayes, B. 1986. Inalterability in CV phonology. Language 62: 321-351.

More imperative-dominant defectivity in English

Aidan Malanoski adds the following to our list of imperative-dominant defective verbs in English: come V, go V. These seem (to Aidan and me) to show a similar distribution to the ones discussed in the previous post: imperatives are ok (Come say hi! Go give your mom a kiss!), as are infinitives (She shouted for him to come greet me.), and everything else is degraded.

The Zipf scale

Word frequency norms are usually computed by counting word frequencies in some large, relatively diverse, and hopefully representative corpus. However, raw frequency is only interpretable relative to the size of that corpus.

Converting raw frequencies to probabilities (i.e., via maximum likelihood estimation) removes this corpus-dependence, but the resulting probabilities are themselves not terribly interpretable either. One slight improvement has been to interpret them as words per million (usually rounding to the nearest power of ten). It is reasonably obvious to me that “100 wpm” is an improvement on “.0001” or the equivalent “1e-4”.

Van Heuven et al. (2014) propose a variation on words-per-million metrics which they call the Zipf scale. While van Heuven et al. do not give a complete formula, their examples indicate that the scale is equivalent to $\log_{10}(\mathrm{wpm}) + 3$, and it can be computed from raw frequencies as $\log_{10}(c) - \log_{10}(N) + 9$ when raw frequency $c > 0$, and $0$ otherwise, where $N$ is the corpus size. The definition above differs slightly from van Heuven et al.’s formula in that we do not use “add 1” smoothing, which causes issues with fractional frequencies, and give a function which is defined at 0; the Zipf scale of a zero-frequency word, naturally, is zero. Here is a tiny Python module for computing it, and here is the associated unit test.
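
The linked module is not reproduced here, but a minimal sketch of the definition above (the function name is my own, not necessarily that module’s API) might look like:

```python
import math

def zipf_scale(count: float, corpus_size: float) -> float:
    """Zipf-scale frequency per the definition above: log10(c) - log10(N) + 9
    for c > 0, and 0 for zero-frequency words. Fractional counts are fine,
    since no "add 1" smoothing is applied."""
    if count <= 0:
        return 0.0
    return math.log10(count) - math.log10(corpus_size) + 9

# A word occurring once per million tokens has a Zipf value of
# log10(1) - log10(1e6) + 9 = 0 - 6 + 9 = 3.
assert abs(zipf_scale(1, 1_000_000) - 3.0) < 1e-9
assert zipf_scale(0, 1_000_000) == 0.0
```
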

We use this definition in the new webapp-based CityLex, out now, as just one of several ways to express frequency norms.

References

Van Heuven, Walter J.B., Mandera, P., Keuleers, E. and Brysbaert, M. 2014. SUBTLEX-UK: a new and improved word frequency database for British English. Quarterly Journal of Experimental Psychology 67: 1176-1190.

ACL meetings

While I continue to work in computational linguistics, broadly construed, I am feeling less and less motivated to actually attend the core ACL events I’ll denote as *CL (the international ACL and EMNLP meetings, and the regional meetings: EACL in Europe, NAACL in North America, IJCNLP in East Asia, and so on).

This is not a “why I left…” post, nor do I have much constructive criticism, but it is helpful to contrast with the kinds of conferences I attend when wearing my formal linguist hat. As a formal linguist, I overwhelmingly attend conferences in the “Acela corridor” portion of the Mid-Atlantic and New England well-served by trains, and pay registration fees of $100 or even less. In contrast, I cannot even remember the last time any *CL meeting was in this (rich, populous, and well-educated) region of the country, and I expect to spend my entire travel budget for the year on attending even a domestic *CL thanks to skyrocketing registration fees and hotel prices. (It doesn’t help that these conferences tend to run long, so you need a lot of nights in the hotel.) I don’t know how much the ACL can do about costs, but the *CL conference locations tend towards the random or exotic rather than toward dense areas with a lot of research output.

There are two other big issues with *CL that make me less likely to attend. First, other than the handful of senior faculty who are in ACL leadership and the invited speakers (the same few people, over and over), there are hardly any faculty at *CL conferences anymore. I don’t see many of my mid-career peers, and I don’t see senior people either. Something needs to be done to encourage these people to attend. Second, the programs of *CL conferences, even in the main sessions, are overwhelmingly inclined towards what are essentially system demonstrations using known technologies, rather than new ideas, critiques, unsolved problems, or system comparisons. To put it another way: I’m glad BERT or GPT-4o worked for your problem, but that kind of talk or poster doesn’t exactly make for scintillating scientific debate.

I continue to work in these areas, but I suspect I will opt for online presentation (or just go straight to the journals) more often in the future, and perhaps send students in my stead.