“Indo-European” is not a meaningful typological descriptor

A trope I see in a lot of student writing (and computational linguistics writing at all levels) is a critique of prior work as being only on “Indo-European languages”, and sometimes a promise that current or future work will target “non-Indo-European languages”.

To me, this is drivel. The Indo-European language family is quite diverse: for the vast majority of things I’m interested in, either, say, Italian or Russian is sufficiently different from English to make a relevant comparison. And there are a huge number of “non-Indo-European” languages that are typologically similar to at least some Indo-European languages on at least some dimensions: e.g., the Finno-Ugric, “Aquitanian” (i.e., Basque), and (narrowly defined) “Altaic” families (Mongolic, Tungusic, and Turkic) have quite a bit in common typologically with IE, as do, say, Japanese and Korean. Genetic relatedness just isn’t that typologically informative in very dense, very “old” families like IE.

If you want to talk typology, you should focus on typological aspects actually relevant to your study rather than genetic relatedness. If you’re studying phonology, the presence of vowel harmony in the family may be relevant (but note that Estonian, despite being Finnic, does not have productive harmony); if you’re interested in morphology, then notions like agglutination may be relevant (though not necessarily). Gross word order descriptors (like “VSO”) are likely to be relevant for syntax, and so on.

In some cases, one of the relevant typological aspects is not language typology, but rather writing systems typology. There, genetic relatedness isn’t very informative either, because the vast majority of writing systems used today (and virtually all of them outside East Asia) are ultimately descended from Egyptian hieroglyphs. And we shouldn’t confuse writing system and language.

Linguists ought to know better.

How many Optimality Theory grammars are there?

How many Optimality Theory (OT) grammars are there? I will assume first off that Con, the constraint set, is fixed and finite. I make this assumption because it is one assumed by nearly all work on the mathematical properties of OT. Given the many unsolved problems in OT learning, joint learning of constraints and rankings seems to add unnecessary additional complexity.1 So we suppose that there is a finite set of constraints, and let $n$ be the cardinality of that set. If we put aside for now the contents of the lexicon (i.e., the set of URs), one can ask how many possible constraint rankings there are as a function of $n$.

Prince & Smolensky (1993), in their “founding document”, seem to argue that constraint rankings are total. Consider the following from a footnote:

With a grammar defined as a total ranking of the constraint set, the underlying hypothesis is that there is some total ranking which works; there could be (and typically will be) several, because a total ranking will often impose noncrucial domination relations (noncrucial in the sense that either order will work). It is entirely conceivable that the grammar should recognize nonranking of pairs of constraints, but this opens up the possibility of crucial nonranking (neither can dominate the other; both rankings are allowed), for which we have not yet found evidence. Given present understanding, we accept the hypothesis that there is a total order of domination on the constraint set; that is, that all nonrankings are noncrucial.

So it seems like they treat strict ranking as a working hypothesis. This is also reflected in the name of their method, factorial typology: there are $n!$ strict rankings of $n$ constraints. Roughly, they propose that one generate all possible rankings of a series of possible constraints and compare their extension to typological information.2 In practice, as they acknowledge, OT grammars—by which I mean theories of i-languages presented in the OT literature—contain many cases where two constraints $c, d$ need not be ranked with respect to each other. For example, the first chapter of Kager’s (1999) widely used OT textbook includes a “factorial typology” (p. 36, his ex. 53) in which some constraints are not strictly ranked, and this is followed by a number of tableaux in which he uses a dashed vertical line to indicate non-ranking.3

Prince & Smolensky also acknowledge the existence of cases where there is no evidence with which to rank $c$ versus $d$. This situation is important because any learning algorithm trying to enforce a strict ranking of constraints would have to resort to a coin flip or some other unprincipled mechanism to finalize the ranking. Such algorithms are possible to imagine in principle, but neither the final coin flip nor the strict-ranking postulate it serves is motivated in the first place.

Prince & Smolensky finally admit the possibility of what they call crucial non-ranking, cases where leaving $c$ and $d$ mutually unranked has the right extension, but $c \lt d$ and $d \lt c$ both have the wrong extension.4 Such cases would constitute the strongest evidence for viewing OT grammars as weakly ranked, particularly if such grammars are linguistically interesting.

I will adopt the hypothesis of weak ranking; it seems unavoidable, if only as a matter of acquisition. If one does, the set of possible rankings is actually much larger than $n!$. For the empty set and singleton sets, there is but one weak ranking. For sets of size 2, there are 3: $\{a \lt b;\ b \lt a;\ a, b\}$. And, at the risk of being pedantic, for sets of size 3 there are 13:

  1. $a \lt b \lt c$
  2. $a \lt b, c$
  3. $a \lt c \lt b$
  4. $a, b\lt c$
  5. $a, b, c$
  6. $a, c \lt b$
  7. $b \lt a \lt c$
  8. $b \lt a, c$
  9. $b \lt c \lt a$
  10. $b, c \lt a$
  11. $c \lt a \lt b$
  12. $c \lt a, b$
  13. $c \lt b \lt a$

Already one can see that this grows faster than $n!$: $3! = 6$, but there are 13 weak rankings of three constraints.
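
These 13 can also be enumerated by brute force. Here is a minimal Python sketch (the function name and the representation are my own choices): a weak ranking is an ordered sequence of blocks, strictly ranked left-to-right, with the constraints inside a block mutually unranked.

```python
from itertools import permutations

def enumerate_weak_rankings(constraints):
    """Generate all weak rankings of a constraint set, represented as
    tuples of frozensets: blocks are strictly ranked left-to-right,
    and constraints within a block are mutually unranked."""
    if not constraints:
        return {()}  # the vacuous ranking of the empty set
    rankings = set()
    for perm in permutations(constraints):
        # Each bitmask chooses where to cut the permutation into blocks.
        for cuts in range(2 ** (len(perm) - 1)):
            blocks, block = [], [perm[0]]
            for i, c in enumerate(perm[1:]):
                if cuts >> i & 1:
                    blocks.append(frozenset(block))
                    block = [c]
                else:
                    block.append(c)
            blocks.append(frozenset(block))
            rankings.add(tuple(blocks))
    return rankings
```

For a three-constraint set, `len(enumerate_weak_rankings("abc"))` yields exactly the 13 rankings listed above.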

It is not all that hard to figure out what the cardinality function is here. Indeed, I think Charles Yang (h/t) showed this to me many years ago, and though I lost my notes and had to re-derive it, I’m reasonably sure he came up with the same solution that I do here. This sequence is known as A000670 (the ordered Bell or Fubini numbers) and has the formula

$$a(n) = \sum_{k = 0}^n k! S(n, k)$$

where $S$ is the Stirling number of the second kind. Expanding that further we obtain:

$$a(n) = \sum_{k = 0}^n k! \sum_{i = 0}^k \frac{(-1)^{k - i} i^n}{(k - i)! \, i!} .$$

This is a lot of grammars by any account. With just 10 constraints, we are already in the hundreds of millions, and with 20 it’s between 2 and 3 sextillion. Yet this doesn’t look all that dramatically different from strict ranking when plotted on a log scale.
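
Those who want to check these figures can do so directly; here is a short Python sketch (the function name is mine) that computes A000670 via a standard recurrence rather than the double sum, conditioning on the size of the top-ranked block:

```python
from math import comb

def num_weak_rankings(n: int) -> int:
    """a(n) of OEIS A000670: the number of weak rankings of n constraints.
    Uses the recurrence a(n) = sum over k >= 1 of C(n, k) * a(n - k),
    where k is the number of constraints in the top-ranked block."""
    a = [1]  # a(0) = 1: one vacuous ranking of the empty set
    for m in range(1, n + 1):
        a.append(sum(comb(m, k) * a[m - k] for k in range(1, m + 1)))
    return a[n]
```

For instance, `num_weak_rankings(3)` returns 13, matching the enumeration above, and `num_weak_rankings(10)` is 102,247,563, versus only $10! = 3{,}628{,}800$ strict rankings.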

Clearly this is essentially infinite in the terminology of Gallistel and King (2009), and this ability to generate such combinatoric possibilities from a small inventory of primitive objects and operations has long been recognized as a desirable element of cognitive theories. Of course, many of these weak rankings will be extensionally equivalent, and others, I hypothesize, will be extensionally non-equivalent in general but extensionally equivalent with respect to some lexicon. None of this is unique to OT: it’s just good cognitive science.

Endnotes

  1. Note that the constraint induction of Hayes & Wilson (2008), for example, only considers a finite set of candidate constraints, so Con is effectively finite in their approach as well. The same is probably true of approaches with constraint conjunction, so long as there are some reasonable bounds.
  2. Exactly how the analyst is supposed to use this comparison is a little unclear to me, but presumably one can eyeball it to determine if the typological fit is satisfactory or not.
  3. As far as I can tell, he never explains this notation.
  4. There is a tradition of using $\ll$ for OT constraint ranking but I’ll put it aside because $\lt$ is perfectly adequate and is the operator used in order theory.

References

Gallistel, C. Randy and King, A. P. 2009. Memory and the Computational Brain: Why Cognitive Science Will Transform Neuroscience. Wiley-Blackwell.
Hayes, B. and Wilson, C. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39: 397-440.
Kager, R. 1999. Optimality Theory. Cambridge University Press.
Prince, A., and Smolensky, P. 1993. Optimality Theory: constraint interaction in generative grammar. Rutgers Center for Cognitive Science Technical Report TR-2.

Neural fossils

Neural network cognitive modeling had a brief, precocious golden era between 1986 (the year the Parallel Distributed Processing books came out) and maybe about 1997 (at which point the limitations of those models were widely known…though I’m a little fuzzier about when this realization settled in). During that period, I think it’s fair to say, a lot of people got hired into faculty positions, in psychology and linguistics in particular, simply because they knew a bit about this exciting new approach. Some of those people went on to do other interesting things once the shine had worn off, but a lot of them didn’t, and some of them are even still around, haunting the halls of R1s. I think something similar will happen to the new crop of LLMologists in the academy: some have the skills to pivot should we reach peak LLM (if we haven’t already), but many don’t.

Email discipline

There is a Discourse on what we might call email discipline. Here are a few related takes.

There are those who simply don’t respond to email at all. These people are demons and you should pay them no mind. 

Relatedly, there are those who “perform” some kind of message about their non-email responding. Maybe they have a long FAQ on their personal website about how exactly they do or do not want to be emailed. I am not sure I actually believe these people get qualitatively more email than I do. Maybe they get twice as much as me, but I don’t think anybody’s reading that FAQ, buddy. Be serious.

There are those who believe it is a violation to email people off-hours, or on weekends or holidays, or whatever. I don’t agree: it’s an asynchronous communication mechanism, so that’s sort of the whole point. I can have personal rules about when I read email, and these depend in no way on when you choose to send it. Expecting people to know and abide by your Email Reading Rules FAQ is just as silly.

I have an executive function deficit, diagnosed as a child (you know the one), and if you’re lucky, they teach you strategies to cope. I think non-impaired people should just model one of the best: email can’t be allowed to linger. If it’s unimportant, you need to archive it. If it’s important you need to respond to it. You should not have a mass of unopened, unarchived emails at any point in your life. It’s really that easy.

When LLMing goes wrong

[The following is a guest post from Daniel Yakubov.]

You’ve probably noticed that industries have been jumping to adopt some vague notion of “AI” or peacocking about their AI-powered something-or-other. Unsurprisingly, the scrambled nature of this adoption leads to a slew of issues. This post outlines a fact obvious to technical crowds but not to business folks: even though LLMs are a shiny new toy, LLM-centric systems still require careful consideration.

Hallucination is possibly the most common issue in LLM systems: the tendency for an LLM to prioritize responding over responding accurately, i.e., making stuff up. By considering some of the common approaches to fixing this, we can understand what problems those techniques introduce in turn.

A quick approach that many prompt engineers I know think is the end-all be-all of generative AI is chain-of-thought prompting (CoT; Wei et al. 2023). This simple approach just tells the LLM to break down its reasoning “step-by-step” before outputting a response. But CoT is a bandage: it does not actually inject new knowledge into an LLM, and this is where the retrieval-augmented generation (RAG) craze began. RAG represents a family of approaches that add relevant context to a prompt via search (Lewis et al. 2020). RAG pipelines come with their own errors that need to be understood, including noise in the source documents, misconfigurations in the context window of the search encoder, and the specificity of the LLM reply (Barnett et al. 2024). Specificity is particularly frustrating: imagine you ask a chatbot “Where is Paris?” and it replies “According to my research, Paris is on Earth.” At this stage, RAG and CoT combined still cannot deal accurately with complicated user queries (or, well, math). To address that, the ReAct agent framework (Yao et al. 2023) is commonly used. ReAct, in a nutshell, gives the LLM access to a series of tools and the ability to “requery” itself depending on the answer it gave to the user query. A central part of ReAct is the LLM choosing which tool to use. This is a classification task, and LLMs are observed to suffer from an inherent label bias (Reif and Schwartz 2024), another issue to control for.
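
To make these moving parts concrete, here is a toy Python sketch of the prompt-construction side of a RAG-plus-CoT pipeline. Everything in it is illustrative: the function names are mine, the bag-of-words retriever is a stand-in for the dense search encoder a real system would use, and no actual LLM is called.

```python
import re

def _tokens(text: str) -> set[str]:
    """Crude tokenizer: lowercase alphanumeric word types."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query.
    A real RAG system would use a dense encoder and a vector index."""
    q = _tokens(query)
    ranked = sorted(documents, key=lambda d: len(q & _tokens(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Inject retrieved context (RAG) and ask for step-by-step
    reasoning (CoT) before the final answer."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Think step by step, then answer using only the context above."
    )
```

Barnett et al.’s failure points live in exactly these seams: what the retriever ranks, how much context fits into the prompt, and how specifically the model is told to answer.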

This could go on for much longer, but I feel the point should be clear. Hopefully this gives a more academic crowd some insight into when LLMing goes wrong.

References

Barnett, S., Kurniawan, S., Thudumu, S. Brannelly, Z., and Abdelrazek, M. 2024. Seven failure points when engineering a retrieval augmented generation system.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., …, Kiela, D. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks.
Reif, Y., and Schwartz, R. 2024. Beyond performance: quantifying and mitigating label bias in LLMs.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., …, Zhou, D. 2023. Chain-of-thought prompting elicits reasoning in large language models.
Yao, S., Zhao, J., Yu, D., Shafran, I., Narasimhan, K., and Cao, Y. 2023. ReAct: synergizing reasoning and acting in language models.

Snacks at talks

The following is how to put out a classy spread for your next talk; ignoring beverages and extras, everything listed should ring up at around $50.

  • The most important snack is cheese. Yes, some people are vegan or lactose-intolerant, but cheese is one of the most universally-beloved snacks world-wide. Most cheeses keep for a while with refrigeration, and some even keep at room temperature. Cheese is, as a dear friend says, one of the few products whose quality scales more or less linearly with its price, and I would recommend at least two mid-grade cheeses. I usually buy one soft one (Camembert, Brie, and Stilton are good choices) and one semi-hard one (Emmental or an aged Cheddar for example). The cheese should be laid out on a cutting board with some kind of metal knife for each. The cheese should not be pre-cut (that’s a little tacky). Cheeses should be paired with a box of Carr’s Water Crackers or similar. Estimated price: $15-20.
  • Fresh finger vegetables are also universally liked. The easiest options are finger carrots and pre-cut celery sticks. If you can find pre-cut multi-color bell peppers or broccoli, those are good options too. You can pair this with some kind of creamy dip (it’s easy to make ranch or onion dip using a pint of sour cream and a dip packet, but you need a spoon or spatula to stir it up) but you certainly don’t have to. Estimated price: $10-20.
  • Fruit is a great option. The simplest thing to do is to just buy berries, but this is not foolproof: blueberries are a little small for eating by hand; raspberries lack structural integrity; and where I live, strawberries are only in season in mid-summer, and are expensive and low-quality otherwise. In Mid-Atlantic cities, there are often street vendors who sell containers of freshly-cut fruit (this usually includes slices of pineapple, mango, and banana, and perhaps some berries), and if this is available it is a good idea too. Estimated price: $10-15.

This, plus some water, is basically all you need to put out. Here are some ways to potentially extend it.

  • Chips are a good option. I think ordinary salty potato chips are probably the best choice simply because they’re usually eaten by themselves. In contrast, if you put out tortilla chips, you need to pair them with some kind of salsa or dip, and you need to buy a brand with sufficient “structural integrity” to actually pick up the dip.
  • Nuts are good too, obviously; maybe pick out a medley.
  • Soda water is really popular and cheap. I recommend 12oz cans. It should always be served chilled.
  • A few bottles or cans of beer may go over well. With rare exceptions, these should be served chilled.
  • A bottle of wine may be appropriate. Chill it if it’s a varietal that needs to be chilled.

If the talk is before noon, coffee (and possibly hot water and tea bags) is more or less expected. There is something of a taboo in the States against consuming or serving alcohol before 4pm or so, and you may or may not want your event to have a happy-hour atmosphere even if it is in the evening.

And here are a few things I cannot recommend:

  • In my milieu it is uncommon for people to drink actual soda.
  • I wouldn’t recommend cured meats or charcuterie for a talk. The majority of people won’t touch the stuff these days, and it’s pretty expensive.
  • I love hummus, but mass-produced hummus is almost universally terrible. Make it at home (it’s easy if you have a food processor) or forget about it.
  • Store-bought guacamole tastes even worse, and it has a very short shelf life.

Introducing speakers

The following are my (admittedly normative) notes on how to introduce a linguistics speaker.

  • The genre most similar to the introduction of a speaker is the congratulatory toast. An introduction should be brief, and the lengthy written introduction should be scorned. The speaker is already making an imposition on the audience’s time, and for the host to usurp more of this time than necessary is a further imposition on both speaker and audience.
  • The introduction is not an opportunity for the introducer to demonstrate erudition, but it can be an opportunity to show wit.
  • The introduction should be in the introducer’s voice. For this reason, a biography paragraph provided by the speaker should not be read as part of the introduction.
  • The introduction should be extemporaneous. The introducer can prepare brief notes, but they should fit on a notecard or their hand, and the notes should never be “read”.
  • Polite humor, brief personal anecdotes (e.g., when the introducer first met the speaker or became aware of their work), and heart-felt superlatives or compliments (one of the nicest introductions I ever received stated that I was “in the business of keeping people honest”) are to be encouraged.
  • The introduction should state the speaker’s current affiliation and title, if any, but need not list their full occupational or educational history unless it is judged relevant.
  • Introducers may feel an urge to read the title of the talk when concluding their introduction, but should resist this urge. There is no real need—the audience already has seen the talk title in the program or other announcements, and they can read the slide—and the speaker normally feels the need to read it out loud regardless.
  • The introduction should conclude with the speaker’s name. In one common style, which I consider elegant, the introducer is careful not to say the speaker’s full name until this conclusion, and uses epithets like “our next speaker” or “our honored guest” earlier in the introduction.

Libfix report: -gler and what we can learn from it

The libfix -gler is an interesting case that seems to illustrate Zwicky’s hypothesis that blends lead to libfixes. Patient zero is clearly Googler, which is corporate’s preferred term for Google employees; it is widely used in-group as well. This is an ordinary example of the relatively productive -er suffix that creates agent nominals (backbencher, J6er), with a connotation that the agent does something habitually (e.g., pickleballer) or as an occupation (cartographer). That a Googler is someone who works at Google, and not just a habitual user of the search engine, is slightly notable but not shocking.

The next best-established forms look more like blends based on Googler. First and most saliently, there is noogler ‘new Google employee’, where the word-onset has been replaced to create a blend with new. Members of an official affiliational group (/listserv) for Jewish Googlers call themselves Jewglers, with a similar single-phoneme substitution. Since both new and Jew share the /Cuː-/ initial of Google(r)—the only adaptation is changing the place and manner of the initial C—this still looks like blending rather than recutting. Xoogler ‘former employee of Google’ is presumably pronounced ex-oogler (I’ve never heard it said out loud myself); the term is commonly used by entrepreneurs in their fund-raising (cf. xoogler.co); this could be a blend or just regarded as a one-off truncation of ex-Googler.

Cats and cows (and ball pythons) are not permitted at Google’s offices, but I have seen mewgler and moogler for cat- and cow-fancier Google employees. However, Googlers are permitted to bring their (well-behaved, vaccinated) dogs to work, and employees who do so call themselves dooglers. (Then again, my colleague LeeAnn says the dogs themselves are the dooglers!) My intuition is that this is pronounced [duː.glɚ] and not *[dɒ.glɚ], and thus this is less blend-like than any of the aforementioned examples, because dog and the base Googler have a different vowel (albeit both back vowels). The same is true of Zoogler for Google employees based out of the Zurich office, since the first vowel in the US English pronunciation of Zurich is [ʊ], and we even get a seemingly more dissimilar-to-base front-gliding diphthong in gaygler, the term for members of the company-internal LGBT affiliation group (/listserv).

In my analysis, doogler, Zoogler, and gaygler strongly suggest that we have gone from a blend with Googler as its base to an incipient liberated affix -gler denoting agents associated with Google.

The biggest puzzle about English libfixes, for me, one not answered in any of the prior work, is why the recutting occurs where it does. It is perhaps not surprising that the rather-homophonous -er has not been given yet another sense, but why is it -gler and not -oogler (which would give us the not-preposterous, but unattested *gayoogler) or -ler (*gayler)? While Zwicky’s cline hypothesis does not answer this, here is one possible way to operationalize it: recutting is blend reanalysis, in which source morphemes like new and Jew are parsed maximally in noogler and Jewgler, with the remainder giving us the new affix -gler.

I know of many other libfixes consistent with this analysis. For example, from the blend fursona (fur + persona) we have -sona (e.g., catsona, puppysona); from glitterati (glitter + literati) we have -rati (e.g., Twitterati, technorati); from funtastic (fun + fantastic) we have -tastic (e.g., chavtastic, shagtastic); and from telethon (telephone marathon) we have -(a)thon (e.g., saleathon, mathathon).

This is of course not the full story. Most other libfixes in my corpus seem to arise at a preexisting morpheme boundary, with one or the other piece reinterpreted as a productive affix with new lexical semantics based on the full form. Some examples include cran- from cranberry (e.g., crantini), -gate from Watergate (e.g., Troopergate), -mare from nightmare (e.g., editmare), and -berg from iceberg (e.g., fatberg). In many cases, the morpheme boundaries are abstract ones mirroring the segmentation of Latin or Greek complex word borrowings, as in -(i)verse from the Latin-based universe (e.g., Buffyverse) or -(o)nomics from the Greek-based economics (e.g., the brand name Chemonomics). My corpus also includes Franken- from the German compound name Franken-stein (e.g., Frankenfood) and -nik from the Russian derivative s-put-nik (e.g., peacenik). In the above cases, the segmentation is more or less the same as in the donor language.

In other cases, though, the segmentation is different from that in the donor language, as in -ohol(ic), ultimately from Arabic, which retains a little less of the base than one might expect (here al- is the Arabic definite prefix), and -copter from neo-Greek and -nado from Spanish, which both retain a little more. While I don’t think it’s that controversial to posit that cranberry, economics, or universe are represented as complex nouns (our understanding of the morphophonology of English seems to depend on this conclusion, and virtually all behavioral research on word processing supports this), it is perhaps not shocking that ordinary English speakers are unfamiliar with the morphology of the etyma of alcohol, helicopter, and tornado in the ultimate donor languages. These more ad-hoc recuttings don’t necessarily line up with syllable boundaries—though hypothetical *-cohol or *-ler would have—so I suspect there is just some inherent stochasticity to how this final type of recutting proceeds.

Tutorial on Substance-Free Logical Phonology

A few days ago, we (myself with the help of graduate student Rim Dabbous and professor Charles Reiss) gave a detailed tutorial on Substance-Free Logical Phonology (LP) at the LSA meeting in Philadelphia. I was pleasantly surprised with how many people showed up (I printed about 30 handouts, and we ran out) and how engaged they were. If the question period is any indication, our colleagues understood the theory quite well. The handout is now on LingBuzz and we have a lot more material coming for you soon. 

I’ll try to summarize how LP fits into theories of exceptionality in a separate post soon. 

Postscript

I also took the liberty of creating a simple archive for our Logical Phonology papers, the Logical Phonology Archive (LOA for short, pronounced [lwa] like French loi ‘law’ or Kreyòl loa ‘vodou spirit’). Technically, this site is sort of interesting: other than some static text, the table containing papers is an HTML iframe, dynamically generated from a Google Sheet using a template populated by server-side JavaScript on the Google Apps Script platform; hat tip to my colleague Rivka Levitan for making me aware of this very handy possibility.