How not to acquire phonological exchanges

I recently gave a talk with Charles Reiss at the Canadian Linguistic Association (sort of like the LSA, but Canadian and frankly a lot better because it doesn’t have nearly as many prominent crackpots and cranks) on the notion of “exchange rules” as they’re understood in Logical Phonology (LP). Whereas SPE-era theories can use alpha-notation to state exchange rules directly, LP can model exchange processes only via a series of seemingly complex rules, either a Duke of York gambit or an opportunistic abuse of underspecification (a toy sketch of the latter strategy appears below). Since it seems quite likely that purely phonological exchanges don’t in fact exist, we suggest that the language acquisition device is constrained so as not to proffer analyses of this type, though we also consider the possibility that their absence is an accident of a “diachronic filter” of the sort developed by Ju. Blevins, M. Hale, and J. Ohala. The handout is here for those interested, and like other things we’ve been writing about LP, it will probably be included in a forthcoming book.

One interesting question that we raise, but don’t answer, is what one ought to do about exchanges in Optimality Theory. Alderete (2001), for example, proposes to model them with a family of “anti-faithfulness constraints”. While one could predict the absence of exchanges by eliminating this constraint family, Alderete also uses anti-faithfulness for phenomena other than exchanges, and some of these may be more robustly attested; it’s thus not clear what ought to be done.
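Here is that toy sketch: a hypothetical vowel-height exchange, which is my own illustration rather than an example from the handout. In alpha-notation the exchange is a single rule, whereas the underspecification strategy simulates it with three ordered, variable-free rules that park one class of segments at an intermediate value:

  • Alpha-notation (one rule): $[\alpha\ \text{high}] \rightarrow [-\alpha\ \text{high}]$
  • Step 1: $[+\text{high}] \rightarrow [0\ \text{high}]$ (high vowels become temporarily underspecified)
  • Step 2: $[-\text{high}] \rightarrow [+\text{high}]$ (non-high vowels raise)
  • Step 3: $[0\ \text{high}] \rightarrow [-\text{high}]$ (the underspecified vowels surface as non-high)

Applied in order, these three rules derive exactly the mapping of the single alpha-notation rule, but no one rule performs an exchange on its own.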

References

Alderete, J. 2001. Dominance effects as transderivational anti-faithfulness. Phonology 18: 201-253.

Guest & Martin on neural networks as cognitive models

In our “big questions” class, we read a few papers about whether large artificial neural network language models are good (or even candidate) cognitive models. As part of my background reading I also reviewed this recent paper by Guest & Martin (2023). The crux of the paper is an argument based on simple propositional logic, and because it was hard for me to follow, I thought I’d try to work through it here.

G&M first identify a common, mostly implicit argument for studying artificial neural networks as cognitive models, one which takes the form of modus ponens. I will take the liberty of generalizing it considerably here.

  • $P \rightarrow Q$: if neural networks (i.e., their outputs) are correlated with behavioral or neuroimaging data, they are plausible cognitive models (“do what people do”).
  • $P$: neural networks are correlated with such data.
  • $\vdash Q$: therefore they are plausible cognitive models.
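Since the argument turns on which inference forms are valid and which are not, a brute-force truth-table check may help make this concrete; this is a minimal sketch in Python (the helper and variable names are mine, not G&M’s):

```python
from itertools import product

# Modus ponens: from P -> Q and P, infer Q.
# Valid iff every truth assignment satisfying the premises
# also satisfies the conclusion.
implies = lambda a, b: (not a) or b
valid = all(q for p, q in product([True, False], repeat=2)
            if implies(p, q) and p)
print(valid)  # True: modus ponens is valid.
```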

G&M give several examples where this argument has been applied; it is also exactly the motivation that linguists engaged in “LLMology” tend to give during the question period. The problem, as G&M note, is that the soundness of this inference depends crucially on whether $P \rightarrow Q$ actually holds, and there is no shortage of arguments against that proposition. The most obvious one, of course, is the possibility of multiple realizability. They use the example of two clocks that are behaviorally quite similar, but one is actually based on springs and cogs whereas the other has a battery-powered quartz movement. Clearly, neural networks and human brains could both realize the same sorts of behaviors/mappings without being internally the same.

G&M continue that if one accepts the conditional premise above, it should be equally possible to reason from it via modus tollens. Such an argument has the following general form.

  • $P \rightarrow Q$: (as above).
  • $\neg Q$: neural networks are not plausible cognitive models.
  • $\vdash \neg P$: therefore neural networks (i.e., their outputs) are not correlated with behavioral or neuroimaging data.
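Modus tollens passes the same truth-table check (again a minimal sketch; only the names are mine):

```python
from itertools import product

# Modus tollens: from P -> Q and not-Q, infer not-P.
implies = lambda a, b: (not a) or b
valid = all(not p for p, q in product([True, False], repeat=2)
            if implies(p, q) and not q)
print(valid)  # True: modus tollens is also valid.
```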

G&M give several examples where such an argument could easily be applied: so-called hallucinations, cases where neural networks continue to underperform humans when provided with reasonable amounts of data, as well as cases where neural networks can be shown to exhibit superhuman performance! As they conclude: “Even though $Q$ can, and often does, fail to be true, we, as a field, do not formulate its relationship to $P$ in terms of MT [modus tollens]” (G&M: 217). Rather, they argue, what people actually do is show that:

  • $Q \rightarrow P$: if neural networks are plausible cognitive models (“do what people do”), then they (i.e., their outputs) are correlated with behavioral or neuroimaging data.

Using this conditional, together with the observed correlation $P$, to assert $Q$ is of course the fallacy of affirming the consequent, and is clearly invalid. What G&M ultimately seem to conclude is that little can be logically inferred from cognitive modeling with artificial neural networks, even if these models remain “useful” in many domains.
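The same brute-force check exposes the fallacy by finding a counterexample (again a minimal sketch; names are mine):

```python
from itertools import product

# Affirming the consequent: from Q -> P and P, infer Q.
implies = lambda a, b: (not a) or b
counterexamples = [(p, q) for p, q in product([True, False], repeat=2)
                   if implies(q, p) and p and not q]
print(counterexamples)  # [(True, False)]: premises true, conclusion false.
```

The assignment where $P$ is true and $Q$ is false makes both premises true and the conclusion false. This is exactly the clock scenario above: the outputs correlate with the data even though the model is not a plausible cognitive one.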

References

Guest, O., and Martin, A. E. 2023. On logical inference over brains, behaviour, and artificial neural networks. Computational Brain & Behavior 6: 213-227.

Cajal on “diseases of the will”

Charles Reiss (h/t) recently recommended to me a short book by Santiago Ramón y Cajal (1852-1934), an important Spanish neuroscientist and physician. Cajal first published the evocatively titled Reglas y Consejos sobre Investigación Científica: Los tónicos de la voluntad (“Rules and advice on scientific research: the tonics of the will”) in 1897, and it has been revised and translated various times since. By far the most entertaining portion for me is chapter 5, entitled “Diseases of the Will”. Despite the name, what Cajal actually presents is a taxonomy of scientists who contribute little to scientific inquiry: “contemplators”, “bibliophiles and polyglots”, “megalomaniacs”, “instrument addicts”, “misfits”, and “theorists”. I include a PDF of this brief chapter, translated into English, here for interested readers, in the belief that it is in the public domain.