[A spiritual successor to this post…]
I recently realized that a lot of young linguists think the thought leaders in the field are the same people who are the most Mad Online. I can assure you that is not so true.
I recently noticed that I have been receiving emails, addressed to multiple recipients, with the salutation
Hi both,
In fact I appear to have 34 of them in one of my many inboxes, including two received from different authors today. For me, this is sharply ungrammatical, though I am not sure why. Both is unobjectionable in subject or object position (e.g., Both liked limoncello, She fancies both, etc.), so I am not sure why it is bad in a salutation. A very informal survey of the people who have sent it to me turns up two speakers for whom English is a second language (though both have extremely high proficiency) but also several native speakers. Any ideas?
An interesting philosophical question: if a phoneme present in underlying representation surfaces faithfully, does the surface representation “contain” that phoneme, or is it better to say faithful phonemes have been vacuously transduced to an (allo)phone with the same specification?
One of the more confusing slanders against generativism is the belief that it has all somehow been undone by William Labov and the tradition of variationist sociolinguistics. I have bad news: Noam and Bill are friends. I saw them chopping it up once, in Philadelphia, and I have to assume they were making fun of functionalists. Bill has nice things to say about the generativist program in his classic paper on negative concord; Noam has some interesting comments about how the acquirenda probably involve multiple competing grammars in that Piaget lecture book. They both think functionalism is wildly overrated. And of course, the i-language perspective that Noam brings is absolutely essential to dialogues about language ideologies, language change, stigma and stratification, and so forth that we associate with Bill.
I find the very idea of industry postdocs funny (funny-sad, though). Sure, it makes sense for the academy, with all of its scarcities, to make use of precarious, casualized post-graduate labor, but to extend this to the tech sector is vaguely monstrous. It’s extra funny (but funny-sad too) when you hear of a senior professor doing an industry postdoc at a company with a name like baz.ly during their sabbatical.
It is now 2023, and virtually every journal I review for has a broken website, which further penalizes me for volunteer work I ought to be paid for. This is really unacceptable. Maybe some of the big publishers can take a tiny bite out of their massive revenues (Springer Nature apparently pulled down 1.72b USD in revenue in 2021) and invest it in actually testing their CRUD apps.
I have never understood the idea that large LMs are uniquely positioned to enable the propagation of disinformation. Let us stipulate, for the sake of argument, that large LMs can generate high-quality disinformation and that its artificial provenance (i.e., not generated by human writers) cannot be reliably detected either by human readers or by computational means. At the same time, I know of no reason to suppose that large LMs can generate better (less detectable, more plausible) disinformation than human writers can. It is then hard to see what advantage there is to using large LMs for disinformation generation beyond a possible economic benefit realized by firing PR writers and replacing them with “prompt engineers”. Ignoring the dubious economics—copywriters are cheap, engineers are expensive—there is a presupposition that disinformation needs to scale, i.e., be generated in bulk, but I see no reason to suppose this either. Disinformation, it seems to me, comes to us either in the form of “big lies” from sources deemed reputable by journalists and lay audiences (think WMDs), or increasingly, from the crowds (think QAnon).
It will probably not surprise the reader to see me claim that France and French are both sociopolitical abstractions. France is, like all states, an abstraction, and it is hard to point to physical manifestations of France the state. But we understand that states are a bundle of related institutions with (mostly) shared goals. These institutions give rise to our impression of the Fifth Republic, though at other times in history conflict between these institutions gave rise to revolution. But currently the defining institutions share a sufficient alignment that we can usefully talk as if they are one. This is not so different from the i-language perspective on languages. Each individual “French” speaker has a grammar projected by their brain, and these are (generally speaking) sufficiently similar that we can maintain the fiction that they are the same. The only difference I see is that linguists can give a rather explicit account of any given instance of i-French whereas it’s difficult to describe political institutions in similarly detailed terms (though this may just reflect my own ignorance about modern political science). In some sense, this explicitness at the i-language level makes e-French seem even more artificial than e-France.
If you’re just doing a “meeting” with one other person located in the same country, I don’t see the point of using Zoom. Ordinary phone lines are more reliable and have more familiar acoustic qualities; this is part of why VoIP sounds worse: unless you’re quite young, you’re probably far more familiar with the 8 kHz sampling rate and whatever compression curve the phone system uses. Just call people on the phone!
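(For the curious: the “compression curve” in question is, at least in North American and Japanese telephony, the μ-law companding curve from the G.711 standard, which allocates more resolution to quiet samples than loud ones. A minimal sketch of the continuous form of that curve, assuming the standard μ = 255:)

```python
import math

MU = 255  # mu-law parameter specified by G.711


def mu_law_compress(x: float) -> float:
    """Compress a sample in [-1, 1] with the mu-law companding curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Invert the compression, mapping [-1, 1] back to [-1, 1]."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


# Quiet samples get a proportionally larger share of the output range,
# which is why speech survives 8-bit quantization so well:
for x in (0.01, 0.1, 0.5, 1.0):
    print(f"{x:>5} -> {mu_law_compress(x):.3f}")
```

(Real codecs quantize the compressed value to 8 bits per sample at 8 kHz, hence the familiar 64 kbit/s phone channel.)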
It is widely admitted that the use of language in terms like formal language and language model tends to mislead neophytes, since it suggests the common-sense notion (roughly, e-language) rather than the narrow technical sense referring to a set of strings. Scholars at Stanford have been trying to push foundation model as an alternative to what were previously called large language models. But I don’t really like the implication—which I take to be quite salient—that such models ought to serve as the foundation for NLP, AI, whatever. I use large language models in my research, but not that often, and I don’t think they have to be part of every practitioner’s toolkit. I can’t help thinking that Stanford is trying to “make fetch happen”.