Aside – Page 2 – Wellformedness

“Indic” considered harmful

Indic is an adjective referring to the Indo-Aryan languages such as Hindi-Urdu or Bengali. These languages are spoken mostly in the northern parts of India, as well as in Bangladesh, Pakistan, Sri Lanka, Nepal, and the Maldives. This term can be confusing, because hundreds of millions of people in the Indian subcontinent (and nearby island nations) speak non-Indic first languages: over 250 million people, particularly in the south of India and the north of Sri Lanka, speak Dravidian languages, which include Malayalam, Tamil, and Telugu. Austronesian, Tibeto-Burman, and Tai-Kadai languages, and many language isolates, are also spoken in India and the other nations of the subcontinent, as is English (and French, and Portuguese). Unfortunately, there is now a trend to use Indic to mean ‘languages of the subcontinent’. See here for a prominent example. This is a new sense for Indic, and while there is probably a need for a lexeme to express this notion (language of India or subcontinental language would work), reusing Indic, which already has a distinct and well-established sense, just adds unnecessary confusion.

Growing consensus

Any time I read a paper that begins, roughly, “there is a growing consensus that P”, there is not in fact, as far as I can tell, a growing consensus in support of P.

It’s “Penn”

This is probably a losing battle at this point, but the University of Pennsylvania’s short name is and has always been Penn, and UPenn is something of a shibboleth (probably derived from the URL upenn.edu).

Citation practices

In a previous post I talked about an exception to the general rule that you should expand acronyms: sometimes what the acronym expands to is a clear joke made up after the fact. This is an instance of a more general principle: you should provide, via citations, information the reader needs to know or stands to benefit from. To that point, nobody has ever really cared about the mere fact that you “used R (R Core Team 2021)”. It’s usually not relevant. R is one of hundreds of Turing-complete programming environments, and most of the things it can do can be done in any other language. Your work almost surely can be replicated in other environments. It might be worth mentioning if a major point of your paper is that you wrote, say, a new open-source software package for R: there the reader needs to know what platform the package targets. But otherwise it’s just cruft.

Isaacson and Lewis

It’s amusing to me that Walter Isaacson and Michael Lewis, who happened to go to the same elite private high school in New Orleans just a few years apart, are finally having their oeuvres as favorable stenographers for the rich and powerful reassessed more or less simultaneously. Isaacson clearly met his match with Elon Musk, a deeply incurious abuser who gave Isaacson quite minimal access; Lewis does seem to be one of a handful of people who actually believed in that effective altruism nonsense Sam Bankman-Fried was cooking up. Good riddance, I say.

Myths about writing systems

In collaboration with Richard Sproat, I just published a short position paper on “myths about writing systems” in NLP, to appear in the proceedings of CAWL, the ACL Workshop on Computation and Writing Systems. I think it will be most useful to reviewers and editors who need a resource to combat nonsense like “Persian is a right-to-left language” and want to suggest a correction. Take a look here.

Being mad online

[A spiritual successor to this post…]

I recently realized that a lot of young linguists think the thought leaders in the field are the same people who are the most Mad Online. I can assure you that this is not so.

“Hi both”

I recently noticed that I have been receiving emails, addressed to multiple recipients, with the salutation

Hi both,

In fact I appear to have 34 of them in one of my many inboxes, including two received from different authors today. For me, this is sharply ungrammatical, though I am not sure why. Both is unobjectionable in subject or object position (e.g., Both liked limoncello, She fancies both, etc.), so it is not obvious what goes wrong in a salutation. A very informal survey of the people who have sent it to me turns up two speakers for whom English is a second language (though both have extremely high proficiency), but also several native speakers. Any ideas?

Do surface representations contain phonemes?

An interesting philosophical question: if a phoneme present in underlying representation surfaces faithfully, does the surface representation “contain” that phoneme, or is it better to say faithful phonemes have been vacuously transduced to an (allo)phone with the same specification?

Noam and Bill are friends

One of the more confusing slanders against generativism is the belief that it has all somehow been undone by William Labov and the tradition of variationist sociolinguistics. I have bad news: Noam and Bill are friends. I saw them chopping it up once, in Philadelphia, and I have to assume they were making fun of functionalists. Bill has nice things to say about the generativist program in his classic paper on negative concord; Noam has some interesting comments about how the acquirenda probably involve multiple competing grammars in that Piaget lecture book. They both think functionalism is wildly overrated. And of course, the i-language perspective that Noam brings is absolutely essential to the discussions of language ideologies, language change, stigma and stratification, and so forth that we associate with Bill.