Stop capitalizing so much

One of the absolute scourges of student writing is the tendency to capitalize just about every multi-word noun phrase. The rule in English is pretty simple: you only capitalize proper names, and these are, roughly, the names of people, locations, or organizations. Technical concepts do not qualify. It doesn’t matter if it’s part of an acronym: we capitalize the acronym but not necessarily the full phrase. Natural language processing is not a proper name; cognitive science isn’t either; logistic regression is certainly not a proper name, and neither is conditional random field, hidden Markov model, support vector machine, or…

Rich people shouldn’t drive

I don’t understand why the filthy rich ever drive. Sure, I get why Ferdinand Habsburg gets into the Eva cockpit: an F1 race is the modern-day tournament. But driving is a dangerous, high-liability, cognitively taxing activity and it’s easy for the rich to offload those hazards to a specialist. I don’t understand why, for example:

In the unlikely event that I hit centimillion status, the first thing I’m doing is buying a black, under-the-radar town car and hiring a chauffeur with good personal recommendations. And before that, when I enter decamillion territory, I’m just calling UberXen. No alternate-side parking, no DUIs for me. I don’t know about Justin, but surely Warren and Sam have something better to do than be behind the wheel. They could be power napping, meditating, watching the market, or catching up on X (“the everything app”) in the back of their cars instead.

The presupposition of “recognize”

There’s an interesting pragmatics thing going on in the official statement ex-first lady Melania Trump put out after her husband was grazed by a sniper’s bullet. (The full statement is here if you care; it’s not very interesting overall.) However, I was drawn to a curious violation of presupposition in the document:

A monster who recognized my husband as an inhuman political machine attempted to ring out Donald’s passion – his laughter, ingenuity, love of music, and inspiration.

A few things are going on here. Let me put aside the awkward non-parallelism of laughter vs. love of music vs. ingenuity and inspiration, and note that the verb she wants in the embedded clause is wring out (figuratively, to extract by means of forceful action), not ring out. But the more interesting issue is the use of recognized. At least in my idiolect, to say that the shooter recognized Donald Trump as an inhuman machine presupposes that the speaker agrees with this assessment, or perhaps more generally that it is in the common ground that Donald Trump is an inhuman machine. There is nothing in the text or subtext of the statement suggesting she views her husband as an inhuman machine, despite the long and tedious tradition of trying to “read resistance” into the wives of right-wing American politicians. For me, verbs like misconstrued or mistook presuppose the opposite, that the speaker and/or the common ground disagrees with this assessment, and that’s what I suppose Mrs. Trump meant to say here. I don’t blame Mrs. Trump for this; English is not her first language, though she speaks it quite well. But she’s famous and rich enough that she ought to employ a PR professional or lawyer to proofread public statements, as I’m sure Mrs. Obama and Mrs. Bush do.

Medical bills

About two years ago, I got an unexpected medical bill in the mail. The amount wasn’t very high, but I was quite frustrated and annoyed. First, this was from a local College of Dentistry, where most procedures are free for the insured (and probably for the uninsured too); there was no “explanation of benefits” explaining that this was a co-pay, or that my insurance covered only some portion. Second, I hadn’t been to the College of Dentistry in quite a while, so I had no idea which of the various procedures this was or even on what day I received the billed service. Third, there was no way to get more information: the absolute worst thing about this provider is that the administrative staff are some of the most overloaded and overworked people I have ever seen, and I have witnessed them just let the phone ring because they’re dealing with a huge line of in-person patients (some of whom are bleeding from the mouth). So I didn’t pay it. After a while, though, the bills kept coming and I started to worry. Was I wasting paper for no reason? Would this harm my credit score? So I put about an hour into finding a way to actually get in touch with the billing office. It turns out this was a Google Form buried somewhere on a website; if you fill it out, someone calls you (in my case, within the hour!), looks up your chart, and can tell you the date of service and why you were billed. Why didn’t they just include this in the bill in the first place? I have to imagine this makes it even harder for the College to actually collect on these debts.

“Indic” considered harmful

Indic is an adjective referring to the Indo-Aryan languages such as Hindi-Urdu or Bengali. These languages are spoken mostly in the northern parts of India, as well as in Bangladesh, Pakistan, Sri Lanka, Nepal, and the Maldives. This term can be confusing, because hundreds of millions of people in the Indian subcontinent (and nearby island nations) speak non-Indic first languages: over 250 million people, particularly in the south of India and the north of Sri Lanka, speak Dravidian languages, which include Malayalam, Tamil, and Telugu. Austronesian, Tibeto-Burman, and Tai-Kadai languages, and many language isolates, are also spoken in India and the other nations of the subcontinent, as is English (and French, and Portuguese). Unfortunately, there is now a trend to use Indic to mean ‘languages of the subcontinent’. See here for a prominent example. This is a new sense for Indic, and while there is probably a need for a lexeme to express this notion (language of India or subcontinental language would work), reusing Indic, which already has a distinct and well-established sense, just adds unnecessary confusion.

Citation practices

In a previous post I talked about an exception to the general rule that you should expand acronyms: sometimes what the acronym expands to is a clear joke made up after the fact. This is an instance of a more general principle: you should provide, via citations, information the reader needs to know or stands to benefit from. To that point, nobody has ever really cared about the mere fact that you “used R (R Core Team 2021)”. It’s usually not relevant. R is one of hundreds of Turing-complete programming environments, and most of the things it can do can be done in any other language. Your work almost surely can be replicated in other environments. It might be worth mentioning if a major point of your paper is that you wrote, say, a new open-source software package for R: there the reader needs to know what platform the package targets. But otherwise it’s just cruft.

Isaacson and Lewis

It’s amusing to me that Walter Isaacson and Michael Lewis (who happened to go to the same elite private high school in New Orleans, just a few years apart) are finally having their oeuvres as favorable stenographers for the rich and powerful reassessed, more or less simultaneously. Isaacson clearly met his match with Elon Musk, a deeply incurious abuser who gave Isaacson quite minimal access; Lewis does seem to be one of a handful of people who actually believed in that effective altruism nonsense Sam Bankman-Fried was cooking up. Good riddance, I say.

Myths about writing systems

In collaboration with Richard Sproat, I just wrote a short position paper on “myths about writing systems” in NLP, to appear in the proceedings of CAWL, the ACL Workshop on Computation and Written Language. I think it will be most useful to reviewers and editors who need a resource to combat nonsense like “Persian is a right-to-left language” and want to suggest a correction. Take a look here.