Moneyball Linguistics

[This is just a fun thought experiment. Please don’t get mad.]

The other day I had an intrusive thought: the phrase moneyball linguistics. Of course, as soon as I had a moment to myself, I had to sit down and think about what this might denote. At first I imagined building out a linguistics program on a small budget, like Billy Beane and the Oakland A’s. But it seems to me that linguistics departments aren’t really much like baseball teams—they’re only vaguely competitive (occasionally for graduate students or junior faculty), there’s no imperative to balance the roster, there’s no DL (or is that just sabbatical?), and so on—and the metaphor sort of breaks down. But the ideas of Beane and co. do seem to have some relevance when talking about individual linguists and labs. I don’t have OBP or slugging percentage for linguists, and I wouldn’t dare propose anything so crude, but I think we can treat linguists and their research as a sort of “cost center” and identify two major types of “costs” for the working linguist:

  1. cash (money, dough, moolah, chedda, cheese, skrilla, C.R.E.A.M., green), and
  2. carbon (…dioxide emissions).

I think it is a perfectly fine scientific approximation (not unlike competence vs. performance) to treat the linguistic universe as having a fixed amount of cash and carbon, so that we can use this thinking to build out a departmental roster and come in just under the salary cap. While state research budgets do fluctuate—and while our imaginings of a better world should also include more science funding—it is hard to imagine that near-term political change in the West would substantially increase it. And similarly, while there is roughly 10¹² kg of carbon in the earth’s crust, climate scientists agree that the vast majority of it really ought to stay there. Finally, I should note that maybe we shouldn’t treat these as independent factors, given that there is a non-trivial amount of linguistics funding via petrodollars. But anyways, without further ado, let’s talk about some types of researchers and how they score on the cash-and-carbon rubric (summarized in a toy sketch after the list).

  • Armchair research: The armchairist is clearly both low-cash (if you don’t count the sports coats) and low-carbon (if you don’t count the pipe smoke).
  • Field work: “The field” could be anywhere, even the reasonably affordable, accessible, and often charming Queens, but the archetypal fieldworker flies in, first on a jet and then perhaps reaching their destination by helicopter or seaplane. Once you’re there, though, life in the field is often reasonably affordable, so this scores as low-cash, high-carbon.
  • Experimental psycholinguistics: Experimental psycholinguists have reasonably high capital/startup costs (in the form of eyetracking devices, for instance) and steady marginal costs for running subjects: the subjects themselves may come from the Psych 101 pool but somebody’s gotta be paid to consent them and run them through the task. We’ll call this medium-cash, low-carbon.
  • Neurolinguistics: The neurolinguistic imaging technique du jour, magnetoencephalography (or MEG), requires superconducting coils cooled to a chilly 4.2 K (roughly −452 °F); this in turn is accomplished with liquid helium. Not only is the cooling system expensive and power-hungry, but the helium is mostly wasted (i.e., vented to the atmosphere). Helium may be the second-most abundant element in the universe, but we are quite literally running out of the stuff here on Earth. So, MEG, at least, is high-cash, high-carbon.
  • Computational linguistics: There was a time not so long ago when I would have said that computational linguists were a bunch of hacky-sackers filling up legal pads with Greek letters (the weirder the better) and typing some kind of line noise they call “Haskell” into ten-year-old ThinkPads. But nowadays, deep learning is the order of the day, and the substantial carbon impact of these methods is well-documented, or at least well-estimated (e.g., Strubell et al. 2019). Now, it probably should be noted that a lot of the worst offenders (BigCos and the Québécois) locate their data centers near sources of plentiful hydroelectric power, but not all of us live within the efficient transmission zones for hydropower. And of course, graphics processing units are expensive too. So most computational linguistics is, increasingly, high-cash, high-carbon.
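
For the spreadsheet-inclined, here is a minimal sketch of the rubric as a toy Python program. The low/medium/high point values, the cap numbers, and the under_the_cap helper are all made up for illustration; nothing here has any empirical standing.

    # A toy encoding of the cash-and-carbon rubric above.
    # The 1/2/3 scale (low/medium/high) and the caps are illustrative
    # assumptions, not measurements.
    LOW, MEDIUM, HIGH = 1, 2, 3

    RUBRIC = {
        "armchair": {"cash": LOW, "carbon": LOW},
        "fieldwork": {"cash": LOW, "carbon": HIGH},
        "psycholinguistics": {"cash": MEDIUM, "carbon": LOW},
        "neurolinguistics": {"cash": HIGH, "carbon": HIGH},
        "computational": {"cash": HIGH, "carbon": HIGH},
    }

    def under_the_cap(roster, cash_cap, carbon_cap):
        """True if a proposed departmental roster fits both budgets."""
        total_cash = sum(RUBRIC[r]["cash"] for r in roster)
        total_carbon = sum(RUBRIC[r]["carbon"] for r in roster)
        return total_cash <= cash_cap and total_carbon <= carbon_cap

    # Example: an armchairist, a fieldworker, and a psycholinguist
    # squeak in under a (made-up) cap of 6 on each dimension.
    print(under_the_cap(["armchair", "fieldwork", "psycholinguistics"],
                        cash_cap=6, carbon_cap=6))  # True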

On a more serious note: unless you run an MEG lab or are working on something called “GPT-G6”, chances are your biggest carbon contributions come from the meat you eat, the cars you drive, and the short-haul jet flights you take, not from other externalities of your research.

References

Strubell, E., Ganesh, A., and McCallum, A. 2019. Energy and policy considerations for deep learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3645–3650.
