Academic reviewing in NLP

It is obvious to me that NLP researchers are, on average, submitting manuscripts far earlier and more often than they ought to. The average manuscript I review is typo-laden, full of figures and tables far too small to actually read or intruding on the margins, with an unusable bibliography that the authors have clearly never inspected. Sometimes I receive manuscripts whose actual titles are transparently ungrammatical.

There are several reasons this is bad, but most of all it is a waste of reviewer time, since the reviewers have to point out (in triplicate or worse) minor issues that would have been flagged by proof-readers, advisors, or colleagues, were they involved before submission. Then, once these issues are corrected, the reviewers are again asked to read the paper and confirm they have been addressed. This is work the authors could have done, but which instead is pushed onto committees of unpaid volunteers.

The second issue is that the reviewer pool lacks relevant experience. I am regularly tasked with “meta-reviewing”, or critically summarizing the reviews. This is necessary in part because many, perhaps a majority, of the reviewers simply do not know how to review an academic paper, having not received instruction on this topic from their advisors or mentors, and their comments need to be recast in language that can be quickly understood by conference program committees.

[Moving from general to specific.]

I have recently been asked to review an uncommonly large collection of papers on the topic of prompt engineering. Several years ago, it became apparent that neural network language models, trained on enormous amounts of text data, could often provide locally coherent (though rarely globally coherent) responses to prompts or queries. The parade example of this type of model is GPT-2. For instance, if the prompt was:

Malfoy hadn’t noticed anything.

the model might continue:

“In that case,” said Harry, after thinking it over, “I suggest you return to the library.”

I assume this is because there’s fan fiction in the corpus, but I don’t really know. Now it goes without saying that at no point will, Facebook, say, launch a product in which a gigantic neural network is allowed to regurgitate Harry Potter fan fiction (!) at their users. However, researchers persist for some reason (perhaps novelty) to try to “engineer” clever prompts that produce subjectively “good” responses, rather than attempting to understand how any of this works. (It is not an overstatement to say that we have little idea why neural networks, and the methods we use to train them in particular, work at all.) What am I to do when asked to meta-review papers like this? I try to remain collegial, but I’m not sure this kind of work ought to exist at all. I consider GPT-2 a billionaire plaything, a rather wasteful one at that, and it is hard for me to see how this line of work might make the world a better place.

Leave a Reply Cancel reply