Action, not ritual

It is achingly apparent that an overwhelming amount of research in speech and language technologies considers exactly one human language: English. This is done so unthinkingly that some researchers seem to see the use of English data (and only English) as obvious, so obvious as to require no comment. This is unfortunate in part because English is, typologically speaking, a bit of an outlier. For instance, it has uncommonly impoverished inflectional morphology, a particularly rigid word order, and a rather large vowel inventory. It is not hard to imagine how lessons learned designing for, or evaluating on, English data might not generalize to the rest of the world’s languages. In an influential paper, Bender (2009) encourages researchers to be more explicit about the languages studied, and this, framed as an imperative, has come to be called the Bender Rule.

This “rule”, and the aforementioned observations underlying it, have taken on an almost mythical interpretation. They can easily be seen as a ritual granting the authors a dispensation to continue their monolingual English research. But this is a mistake. English hegemony is not merely bad science, nor is it a mere scientific inconvenience—a threat to validity.

It is no accident of history that the scientific world is in some sense an English colony. Perhaps you live in a country that owes an enormous debt to a foreign bank, and the bankers are demanding cuts to social services or reduction of tariffs: then there’s an excellent chance the bankers’ first language is English and that your first language is something else. Or maybe, fleeing the chaos of austerity and intervention, you find yourself and your children in cages in a foreign land: chances are you are in Yankee hands. And it is no accident that the first large-scale treebank is a corpus of English rather than of Delaware or Nahuatl or Powhatan or even Spanish, nor that the entire boondoggle was paid for by the largest military apparatus the world has ever known.

Such material facts respond to just one thing: concrete actions. Rituals, indulgences, or dispensations will not do. We must not confuse the act of perceiving and naming the hegemon with the far more challenging act of actually combating it. It is tempting to see the material conditions dualistically, as a sin we can never fully cleanse ourselves of. But they are the past, and a more equitable world is only to be found in the future, a future of our own creation. It is imperative that we, as a community of scientists, take steps to build the future we want.

References

Bender, Emily M. 2009. Linguistically naïve != language independent: why NLP needs linguistic typology. In EACL Workshop on the Interaction Between Linguistics and Computational Linguistics, pages 26-32.
