Pynini 2020: State of the Sandwich

I have been meaning to describe some of the work I have been doing on Pynini, our weighted finite-state grammar development platform. For one, while I have been the primary contributor through the history of the project (Richard Sproat wrote the excellent path iteration library), we are now also getting many contributions from Lawrence Wolf-Sonkin (rewrite of the symbol table wrapper, type hints) and lots of usability and bug reports from the Google linguists.

We are currently on Pynini release 2.1.1. Here are some new features/improvements from the last few releases:

  • 2.0.9: Adds an efficient multi-argument union.
  • 2.0.9: Pynini (and the rest of OpenGrm) are available on Conda via Conda-Forge. This means that for most users, there is no longer any need to compile Pynini by hand; instead Pynini is compiled (for a variety of platforms) in the cloud, using a continuous integration framework.
  • 2.1.0: Rewrites the string compiler so that symbol tables are no longer attached to compiled FSTs, eliminating the need for expensive symbol table merging and relabeling options.
  • 2.1.0: Rewrites the FST and symbol table class hierarchies to better reflect the organization of lower-level APIs.
  • 2.1.1: Adds PEP 484/PEP 561-compatible type stubs.

We also have removed or renamed quite a few features:

  • stringify is renamed string.
  • text is renamed print (cf. the command-line tool fstprint).
  • The defaults struct is removed, though it may be reintroduced as a context manager at some point.
  • The * infix operator, previously used for composition is removed; use @ instead.
  • transducer‘s arguments input_token_type and output_token_type are merged as token_type.

Finally, we have broken Python 2.7 compatibility as of 2.1.0; pywrapfst, the lower-level API, still has some degree of Python 2.7 compatibility, but this is probably the last release to maintain that property.

Leave a Reply

Your email address will not be published. Required fields are marked *