{"id":920,"date":"2020-04-06T15:54:48","date_gmt":"2020-04-06T15:54:48","guid":{"rendered":"http:\/\/www.wellformedness.com\/blog\/?p=920"},"modified":"2023-01-24T01:58:16","modified_gmt":"2023-01-24T01:58:16","slug":"optimizing-three-way-composition-for-decipherment-problems","status":"publish","type":"post","link":"https:\/\/www.wellformedness.com\/blog\/optimizing-three-way-composition-for-decipherment-problems\/","title":{"rendered":"Optimizing three-way composition for decipherment problems"},"content":{"rendered":"<p>Knight et al. (2006) introduce a class of problems they call\u00a0<em>decipherment.\u00a0<\/em>In this scenario, we observe a &#8220;ciphertext&#8221;\u00a0<em>C<\/em> , which we wish to decode. We imagine that there exists a corpus of &#8220;plaintext&#8221; <em>P,<\/em> and which to recover the encipherment model <em>G<\/em> that transduces from <em>P<\/em> to <em>C<\/em>. All three components can be represented as (weighted) finite-state transducers:\u00a0<em>P<\/em> is a language model over plaintexts,\u00a0<em>C<\/em> is a list of strings, and\u00a0<em>G<\/em> is an initially-uniform transducer from\u00a0<em>P<\/em> to <em>C.\u00a0<\/em>We can then estimate the parameters (i.e.. arc weights) of <em>G<\/em> by holding\u00a0<em>P<\/em> and\u00a0<em>C<\/em> constant and applying the expectation maximization algorithm (Dempster et al. 1977).<\/p>\n<p>Both training and prediction require us to repeatedly compute the &#8220;cascade&#8221;, the three-way composition <em>P <\/em><em>\u25cb G \u25cb C<\/em>. First off, two-way composition is\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Associative_property\"><em>associative<\/em><\/a>, so for all\u00a0<em>a<\/em>,\u00a0<em>b,<\/em>\u00a0<em>c<\/em> : (<em>a<\/em> \u25cb <em>b<\/em>) \u25cb <em>c<\/em> = <em>a<\/em> \u25cb (<em>b<\/em> \u25cb c). However, given any <em>n<\/em>-way composition, some associations may be radically more efficient than others. 
Even were the time complexity of each possible composition known, it is still <a href=\"https:\/\/en.wikipedia.org\/wiki\/Matrix_chain_multiplication\">not trivial to compute the optimal association<\/a>. Fortunately, in this case we are dealing with three-way composition, for which there are only two possible associations; we simply need to compare the two.<sup>1<\/sup><\/p>\n<p>Composition performance depends on the sorting properties of the relevant machines. In the simplest case, the inner loop of (two-way) composition consists of a complex game of &#8220;go fish&#8221; between a state in the left-hand side automaton and a state in the right-hand side automaton. One state enumerates over its input (respectively, output) labels and queries the other state&#8217;s output (respectively, input) labels. In the case that the state in the automaton being queried has its arcs sorted according to the label values, a sublinear <a href=\"https:\/\/en.wikipedia.org\/wiki\/Binary_search_algorithm\">binary search<\/a> is used; otherwise, linear-time search is required. Optimal performance obtains when the left-hand side of composition is sorted by output labels and the right-hand side is sorted by input labels.<sup>2<\/sup> Naturally, we also want to perform arc-sorting offline if possible.<\/p>\n<p>Finally, <a href=\"http:\/\/openfst.org\">OpenFst<\/a>, the finite-state library we use, implements composition as an\u00a0<em>on-the-fly<\/em> operation: states in the composed FST are lazily computed and stored in an <a href=\"https:\/\/en.wikipedia.org\/wiki\/Cache_replacement_policies#Least_Recently_Used_(LRU)\">LRU cache<\/a>.<sup>3<\/sup> Assiduous use of the cache can make it feasible to compute very large compositions when it is not necessary to visit all states of the composed machine. 
Today I focus on associativity and assume optimal label sorting; caching will have to wait for another day.<\/p>\n<p>Our cascade consists of three weighted finite-state machines:<\/p>\n<ul>\n<li><em>P<\/em> is a language model expressed as a weighted label-sorted finite-state acceptor. The model is of order 6, with Witten-Bell smoothing (Bell et al. 1990), backoffs encoded using \u03c6 (i.e., &#8220;failure&#8221;) transitions, and has been shrunk to 1 million n-grams using relative entropy pruning (Stolcke 1998).<\/li>\n<li><em>G<\/em> is a uniform channel model encoded as a finite-state transducer. Because it is a non-deterministic transducer, it can be input-label-sorted or output-label-sorted, but not both.<\/li>\n<li><em>C<\/em> is an unweighted label-sorted string finite-state acceptor encoding a long ciphertext.<\/li>\n<\/ul>\n<p>There are two possible associations, which we illustrate using the OpenFst\u00a0<a href=\"http:\/\/python.openfst.org\">Python bindings<\/a>.<sup>4<\/sup> In the first, we use a left-associative composition. Offline, before composition, we input label-sort <em>G<\/em>:<\/p>\n<pre>In [5]: G.arcsort(\"ilabel\")<\/pre>\n<p>Then, we perform both compositions, sorting the intermediate object by output label:<\/p>\n<pre>In [6]: %timeit -n 10 \r\n...          partial = compose(P, G, connect=False).arcsort(\"olabel\"); \r\n...          cascade = compose(partial, C, connect=False)\r\n10 loops, best of 3: 41.6 s per loop<\/pre>\n<p>In our second design, we use the parallel right-associative construction. Offline, we output label-sort <em>G<\/em>:<\/p>\n<pre>In [7]: G.arcsort(\"olabel\")<\/pre>\n<p>Then, we perform both compositions, sorting the intermediate object by input label:<\/p>\n<pre>In [8]: %timeit -n 10 \r\n...          partial = compose(G, C, connect=False).arcsort(\"ilabel\"); \r\n...          
cascade = compose(P, partial, connect=False)\r\n3 loops, best of 3: 38.5 s per loop<\/pre>\n<p>So we see a small advantage for the right-associative composition, which we take advantage of in OpenGrm-BaumWelch, freely\u00a0available <a href=\"http:\/\/baumwelch.opengrm.org\">from the OpenGrm website<\/a>.<\/p>\n<h1>Endnotes<\/h1>\n<ol>\n<li>There exist FST algorithms for n-ary composition (Allauzen &amp; Mohri 2009), but in practice one can achieve similar effects using <a href=\"http:\/\/www.openfst.org\/twiki\/bin\/view\/FST\/FstAdvancedUsage#Composition%20Filters\">composition filters<\/a> (Allauzen et al. 2010) instead.<\/li>\n<li>Note that acceptors which are input label-sorted are implicitly output label-sorted and vice versa, and string FSTs are input and output label-sorted by definition.<\/li>\n<li>In the case where one needs the entire composition at once, we can simply disable caching; in OpenFst, the result is also <a href=\"http:\/\/www.openfst.org\/twiki\/bin\/view\/FST\/ConnectDoc\">connected<\/a> (i.e., trimmed) by default, but we disable that since we need to track the original state IDs.<\/li>\n<li>The <a href=\"https:\/\/docs.python.org\/2\/library\/timeit.html\"><code>timeit<\/code><\/a> module is used to estimate execution times irrespective of caching.<\/li>\n<\/ol>\n<h1>References<\/h1>\n<p>Allauzen, C. and Mohri, M. 2009. N-way composition of weighted finite-state transducers. <i>International Journal of Foundations of Computer Science<\/i> 20(4): 613\u2013627.<br \/>\nAllauzen, C., Riley, M. and Schalkwyk, J. 2010. Filters for efficient composition of weighted finite-state transducers. In <i>Implementation and Application of Automata: 15th International Conference, CIAA 2010<\/i>, pages 28\u201338. Manitoba.<br \/>\nBell, T. C., Cleary, J. G., and Witten, I. H. 1990.\u00a0<em>Text Compression<\/em>. Englewood Cliffs, NJ: Prentice Hall.<br \/>\nDempster, A. P., Laird, N. M., and Rubin, D. B. 1977. 
Maximum likelihood from incomplete data via the EM algorithm. <em>Journal of the Royal Statistical Society, Series B<\/em> 39(1): 1\u201338.<br \/>\nKnight, K., Nair, A., Rathod, N., and Yamada, K. 2006. Unsupervised analysis for decipherment problems. In <i>Proceedings of the COLING\/ACL 2006 Main Conference Poster Sessions<\/i>, pages 499\u2013506. Sydney.<br \/>\nStolcke, A. 1998. Entropy-based pruning of backoff language models. In <em>Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop<\/em>, pages 270\u2013274. Lansdowne, Virginia.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Knight et al. (2006) introduce a class of problems they call\u00a0decipherment.\u00a0In this scenario, we observe a &#8220;ciphertext&#8221;\u00a0C, which we wish to decode. We imagine that there exists a corpus of &#8220;plaintext&#8221; P, and wish to recover the encipherment model G that transduces from P to C. All three components can be represented as (weighted) &hellip; <a href=\"https:\/\/www.wellformedness.com\/blog\/optimizing-three-way-composition-for-decipherment-problems\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Optimizing three-way composition for decipherment 
problems&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_crdt_document":"","footnotes":""},"categories":[3,4,5,8],"tags":[],"class_list":["post-920","post","type-post","status-publish","format-standard","hentry","category-dev","category-language","category-nlp","category-python"],"_links":{"self":[{"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/posts\/920","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/comments?post=920"}],"version-history":[{"count":10,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/posts\/920\/revisions"}],"predecessor-version":[{"id":1657,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/posts\/920\/revisions\/1657"}],"wp:attachment":[{"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/media?parent=920"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/categories?post=920"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/tags?post=920"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}