{"id":2199,"date":"2024-10-29T11:13:24","date_gmt":"2024-10-29T15:13:24","guid":{"rendered":"https:\/\/www.wellformedness.com\/blog\/?p=2199"},"modified":"2024-10-29T11:13:36","modified_gmt":"2024-10-29T15:13:36","slug":"announcing-udtube","status":"publish","type":"post","link":"https:\/\/www.wellformedness.com\/blog\/announcing-udtube\/","title":{"rendered":"Announcing UDTube"},"content":{"rendered":"<p>In collaboration with CUNY master&#8217;s program graduate Daniel Yakubov, we have recently open-sourced <a href=\"https:\/\/github.com\/CUNY-CL\/udtube\">UDTube<\/a>, our neural morphological analyzer. UDTube performs what is sometimes called\u00a0<em>morphological <\/em><em>analysis in context<\/em>: it provides morphological analyses\u2014coarse POS tagging, more-detailed morphosyntactic tagging, and lemmatization\u2014to whole sentences using nearby words as context.<\/p>\n<p>The UDTube model, developed in Yakubov 2024, is quite simple: it uses a pre-trained Hugging Face encoder to compute subword embeddings. We take the last few layers of these embeddings and mean-pool them across layers, then mean-pool the subword embeddings of any word split across multiple subwords. The resulting encoding of the input is then fed to separate classifier heads for the different tasks (POS tagging, etc.). During training we fine-tune the pre-trained encoder in addition to fitting the classifier heads, and we make it possible to set separate optimizers, learning rates, and schedulers for the encoder and classifier modules.<\/p>\n<p>UDTube is built atop <a href=\"https:\/\/pytorch.org\/\">PyTorch<\/a> and <a href=\"https:\/\/lightning.ai\/\">Lightning<\/a>, and its command-line interface is made much simpler by the use of <code>LightningCLI<\/code>, a module which handles most of the interface work. One can configure the entire thing using YAML configuration files. CUDA GPUs and MPS-era Macs (M1 etc.) 
can be used to accelerate training and inference (and should work out of the box). We also provide scripts to perform hyperparameter tuning using <a href=\"https:\/\/wandb.ai\/site\">Weights &amp; Biases<\/a>. We believe that this model, with appropriate tuning, is probably state-of-the-art for morphological analysis in context.<\/p>\n<p>UDTube is available under an Apache 2.0 license on <a href=\"https:\/\/github.com\/CUNY-CL\/udtube\">GitHub<\/a> and on <a href=\"https:\/\/pypi.org\/project\/udtube\/\">PyPI<\/a>.<\/p>\n<h1>References<\/h1>\n<p>Yakubov, D. 2024. How do we learn what we cannot say? Master&#8217;s thesis, CUNY Graduate Center.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In collaboration with CUNY master&#8217;s program graduate Daniel Yakubov, we have recently open-sourced UDTube, our neural morphological analyzer. UDTube performs what is sometimes called\u00a0morphological analysis in context: it provides morphological analyses\u2014coarse POS tagging, more-detailed morphosyntactic tagging, and lemmatization\u2014to whole sentences using nearby words as context. 
The UDTube model, developed in Yakubov 2024, is quite simple: &hellip; <a href=\"https:\/\/www.wellformedness.com\/blog\/announcing-udtube\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Announcing UDTube&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_crdt_document":"","footnotes":""},"categories":[3,8],"tags":[],"class_list":["post-2199","post","type-post","status-publish","format-standard","hentry","category-dev","category-python"],"_links":{"self":[{"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/posts\/2199","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/comments?post=2199"}],"version-history":[{"count":1,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/posts\/2199\/revisions"}],"predecessor-version":[{"id":2200,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/posts\/2199\/revisions\/2200"}],"wp:attachment":[{"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/media?parent=2199"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/categories?post=2199"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/tags?post=2199"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}