{"id":1031,"date":"2021-04-20T16:49:01","date_gmt":"2021-04-20T16:49:01","guid":{"rendered":"http:\/\/www.wellformedness.com\/blog\/?p=1031"},"modified":"2021-11-01T16:25:16","modified_gmt":"2021-11-01T16:25:16","slug":"surprises-new-nlp-developer","status":"publish","type":"post","link":"https:\/\/www.wellformedness.com\/blog\/surprises-new-nlp-developer\/","title":{"rendered":"Surprises for the new NLP developer"},"content":{"rendered":"<p>There are a couple things that surprise students when they first begin to develop natural language processing applications.<\/p>\r\n<ul>\r\n<li>Some things just take a while. A script that, say, preprocesses millions of sentences isn&#8217;t <em>necessarily<\/em> wrong because it takes a half hour.<\/li>\r\n<li>You really do have to avoid wasting memory. If you&#8217;re processing a big file line-by-line,\r\n<ul>\r\n<li>you really can&#8217;t afford to read it all in at once, and<\/li>\r\n<li>you should write out data as soon as you can.<\/li>\r\n<\/ul>\r\n<\/li>\r\n<li>The OS and program already know how to buffer IO; don&#8217;t fight it.<\/li>\r\n<li>Whereas so much software works with data in human non-readable (e.g., wire formats, binary data) or human-hostile (XML) formats, if you&#8217;re processing text files, you can just open the files up and read them to see if they&#8217;re roughly what you expected.<\/li>\r\n<\/ul>\r\n","protected":false},"excerpt":{"rendered":"<p>There are a couple things that surprise students when they first begin to develop natural language processing applications. Some things just take a while. A script that, say, preprocesses millions of sentences isn&#8217;t necessarily wrong because it takes a half hour. You really do have to avoid wasting memory. If you&#8217;re processing a big file &hellip; <a href=\"https:\/\/www.wellformedness.com\/blog\/surprises-new-nlp-developer\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Surprises for the new NLP developer&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_crdt_document":"","footnotes":""},"categories":[3,5],"tags":[],"class_list":["post-1031","post","type-post","status-publish","format-standard","hentry","category-dev","category-nlp"],"_links":{"self":[{"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/posts\/1031","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/comments?post=1031"}],"version-history":[{"count":3,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/posts\/1031\/revisions"}],"predecessor-version":[{"id":1089,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/posts\/1031\/revisions\/1089"}],"wp:attachment":[{"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/media?parent=1031"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/categories?post=1031"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wellformedness.com\/blog\/wp-json\/wp\/v2\/tags?post=1031"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}