{"id":97,"date":"2023-01-10T15:41:47","date_gmt":"2023-01-10T07:41:47","guid":{"rendered":"https:\/\/mianka.xyz\/?p=96"},"modified":"2023-01-10T16:28:38","modified_gmt":"2023-01-10T08:28:38","slug":"python-tesseract","status":"publish","type":"post","link":"https:\/\/www.mianka.xyz\/?p=97","title":{"rendered":"Python Tesseract CAPTCHA Recognition Training Workflow"},"content":{"rendered":"<p>1. Use jTessBoxEditor to merge the training sample images into a single TIFF file (the sample images must be in a valid image format)<\/p>\n<p>Click Tools in the top menu bar and choose Merge TIFF... Go to the directory that holds your training sample images, press Ctrl+A to select all of them, and click Open. Then save the merged file to a directory of your choice; in this walkthrough it is saved as new.tif<\/p>\n<p>2. Generate the box file<\/p>\n<p>Open cmd, change to the directory that contains new.tif, and run:<\/p>\n<pre class=\"prism-highlight prism-language-bash\">tesseract&nbsp;new.tif&nbsp;new&nbsp;batch.nochop&nbsp;makebox<\/pre>\n<p>This generates the new.box file<\/p>\n<p>3. Correct the boxes of the sample images with jTessBoxEditor<\/p>\n<p>In jTessBoxEditor, open the Box 
Editor tab, click Open below it, and load the new.tif file created in step 1.<\/p>\n<p>The right-hand panel shows the corresponding box file data. If a character in the Char column does not match the current sample image, edit the value in Char and click Save. Then move to the next sample image and do the same, clicking Save after each correction. Once every sample image has been corrected, this step is complete.<\/p>\n<p>4. Create the font_properties file (this file has no extension)<\/p>\n<p>On the command line, run: echo font 0 0 0 0 0 &gt;font_properties<\/p>\n<p>This creates the font_properties file.<\/p>\n<p>The file contains the font name, font, followed by five flags (each 0 or 1) for the font attributes italic, bold, fixed, serif, and fraktur; here they are all 0.<\/p>\n<p>5. Generate the .tr training file<\/p>\n<p>On the command line, run:&nbsp;<\/p>\n<pre class=\"prism-highlight prism-language-bash\">tesseract&nbsp;new.tif&nbsp;new&nbsp;-l&nbsp;eng&nbsp;nobatch&nbsp;box.train<\/pre>\n<p>6. Generate the character set file<\/p>\n<p>On the command line, run:&nbsp;<\/p>\n<pre class=\"prism-highlight prism-language-bash\">unicharset_extractor&nbsp;new.box<\/pre>\n<p>This generates the unicharset file.<\/p>\n<p>7. Generate the shape file<\/p>\n<p>On the command line, run:&nbsp;<\/p>\n<pre class=\"prism-highlight 
prism-language-bash\">shapeclustering&nbsp;-F&nbsp;font_properties&nbsp;-U&nbsp;unicharset&nbsp;-O&nbsp;langyp.unicharset&nbsp;new.tr<\/pre>\n<p>This generates the shapetable file and the langyp.unicharset file.<\/p>\n<p><\/p>\n<p>8. Generate the clustered character feature files<\/p>\n<p>On the command line, run:&nbsp;<\/p>\n<pre class=\"prism-highlight prism-language-bash\">mftraining&nbsp;-F&nbsp;font_properties&nbsp;-U&nbsp;unicharset&nbsp;-O&nbsp;langyp.unicharset&nbsp;new.tr<\/pre>\n<p>This generates the pffmtable, inttemp, and unicharset files.<\/p>\n<p><\/p>\n<p>9. Generate the character normalization feature file<\/p>\n<p>On the command line, run:&nbsp;<\/p>\n<pre class=\"prism-highlight prism-language-bash\">cntraining&nbsp;new.tr<\/pre>\n<p>This generates the normproto file.<\/p>\n<p><\/p>\n<p>10. Use the rename command to add the fontyp. prefix to the files generated in the previous steps<\/p>\n<p>On the command line, run:&nbsp;&nbsp;<\/p>\n<pre class=\"prism-highlight prism-language-bash\">rename&nbsp;normproto&nbsp;fontyp.normproto&nbsp;&nbsp;\nrename&nbsp;inttemp&nbsp;fontyp.inttemp&nbsp;&nbsp;\nrename&nbsp;pffmtable&nbsp;fontyp.pffmtable&nbsp;&nbsp;\nrename&nbsp;unicharset&nbsp;fontyp.unicharset&nbsp;&nbsp;\nrename&nbsp;shapetable&nbsp;fontyp.shapetable<\/pre>\n<p>11. Combine the training files<\/p>\n<p>On the command line, run:&nbsp;<\/p>\n<pre class=\"prism-highlight prism-language-bash\">combine_tessdata&nbsp;fontyp.<\/pre>\n<p>12. Copy the fontyp.traineddata file into the tessdata language data folder inside the Tesseract-OCR folder<\/p>\n<p>II. Python CAPTCHA recognition code<\/p>\n<pre class=\"prism-highlight 
prism-language-python\">from&nbsp;PIL&nbsp;import&nbsp;Image\nfrom&nbsp;pytesseract&nbsp;import&nbsp;pytesseract\n\n#&nbsp;Load&nbsp;the&nbsp;CAPTCHA&nbsp;image&nbsp;to&nbsp;recognize\nimage&nbsp;=&nbsp;Image.open(&#39;1.png&#39;)\n#&nbsp;lang&nbsp;must&nbsp;match&nbsp;the&nbsp;fontyp.traineddata&nbsp;file&nbsp;copied&nbsp;into&nbsp;tessdata&nbsp;in&nbsp;step&nbsp;12\ncode&nbsp;=&nbsp;pytesseract.image_to_string(image,&nbsp;lang=&#39;fontyp&#39;)\nprint(code)<\/pre>\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. Use jTessBoxEditor to merge the training sample images into a single TIFF file (the sample images must be&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-97","post","type-post","status-publish","format-standard","hentry","category-pythonbiji"],"_links":{"self":[{"href":"https:\/\/www.mianka.xyz\/index.php?rest_route=\/wp\/v2\/posts\/97","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mianka.xyz\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mianka.xyz\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mianka.xyz\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mianka.xyz\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=97"}],"version-history":[{"count":0,"href":"https:\/\/www.mianka.xyz\/index.php?rest_route=\/wp\/v2\/posts\/97\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.mianka.xyz\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=97"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mianka.xyz\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=97"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mianka.xyz\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=97"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}