I am working on a Django application where I need to solve a captcha. I already save the captcha to a temporary file, but when I try to read it with pytesseract it returns nothing, just an empty string.
- I have already installed tesseract and tesseract-OCR.
- I have already tried many times, assuming that it sometimes just doesn't work.
1 Answer
tesseract sometimes has problems when the text is too small or too big, and it may have other problems too. See the documentation: Improving the quality of the output | tessdoc
If I resize your image to 200%, then tesseract can read the text. I used the external program ImageMagick for this, but you may use the Python module pillow (or Wand, which also uses ImageMagick).
$ convert captcha.png -scale 200% captcha-200p.png
The command file can show some information about the files:
$ file ca*
captcha-200p.png: PNG image data, 300 x 60, 8-bit grayscale, non-interlaced
captcha.png: PNG image data, 150 x 30, 8-bit/color RGBA, non-interlaced
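The resize step above could also be done with pillow instead of ImageMagick. The following is only a sketch under assumptions: the helper names upscale_for_ocr and convert_file are hypothetical, the LANCZOS filter is my choice and is not the same algorithm as ImageMagick's -scale, and flattening transparency onto white is an extra precaution because the original file is RGBA.

```python
from PIL import Image


def upscale_for_ocr(image, factor=2):
    """Return a grayscale copy of `image` enlarged by `factor` (hypothetical helper)."""
    rgba = image.convert("RGBA")
    # Flatten any transparency onto white so transparent pixels do not turn black.
    background = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
    flat = Image.alpha_composite(background, rgba).convert("L")
    new_size = (flat.width * factor, flat.height * factor)
    # LANCZOS is an assumption; other filters may suit a given captcha better.
    return flat.resize(new_size, Image.LANCZOS)


def convert_file(src="captcha.png", dst="captcha-200p.png", factor=2):
    """Read `src`, enlarge it, and write the result to `dst`."""
    with Image.open(src) as original:
        upscale_for_ocr(original, factor).save(dst)
```

Calling convert_file() with the default names would produce a 300 x 60 grayscale PNG from the 150 x 30 original, matching the sizes reported by file, though the pixel values will differ slightly from the ImageMagick result.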
It is strange that you don't get any error message, because when I run tesseract with only the input image it prints a usage message:
$ tesseract captcha-200p.png
Usage:
tesseract --help | --help-extra | --version
tesseract --list-langs
tesseract imagename outputbase [options...] [configfile...]
OCR options:
-l LANG[+LANG] Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.
Single options:
--help Show this help message.
--help-extra Show extra help for advanced users.
--version Show version information.
--list-langs List available languages for tesseract engine.
It needs an output name without extension (it adds .txt itself) to write the result to a file:
$ tesseract captcha-200p.png output
Estimating resolution as 308
$ cat output.txt
81+20=?
Or it needs - as the output name to send the output to stdout, so it is shown on screen or can be redirected to another program:
$ tesseract captcha-200p.png -
Estimating resolution as 308
81+20=?
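From Python, the same stdout mode could be driven with subprocess. This is a minimal sketch and not part of what I tested above: the helper names tesseract_command and run_tesseract are hypothetical, and it assumes the tesseract binary is on PATH.

```python
import subprocess


def tesseract_command(image_path, outputbase="-"):
    """Build the argument list; "-" as outputbase means write to stdout."""
    return ["tesseract", str(image_path), outputbase]


def run_tesseract(image_path):
    """Run tesseract on `image_path` and return the recognized text."""
    result = subprocess.run(
        tesseract_command(image_path),
        capture_output=True,  # keep stdout and stderr separate
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    # stdout carries the recognized text; notices such as
    # "Estimating resolution as ..." are normally written to stderr.
    return result.stdout.strip()
```

With check=True a failing tesseract call raises an exception instead of silently giving an empty result, which may make a problem like the one in the question easier to notice.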
Tested on: Linux Mint 22 (based on Ubuntu 24.04), tesseract 5.3.4 (leptonica-1.82.0)
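Since the question uses pytesseract, the enlarge-then-read idea might look roughly like the sketch below. I did not run this on the setup listed above. The helper names build_config and read_captcha are hypothetical, and the options --psm 7 (treat the image as a single text line) and tessedit_char_whitelist are assumptions of mine that may or may not help with a particular captcha.

```python
from PIL import Image


def build_config(psm=7, whitelist=None):
    """Build a tesseract config string (hypothetical helper)."""
    parts = [f"--psm {psm}"]
    if whitelist:
        # Restrict recognition to the given characters; optional.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def read_captcha(path, factor=2, **config_options):
    """Enlarge the image at `path` and return the text pytesseract finds."""
    # Imported here so the module still loads where pytesseract is absent.
    import pytesseract

    with Image.open(path) as image:
        rgba = image.convert("RGBA")
    # Flatten transparency onto white, then convert to grayscale.
    white = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
    gray = Image.alpha_composite(white, rgba).convert("L")
    enlarged = gray.resize(
        (gray.width * factor, gray.height * factor), Image.LANCZOS
    )
    config = build_config(**config_options)
    return pytesseract.image_to_string(enlarged, config=config).strip()
```

A call such as read_captcha("captcha.png", whitelist="0123456789+-=?") would pass the image straight from memory, so no temporary enlarged file is needed.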