Why does tesseract fail to read text off this simple image?

2024/9/23 1:27:10

I have read mountains of posts on pytesseract, but I cannot get it to read text off a dead simple image; It returns an empty string.

Here is the image:

TestImage

I have tried scaling it, grayscaling it, and adjusting the contrast, thresholding, blurring, everything it says in other posts, but my problem is that I don't know what the OCR wants to work better. Does it want blurry text? High contrast?

Code to try:

import pytesseract
from PIL import Imageprint pytesseract.image_to_string(Image.open(IMAGE FILE))

As you can see in my code, the image is stored locally on my computer, hence Image.open()

Answer

Trying something along the lines of

import pytesseract 
from PIL import Image 
import requests 
import ioresponse = requests.get('https://i.stack.imgur.com/J2ojU.png') 
img = Image.open(io.BytesIO(response.content))
text = pytesseract.image_to_string(img, lang='eng', config='--psm 7')print(text)

with --psm values equal or larger than 6 did yield "Gm" for me.

If the image is stored locally (and in your working directory), just drop the response variable and change the definition of text with the lines

image_name = "J2ojU.png" # or whatever appropriate
text = pytesseract.image_to_string(Image.open(image_name), lang='eng', config='--psm 7')
https://en.xdnf.cn/q/71883.html

Related Q&A

python click subcommand unified error handling

In the case where there are command groups and every sub-command may raise exceptions, how can I handle them all together in one place?Given the example below:import click@click.group() def cli():pass…

Data structure for large ranges of consecutive integers?

Suppose you have a large range of consecutive integers in memory, each of which belongs to exactly one category. Two operations must be O(log n): moving a range from one category to another, and findin…

polars slower than numpy?

I was thinking about using polars in place of numpy in a parsing problem where I turn a structured text file into a character table and operate on different columns. However, it seems that polars is ab…

namespace error lxml xpath python

I am transforming word documents to xml to compare them using the following code:word = win32com.client.Dispatch(Word.Application) wd = word.Documents.Open(inFile) # Converts the word infile to xml out…

lark grammar: How does the escaped string regex work?

The lark parser predefines some common terminals, including a string. It is defined as follows:_STRING_INNER: /.*?/ _STRING_ESC_INNER: _STRING_INNER /(?<!\\)(\\\\)*?/ ESCAPED_STRING : "\&quo…

Pycharm unresolved reference on join of os.path

After upgrade pycharm to 2018.1, and upgrade python to 3.6.5, pycharm reports "unresolved reference join". The last version of pycharm doesnt show any warning for the line below:from os.path …

Apply Border To Range Of Cells Using Openpyxl

I am using python 2.7.10 and openpyxl 2.3.2 and I am a Python newbie.I am attempting to apply a border to a specified range of cells in an Excel worksheet (e.g. C3:H10). My attempt below is failing wit…

Make a functional field editable in Openerp?

How to make functional field editable in Openerp?When we createcapname: fields.function(_convert_capital, string=Display Name, type=char, store=True ),This will be displayed has read-only and we cant …

how to read a fasta file in python?

Im trying to read a FASTA file and then find specific motif(string) and print out the sequence and number of times it occurs. A FASTA file is just series of sequences(strings) that starts with a header…

Passing a pandas dataframe column to an NLTK tokenizer

I have a pandas dataframe raw_df with 2 columns, ID and sentences. I need to convert each sentence to a string. The code below produces no errors and says datatype of rule is "object." raw_d…