Extracting the value associated with a particular text label from an image

2024/11/12 5:09:17

I have an image from which I want to extract key-value pairs.

As an example, I want to extract the value of "MASTER-AIRWAYBILL NO:"

Image

I have written code that extracts the entire text from the image using Python, OpenCV, and OCR, but I don't know how to extract only the value for "MASTER-AIRWAYBILL NO:" from the resulting text.

Here is the code:

import cv2
import numpy as np
import pytesseract
from PIL import Image

print("Hello")
# Raw strings keep the Windows backslashes literal
src_path = r"C:\Users\Venkatraman.R\Desktop\alpha_bill.jpg"
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
print(src_path)

# Read image with opencv
img = cv2.imread(src_path)
# Convert to gray
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Apply dilation and erosion to remove some noise
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
# Write image after removed noise
cv2.imwrite(src_path + "removed_noise.png", img)

# Apply threshold to get image with only black and white
#img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
# Write the image after apply opencv to do some ...
cv2.imwrite(src_path + "thres.png", img)

# Recognize text with tesseract for python
result = pytesseract.image_to_string(Image.open(src_path + "thres.png"))
# Remove template file
#os.remove(temp)

print('--- Start recognize text from image ---')
print(result)

So output should be like:

MASTER-AIRWAYBILL NO: 157-46637194

Answer

You can use pytesseract's image_to_string() together with a regular expression to extract the desired text, for example:

from PIL import Image
import pytesseract
import re

f = "ocr.jpg"
t = pytesseract.image_to_string(Image.open(f))
# Match the label followed by digits and dashes; OCR may return "—" instead of "-"
m = re.findall(r"MASTER-AIRWAYBILL NO: [\d—-]+", t)
if m:
    print(m[0])

Output:

MASTER-AIRWAYBILL NO: 157—46637194
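
If you only want the value itself, with the OCR dash normalized back to a plain hyphen, one option is a capturing group plus a small cleanup step. This is only a sketch: the helper names `normalize_dashes` and `extract_value` are made up for illustration, the list of dash look-alikes is an assumption about what the OCR might emit, and the sample text stands in for the real `image_to_string()` result.

```python
import re

# Dash look-alikes that OCR may emit in place of "-" (assumed list, extend as needed).
DASH_CHARS = "\u2010\u2011\u2012\u2013\u2014\u2212"


def normalize_dashes(text):
    """Replace dash look-alikes with a plain ASCII hyphen."""
    return re.sub("[" + DASH_CHARS + "]", "-", text)


def extract_value(ocr_text, label):
    """Return the number that follows `label`, or None if it is not found."""
    cleaned = normalize_dashes(ocr_text)
    # re.escape keeps the label literal; the group captures digits and hyphens only.
    pattern = re.escape(label) + r"\s*:?\s*(\d[\d-]*)"
    match = re.search(pattern, cleaned, flags=re.IGNORECASE)
    return match.group(1) if match else None


# Made-up sample standing in for the OCR result; only the airway bill line
# comes from the question.
sample = "SHIPPER NAME: ALPHA\nMASTER-AIRWAYBILL NO: 157\u201446637194\nPIECES: 3"
print(extract_value(sample, "MASTER-AIRWAYBILL NO"))
```

The same helper can be called with other labels as long as the value is a run of digits and hyphens. For free-text values you would need a different capturing group.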