Extract a lined table from a scanned document with OpenCV and Python

2024/10/11 4:29:53

I want to extract the information from a scanned table and store it in a CSV file. Right now my table extraction algorithm performs the following steps:

  1. Apply skew correction.
  2. Apply a Gaussian filter for denoising.
  3. Binarize the image using Otsu thresholding.
  4. Apply a morphological opening.
  5. Run Canny edge detection.
  6. Apply a Hough transform to obtain the lines of the table.
  7. Remove duplicate lines (lines within 10 pixels of each other are treated as the same line).
  8. Filter the horizontal and vertical lines by slope (a line is kept only if it is within ±5 degrees of horizontal, or within ±5 degrees of vertical).
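Steps 7 and 8 can be sketched in plain Python as follows. This is only an illustration under some assumptions: the segments are in `(x1, y1, x2, y2)` endpoint form (the form `cv2.HoughLinesP` produces once its output is reshaped to rows of four values), and the function names `classify_lines` and `merge_close` are hypothetical, not taken from the original code.

```python
import math


def classify_lines(segments, max_tilt_deg=5.0):
    """Split (x1, y1, x2, y2) segments into near-horizontal and near-vertical."""
    horizontal, vertical = [], []
    for x1, y1, x2, y2 in segments:
        # Fold the direction into [0, 180) so endpoint order does not matter.
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
        if angle <= max_tilt_deg or angle >= 180.0 - max_tilt_deg:
            horizontal.append((x1, y1, x2, y2))
        elif abs(angle - 90.0) <= max_tilt_deg:
            vertical.append((x1, y1, x2, y2))
        # Anything else (for example a diagonal stroke) is dropped.
    return horizontal, vertical


def merge_close(positions, min_gap=10):
    """Collapse sorted positions that are less than min_gap apart into their mean."""
    merged, group = [], []
    for p in sorted(positions):
        # Start a new group once the gap to the previous position is large enough.
        if group and p - group[-1] >= min_gap:
            merged.append(sum(group) / len(group))
            group = []
        group.append(p)
    if group:
        merged.append(sum(group) / len(group))
    return merged


segments = [
    (0, 100, 500, 102),   # slightly tilted horizontal line
    (500, 104, 0, 103),   # near-duplicate of it, endpoints reversed
    (200, 0, 201, 400),   # near-vertical line
    (0, 0, 100, 100),     # diagonal, should be discarded
]
horizontal, vertical = classify_lines(segments)
# Represent each horizontal line by the mean of its two y coordinates.
rows = merge_close([(y1 + y2) / 2 for _, y1, _, y2 in horizontal])
```

Note that `merge_close` compares each position with the previous one, so a long chain of closely spaced detections collapses into a single line. That is usually what is wanted for one thick ruled line, but it may need tightening for very dense tables.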

This algorithm works fine for born-digital PDFs and most scanned documents. However, some documents contain a noisy table, and in those cases the lines are not identified correctly.

Here is a sample image on which my algorithm fails.

raw image

These are the operations I am doing on this table:

  1. Gaussian blur
  2. Otsu thresholding
  3. Morphological opening
  4. Canny edge detection
  5. Filtered lines. As you can see, the lines are clearly not identified correctly.

Can anyone please suggest a better method for extracting horizontal and vertical lines from low-quality scans like this one?

Thanks in advance!!

Answer

I found a perfect solution in this blog. https://medium.com/coinmonks/a-box-detection-algorithm-for-any-image-containing-boxes-756c15d7ed26

Here, we apply morphological transformations with a vertical kernel to detect vertical lines and with a horizontal kernel to detect horizontal lines, and then combine the two results to get all the required lines.

Vertical lines

Horizontal lines

Required output
