Convert PDF to Excel [closed]

2024/7/4 15:18:48

How to convert the table which is inside the pdf to excel .

I have tried some online tools but it was giving 60% result.

The sample table which contains in my pdf is given below. enter image description here I have hidden the field which contains name filed.

Answer

Getting data out from a pdf file is pretty messy. If the pdf table is ordered and has got a unique pattern embedded along with it, the best way to get the data is by converting the pdf to xml. For this you can use: pdftohtml.

Installation: sudo apt-get install pdftohtml

Usage: pdftohtml -xml *Your File.pdf* *Output File.xml*

You can run this command directly in the terminal.

The xml file which you will get now will have tags just like html which you can use to get the data from the generated xml output.

PS: One thing to be noted if the pdf table is not ordered then it becomes very difficult to get the data out from that xml because the tags will have some attributes which will not match the pattern. In that case you will need to hard code things.

https://en.xdnf.cn/q/120448.html

Related Q&A

Return the furthermost outlier in kmeans clustering? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to repro…

Highest number of consecutively repeating values in a list

Lets say I have this list:List= [1,1,1,0,0,1,1,1,1,1]How do I display the highest number of repeating 1s in a row? I want to return 5.

Python Maze Generation

I am trying to make a python maze generator but I keep getting an IndexError: list index out of range. Any ideas? Im kinda new to this stuff so I was using the code from rosetta code on maze generatio…

Why does Pythons `any` function not return True or False? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to repro…

Write a list of lists into csv in Python [duplicate]

This question already has answers here:Writing a Python list of lists to a csv file(12 answers)Closed 5 years ago.I have a list of lists of pure data like thisa=[[1,2,3],[4,5,6],[7,8,9]]How can I write…

Upper limit of points pyplot

Just being a curious George here. But Im handling 279 million data points (x,y) and am wondering if pyplot can scatter such a number?Thanks.

how to delete the u and before the of database table display by python [closed]

Its difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying thi…

python list permutations [duplicate]

This question already has answers here:Closed 12 years ago.Possible Duplicate:How to generate all permutations of a list in Python I am given a list [1,2,3] and the task is to create all the possible …

Repeating Characters in the Middle of a String

Here is the problem I am trying to solve but having trouble solving:Define a function called repeat_middle which receives as parameter one string (with at least one character), and it should return a n…

Remove a big list of of special characters [duplicate]

This question already has answers here:Remove specific characters from a string in Python(27 answers)Closed 5 years ago.I want to remove each of the following special characters from my documents: symb…