How can I convert a table inside a PDF to Excel?
I have tried some online tools, but they only got about 60% of the table right.
A sample of the table in my PDF is given below. I have hidden the field that contains names.
Getting data out of a PDF file is pretty messy. If the table in the PDF is well ordered and follows a consistent layout, the best way to get the data is to convert the PDF to XML. For this you can use pdftohtml.
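Installation: `sudo apt-get install poppler-utils` (pdftohtml ships with poppler-utils; some older releases packaged it separately as `pdftohtml`).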
Usage: `pdftohtml -xml "Your File.pdf" "Output File.xml"` (quote the file names if they contain spaces).
You can run this command directly in a terminal.
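The resulting XML file has HTML-like tags: each piece of text on a page becomes a `<text>` element whose `top`, `left`, `width` and `height` attributes give its position. You can parse these tags to pull the data out of the generated XML.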
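As a rough sketch of that parsing step, the Python below (standard library only) groups `<text>` fragments whose `top` values lie within a few points of each other into one row, and orders each row left to right by `left`. The 3-point `tolerance` and the function names are my own assumptions, not part of pdftohtml; tune the tolerance against your own file. The rows are written as CSV, which Excel opens directly.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_rows(root, tolerance=3):
    """Group pdftohtml <text> fragments into table rows.

    Assumption: fragments whose `top` differs by at most `tolerance`
    belong to the same row. Within a row, cells are sorted by `left`.
    """
    rows = []
    for page in root.iter("page"):
        fragments = []
        for el in page.iter("text"):
            # itertext() also collects text nested in <b>/<i> tags
            content = "".join(el.itertext()).strip()
            if content:
                fragments.append((float(el.get("top")), float(el.get("left")), content))
        fragments.sort()  # top to bottom, then left to right

        current, current_top = [], None
        for top, left, content in fragments:
            # A big enough vertical jump starts a new row
            if current and top - current_top > tolerance:
                rows.append([text for _, text in sorted(current)])
                current = []
            if not current:
                current_top = top  # first fragment fixes the row's position
            current.append((left, content))
        if current:
            rows.append([text for _, text in sorted(current)])
    return rows


def rows_to_csv_text(rows):
    """Serialize rows as CSV text (quoting cells that contain commas)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

For example, `rows = xml_to_rows(ET.parse("Output File.xml").getroot())`, then write `rows_to_csv_text(rows)` to a `.csv` file opened with `newline=""`; `encoding="utf-8-sig"` helps Excel detect UTF-8. If you need a real `.xlsx` file, a library such as openpyxl can write the same rows. This simple grouping leaves no placeholder for an empty cell, so a row with a blank cell shifts left.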
PS: Note that if the PDF table is not well ordered, it becomes very difficult to get the data out of that XML, because the tag attributes will not follow a consistent pattern. In that case you will need to hard-code things.
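One possible way to do that hard-coding is to fix the column boundaries yourself: read each column's `left` position off the XML and assign every fragment to a column by position, so blank cells stay blank instead of shifting the row. This is a sketch under assumptions; the values in `COLUMN_STARTS` are hypothetical placeholders that you must replace with positions from your own file.

```python
import bisect
import xml.etree.ElementTree as ET

# HYPOTHETICAL column boundaries in pdftohtml's `left` units: a fragment
# with left < 150 goes to column 0, 150-299 to column 1, >= 300 to column 2.
COLUMN_STARTS = [150, 300]


def column_index(left, starts=COLUMN_STARTS):
    """Map an x position to a column number using fixed boundaries."""
    return bisect.bisect_right(starts, left)


def xml_to_fixed_rows(root, starts=COLUMN_STARTS, tolerance=3):
    """Build rows with a fixed number of columns from pdftohtml XML.

    Fragments within `tolerance` of a row's first `top` share that row;
    each fragment is placed in the column its `left` position falls into.
    """
    rows = []
    n_cols = len(starts) + 1
    for page in root.iter("page"):
        fragments = sorted(
            (float(el.get("top")), float(el.get("left")), "".join(el.itertext()).strip())
            for el in page.iter("text")
        )
        row, row_top = None, None
        for top, left, content in fragments:
            if not content:
                continue
            if row is None or top - row_top > tolerance:
                row = [""] * n_cols  # empty cells stay as ""
                rows.append(row)
                row_top = top
            col = column_index(left, starts)
            # Join fragments that land in the same cell (e.g. split words)
            row[col] = (row[col] + " " + content).strip()
    return rows
```

The fixed boundaries trade generality for robustness: they only fit one table layout, but a missing value in the middle of a row no longer pushes the remaining cells into the wrong columns.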