I have thousands of pdf file that I need to extract data from.This is an example pdf. I want to extract this information from the example pdf.
I am open to nodejs, python or any other effective method. I have little knowledge in python and nodejs. I attempted using python with this code
import PyPDF2
try:
pdfFileObj = open('test.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pageNumber = pdfReader.numPages
page = pdfReader.getPage(0)
print(pageNumber)
pagecontent = page.extractText()
print(pagecontent)
except Exception as e:
print(e)
but I got stuck on how to find the procurement history. What is the best way to extract the procurement history from the pdf?