While processing a PDF file (2.pdf) with pdfminer's pdf2txt.py, I got the following error:
    $ pdf2txt.py 2.pdf
    Traceback (most recent call last):
      File "/usr/local/bin/pdf2txt.py", line 115, in <module>
        if __name__ == '__main__': sys.exit(main(sys.argv))
      File "/usr/local/bin/pdf2txt.py", line 109, in main
        interpreter.process_page(page)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 832, in process_page
        self.render_contents(page.resources, page.contents, ctm=ctm)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 843, in render_contents
        self.init_resources(resources)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 347, in init_resources
        self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 195, in get_font
        font = self.get_font(None, subspec)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 186, in get_font
        font = PDFCIDFont(self, spec)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 654, in __init__
        StringIO(self.fontfile.get_data()))
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 375, in __init__
        (name, tsum, offset, length) = struct.unpack('>4sLLL', fp.read(16))
    struct.error: unpack requires a string argument of length 16
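The last frame shows the immediate cause: the format `'>4sLLL'` describes exactly 16 bytes (a 4-byte tag followed by three big-endian unsigned 32-bit integers), but `fp.read(16)` does not raise at the end of the data; it simply returns fewer bytes, and `struct.unpack` then refuses the short buffer. A minimal standalone reproduction, using only the standard library and no pdfminer code (the helper `read_entry` is hypothetical and only mimics the failing line):

```python
import io
import struct

# One table-directory entry: 4-byte tag + checksum, offset, length (uint32, big-endian).
ENTRY_FORMAT = '>4sLLL'
assert struct.calcsize(ENTRY_FORMAT) == 16

def read_entry(fp):
    """Read one 16-byte entry the same way the failing pdffont.py line does."""
    return struct.unpack(ENTRY_FORMAT, fp.read(16))

# A complete entry unpacks fine.
full = io.BytesIO(b'glyf' + struct.pack('>LLL', 0, 28, 100))
print(read_entry(full))  # (b'glyf', 0, 28, 100) on Python 3

# A stream that ends early: read(16) returns only 10 bytes instead of raising,
# so struct.unpack raises struct.error.
truncated = io.BytesIO(b'glyf' + b'\x00' * 6)
try:
    read_entry(truncated)
except struct.error as exc:
    print('struct.error:', exc)
```

On Python 3 the message is worded differently ("unpack requires a buffer of 16 bytes"), but it is the same failure as in the traceback.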
A similar file, 1.pdf, is processed without any problem.
I can't find any information about this error. I opened an issue in the pdfminer GitHub repository, but it has not been answered. Can someone explain why this happens? What can I do to parse 2.pdf?
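One plausible explanation (an assumption; it depends on the contents of 2.pdf, which I have not inspected) is that an embedded TrueType font program handed to the parser via `PDFCIDFont` is truncated or malformed, so its header announces more table-directory entries than the data actually contains. The sketch below walks an sfnt header the same way: a 12-byte offset table (version, `numTables`, and three search fields), then `numTables` 16-byte entries. Unlike the failing code, it stops at a short read instead of raising. The function `read_table_directory` is hypothetical and not part of pdfminer.

```python
import io
import struct

def read_table_directory(data):
    """Parse an sfnt (TrueType) table directory from raw font bytes.

    Returns (tables, complete): tables maps tag -> (offset, length);
    complete is False if the data ended before all announced entries.
    """
    fp = io.BytesIO(data)
    header = fp.read(12)
    if len(header) < 12:
        return {}, False
    # sfntVersion (4 bytes), numTables, searchRange, entrySelector, rangeShift
    _version, num_tables, _sr, _es, _rs = struct.unpack('>4sHHHH', header)
    tables = {}
    for _ in range(num_tables):
        entry = fp.read(16)
        if len(entry) < 16:
            # Truncated font: keep what was read instead of raising struct.error.
            return tables, False
        tag, _checksum, offset, length = struct.unpack('>4sLLL', entry)
        tables[tag] = (offset, length)
    return tables, True

# A header claiming 2 tables but carrying only one complete entry.
damaged = (struct.pack('>4sHHHH', b'\x00\x01\x00\x00', 2, 32, 1, 0)
           + struct.pack('>4sLLL', b'head', 0, 44, 54)
           + b'\x00' * 5)
print(read_table_directory(damaged))  # ({b'head': (44, 54)}, False)
```

Returning partial results trades strictness for robustness: glyph or encoding information for the damaged font may still be missing, but a similar guard around the failing loop would let the page be processed instead of aborting.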
Update: after installing pdfminer directly from the GitHub repository, I get a similar error, with BytesIO instead of StringIO:
    $ pdf2txt.py 2.pdf
    Traceback (most recent call last):
      File "/home/danil/projects/python/pdfminer-source/env/bin/pdf2txt.py", line 116, in <module>
        if __name__ == '__main__': sys.exit(main(sys.argv))
      File "/home/danil/projects/python/pdfminer-source/env/bin/pdf2txt.py", line 110, in main
        interpreter.process_page(page)
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 839, in process_page
        self.render_contents(page.resources, page.contents, ctm=ctm)
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 850, in render_contents
        self.init_resources(resources)
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 356, in init_resources
        self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 204, in get_font
        font = self.get_font(None, subspec)
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 195, in get_font
        font = PDFCIDFont(self, spec)
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdffont.py", line 665, in __init__
        BytesIO(self.fontfile.get_data()))
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdffont.py", line 386, in __init__
        (name, tsum, offset, length) = struct.unpack('>4sLLL', fp.read(16))
    struct.error: unpack requires a string argument of length 16