UnrecognizedImageError - image insertion error - python-docx

2024/10/15 1:25:05

I am trying to insert a WMF file into a .docx document using python-docx, which produces the following traceback:

Traceback (most recent call last):
  File "C:/Users/ADMIN/PycharmProjects/ppt-to-word/ppt_reader.py", line 79, in <module>
    read_ppt(path, file)
  File "C:/Users/ADMIN/PycharmProjects/ppt-to-word/ppt_reader.py", line 73, in read_ppt
    write_docx(ppt_data, False)
  File "C:/Users/ADMIN/PycharmProjects/ppt-to-word/ppt_reader.py", line 31, in write_docx
    document.add_picture(slide_data.get('picture_location'), width=Inches(5.0))
  File "C:\Python34\lib\site-packages\docx\document.py", line 72, in add_picture
    return run.add_picture(image_path_or_stream, width, height)
  File "C:\Python34\lib\site-packages\docx\text\run.py", line 62, in add_picture
    inline = self.part.new_pic_inline(image_path_or_stream, width, height)
  File "C:\Python34\lib\site-packages\docx\parts\story.py", line 56, in new_pic_inline
    rId, image = self.get_or_add_image(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\parts\story.py", line 29, in get_or_add_image
    image_part = self._package.get_or_add_image_part(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\package.py", line 31, in get_or_add_image_part
    return self.image_parts.get_or_add_image_part(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\package.py", line 74, in get_or_add_image_part
    image = Image.from_file(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\image\image.py", line 55, in from_file
    return cls._from_stream(stream, blob, filename)
  File "C:\Python34\lib\site-packages\docx\image\image.py", line 176, in _from_stream
    image_header = _ImageHeaderFactory(stream)
  File "C:\Python34\lib\site-packages\docx\image\image.py", line 199, in _ImageHeaderFactory
    raise UnrecognizedImageError
docx.image.exceptions.UnrecognizedImageError

The image file is in .wmf format.

Any help or suggestions would be appreciated.

Answer

python-docx identifies the type of an image file by "recognizing" its distinctive header. This is how it distinguishes JPEG from PNG, from TIFF, and so on. This is more reliable than mapping from the filename extension and more convenient than requiring the caller to specify the type. It's a pretty common approach.
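To illustrate header-based detection, here is a minimal sketch that compares a file's leading bytes against a table of known signatures. The table and the `sniff_image_type` / `sniff_file` names are illustrative assumptions, not python-docx's internal code, although the PNG, JPEG, GIF, TIFF and BMP byte signatures shown are the standard ones for those formats.

```python
from typing import Optional

# Illustrative signature table: (format name, byte offset, expected bytes).
# Same idea as python-docx's detection, but NOT its actual implementation.
SIGNATURES = [
    ("png", 0, b"\x89PNG\r\n\x1a\n"),
    ("jpeg", 0, b"\xff\xd8\xff"),
    ("gif", 0, b"GIF87a"),
    ("gif", 0, b"GIF89a"),
    ("tiff", 0, b"MM\x00*"),   # big-endian TIFF
    ("tiff", 0, b"II*\x00"),   # little-endian TIFF
    ("bmp", 0, b"BM"),
]


def sniff_image_type(header: bytes) -> Optional[str]:
    """Return the format whose signature matches `header`, or None if none does."""
    for name, offset, signature in SIGNATURES:
        if header[offset:offset + len(signature)] == signature:
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read only the first few bytes of a file and identify it by signature."""
    with open(path, "rb") as f:
        return sniff_image_type(f.read(32))
```

A file whose header matches nothing in the table (a WMF file, in this sketch) yields `None`; that is the analogue of the point at which python-docx's `_ImageHeaderFactory` raises `UnrecognizedImageError` in the traceback above.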

This error means python-docx did not find a header it recognizes. Windows Metafile (WMF) files can be tricky in this respect: the proprietary spec leaves a lot of leeway, and files found in the field vary considerably.
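To see which kind of WMF file you actually have, you can inspect its first bytes. Per Microsoft's [MS-WMF] specification, a WMF file may start with an optional 22-byte "placeable" header whose key is 0x9AC6CDD7 (stored little-endian), or directly with the standard META_HEADER record (Type 1 or 2, HeaderSize 9, Version 0x0100 or 0x0300). The hypothetical helper `classify_wmf` below is a sketch that distinguishes these two cases; it only reports what it finds and makes no assumption about which variant a given tool accepts.

```python
import struct

# Magic key of the optional "placeable" WMF header (bytes D7 CD C6 9A on disk).
PLACEABLE_KEY = 0x9AC6CDD7


def classify_wmf(data: bytes) -> str:
    """Classify the start of a WMF file as 'placeable', 'standard' or 'unknown'."""
    # Placeable WMF: 4-byte little-endian key at offset 0.
    if len(data) >= 4 and struct.unpack_from("<I", data, 0)[0] == PLACEABLE_KEY:
        return "placeable"
    # Standard WMF: META_HEADER begins with Type, HeaderSize (in 16-bit words), Version.
    if len(data) >= 6:
        mtype, header_size, version = struct.unpack_from("<HHH", data, 0)
        if mtype in (1, 2) and header_size == 9 and version in (0x0100, 0x0300):
            return "standard"
    return "unknown"
```

For example, `with open("slide.wmf", "rb") as f: print(classify_wmf(f.read(32)))`, where `slide.wmf` is a hypothetical filename.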

To fix this, I recommend reading the file with a library that does recognize it (I would start with Pillow) and having it "convert" the image into the same or another format, hopefully correcting the header in the process.

First I would try just reading it and saving it as WMF (or perhaps EMF, if that's an option). That alone might do the trick. If you have to convert to an intermediate format and back, the round trip could be lossy (rasterizing a vector metafile, for example, loses its scalability), but that may be better than nothing.
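One way to act on this with Pillow is to rasterize the image to PNG in memory and hand the result to python-docx, whose `add_picture` accepts a file-like object as well as a path. This is a sketch under assumptions: the helper names `to_png_stream` and `add_picture_via_png` are hypothetical, and Pillow can render WMF/EMF only on Windows (elsewhere it can at most identify such files), so `Image.open` or `load()` will raise if Pillow cannot handle your particular file. PNG is used rather than WMF because Pillow does not write WMF out of the box; the result is a raster image, which is the lossy trade-off described above.

```python
import io


def to_png_stream(path: str) -> io.BytesIO:
    """Decode an image file with Pillow and return it re-encoded as PNG in memory.

    For WMF/EMF this assumes Pillow is running on Windows, where it can render
    metafiles; on other platforms loading such a file is expected to fail.
    """
    from PIL import Image  # imported lazily so this module loads without Pillow

    out = io.BytesIO()
    with Image.open(path) as im:
        im.load()  # force decoding/rendering; raises if no loader is available
        # PNG cannot store every Pillow mode (e.g. CMYK), so fall back to RGB.
        png_ready = im if im.mode in ("1", "L", "P", "RGB", "RGBA") else im.convert("RGB")
        png_ready.save(out, format="PNG")
    out.seek(0)
    return out


def add_picture_via_png(document, path: str, width_inches: float = 5.0):
    """Insert `path` into a python-docx Document after converting it to PNG."""
    from docx.shared import Inches

    return document.add_picture(to_png_stream(path), width=Inches(width_inches))
```

With the question's code, `document.add_picture(slide_data.get('picture_location'), width=Inches(5.0))` would become `add_picture_via_png(document, slide_data.get('picture_location'))`.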

ImageMagick is another good candidate to try, because it probably has broader format coverage than Pillow.
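If you go the ImageMagick route from Python, a minimal sketch is to shell out to its command-line tool. Assumptions: ImageMagick 7 is installed and its `magick` executable is on the PATH (version 6 used `convert`, which on Windows collides with a built-in system utility, so it is deliberately not used here), and your build can actually read WMF, which may depend on an external delegate library; running `magick identify yourfile.wmf` first is a quick check. The helper names are hypothetical. For vector input, `-density` (in DPI) must precede the input file to control rasterization resolution.

```python
import subprocess
from typing import List, Optional


def magick_command(src: str, dst: str, density: Optional[int] = None) -> List[str]:
    """Build an ImageMagick 7 command line converting `src` to `dst`.

    The output format is inferred by ImageMagick from the `dst` extension.
    """
    cmd = ["magick"]
    if density is not None:
        # Read setting: must come before the input file to affect rasterization.
        cmd += ["-density", str(density)]
    cmd += [src, dst]
    return cmd


def convert_with_imagemagick(src: str, dst: str, density: Optional[int] = None) -> None:
    """Run the conversion; raises CalledProcessError if ImageMagick reports failure."""
    subprocess.run(magick_command(src, dst, density), check=True)
```

For example, `convert_with_imagemagick("slide.wmf", "slide.png", density=300)` (hypothetical filenames) would produce a PNG that python-docx's `add_picture` should recognize.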

