Read EXE, MSI, and ZIP file metadata in Python in Linux

2024/10/2 10:37:30

I am writing a Python script to index a large set of Windows installers into a DB.

I would like to know how to read the metadata (Company, Product Name, Version, etc.) from EXE, MSI, and ZIP files using Python running on Linux.

Software

I am using Python 2.6.5 on Ubuntu 10.04 64-bit with Django 1.2.1.

Found so far:

Windows command-line utilities that can extract EXE metadata (like filever from SysUtils), and other individual command-line utilities that only work on Windows. I've tried running these through Wine, but they have problems, and it hasn't been worth the effort to track down the libraries and frameworks they depend on and install those in Wine/CrossOver.

Win32 modules for Python, which can do some of this but won't run on Linux (right?)

Secondary question:

Obviously, changing a file's metadata changes its MD5 hash. Is there a general method of hashing a file independently of its metadata, other than locating the metadata and reading it in? (For example, something like skipping the first 1024 bytes?)

Answer

Take a look at the Hachoir binary file manipulation library: http://bitbucket.org/haypo/hachoir/wiki/Home. The hachoir-metadata package, http://pypi.python.org/pypi/hachoir-metadata/1.3.3, is an example program that uses the library to parse metadata.

The library can handle these formats:

  • Archives: bzip2, gzip, zip, tar
  • Audio: MPEG audio ("MP3"), WAV, Sun/NeXT audio, Ogg/Vorbis (OGG), MIDI, AIFF, AIFC, Real audio (RA)
  • Image: BMP, CUR, EMF, ICO, GIF, JPEG, PCX, PNG, TGA, TIFF, WMF, XCF
  • Misc: Torrent
  • Program: EXE
  • Video: ASF format (WMV video), AVI, Matroska (MKV), Quicktime (MOV), Ogg/Theora, Real media (RM)
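For the ZIP case specifically, one option that needs no third-party code is Python's standard zipfile module. This is a swapped-in alternative, not part of Hachoir, and the function name zip_metadata is made up for illustration. Note that a ZIP archive has no Company or Product Name fields; what it does carry is an optional archive comment plus per-entry names, sizes, timestamps, and CRCs. A minimal sketch, assuming a current Python 3:

```python
import zipfile


def zip_metadata(path):
    """Return the archive comment and per-entry details for a ZIP file.

    zip_metadata is a hypothetical helper name, not a library function.
    """
    with zipfile.ZipFile(path) as archive:
        entries = []
        for info in archive.infolist():
            entries.append({
                "name": info.filename,
                "size": info.file_size,              # uncompressed size in bytes
                "compressed_size": info.compress_size,
                "modified": info.date_time,          # (year, month, day, hour, minute, second)
                "crc32": info.CRC,
            })
        # In Python 3 the archive comment is a bytes object.
        return {"comment": archive.comment, "entries": entries}
```

Each entry dictionary could then be stored as a row in the index DB alongside the archive's own path.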

Additionally, Hachoir can do some file manipulation operations, which I assume include some primitive metadata manipulation.
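On the secondary question: skipping a fixed number of leading bytes is unlikely to work in general, because the version information in an EXE lives in the resource section rather than at a fixed offset. If the byte ranges holding the metadata can be located first (that step is format-specific and is simply assumed here), the rest of the file can be hashed around them. A minimal sketch, where hash_excluding and the skip_ranges argument are hypothetical names for illustration:

```python
import hashlib


def _feed(digest, handle, count, chunk_size):
    """Hash `count` bytes from handle, or everything left if count is None."""
    while count is None or count > 0:
        want = chunk_size if count is None else min(chunk_size, count)
        chunk = handle.read(want)
        if not chunk:
            break
        digest.update(chunk)
        if count is not None:
            count -= len(chunk)


def hash_excluding(path, skip_ranges, chunk_size=65536):
    """MD5 of a file with the given (start, end) byte ranges left out.

    `end` is exclusive. The caller is assumed to already know where the
    metadata lives; finding those offsets is not handled here.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        position = 0
        for start, end in sorted(skip_ranges):
            if start > position:
                # Hash the bytes between the previous range and this one.
                _feed(digest, handle, start - position, chunk_size)
                position = start
            if end > position:
                # Jump over the excluded range without hashing it.
                handle.seek(end)
                position = end
        _feed(digest, handle, None, chunk_size)  # hash the remainder
    return digest.hexdigest()
```

Two files that differ only inside the skipped ranges then produce the same digest, so the digest can serve as a metadata-independent key in the index.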
