Verifying the integrity of PyPI Python packages

2024/10/2 16:28:09

Recently there has been news about malicious libraries being uploaded to the Python Package Index (PyPI); see:

  1. Malicious libraries on PyPI
  2. Malicious modules found in the official Python repository (this link contains the list of malicious packages)
  3. Developers using malicious Python Modules

I am not trying to spread this news; rather, I want to help myself and my teammates determine whether a package from PyPI has been altered by an external party.

Questions:

  1. What security checks should I run once I have downloaded a package from PyPI? An MD5 check, or some extra step?
  2. Is an MD5 checksum enough to verify the integrity of Python packages?

Answer

First, the linked articles describe the danger of typosquatting, which works because developers blindly install a package by name without checking that it is the correct upstream package. You can avoid this by going to the author's GitHub repository and copying the install instructions exactly.
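As a rough illustration of the typosquatting risk, the sketch below compares a requested package name against a list of names the team actually intends to use, and flags near-misses. The list `KNOWN_PACKAGES`, the function `check_package_name`, and the 0.8 similarity cutoff are all hypothetical choices made for this example; this is not a feature of pip or PyPI, only a sanity check you could run before installing.

```python
import difflib

# Hypothetical allowlist: the packages your team actually intends to use.
KNOWN_PACKAGES = ["requests", "numpy", "urllib3", "python-dateutil"]


def check_package_name(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Classify a requested package name against the allowlist.

    Returns a (status, suggestion) tuple where status is one of:
      "ok"         - the name is on the allowlist
      "suspicious" - the name is not listed but closely resembles a listed one
      "unknown"    - the name resembles nothing on the allowlist
    """
    # Treat case and underscore/hyphen differences as the same name.
    normalized = name.lower().replace("_", "-")
    if normalized in known:
        return ("ok", normalized)
    # A near-miss of a known name is the typical typosquatting pattern.
    close = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    if close:
        return ("suspicious", close[0])
    return ("unknown", None)
```

For example, `check_package_name("reqeusts")` reports the name as suspicious and suggests `requests`. A check like this only catches misspellings of names you already know; it says nothing about whether an unfamiliar package is trustworthy.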

Aside from that, packages can be tampered with, but it is unlikely. Because PyPI files are transferred over HTTPS, it doesn't make much sense to fetch a hash from the same server and verify the download against it. (If the author's account or the PyPI server is compromised, such a hash doesn't prevent you from installing malicious packages.)

If you need an extra security measure against server compromise, pin package versions and hashes. See the documentation for details.
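In practice, pinning with pip means recording each dependency in a requirements file as `package==version --hash=sha256:<digest>` and installing with `pip install --require-hashes -r requirements.txt`; the digest for a file you already trust can be printed with `pip hash <file>`. The sketch below shows the same comparison done by hand, assuming you recorded the expected SHA-256 digest at a time when you trusted the file. The function names `sha256_of_file` and `matches_pinned_hash` are made up for this example. SHA-256 is used rather than MD5 because MD5 is vulnerable to collision attacks.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path, expected_hex):
    """Return True if the file's SHA-256 digest equals the pinned digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

The pinned digest only helps if it is stored somewhere other than the server you download from (for example, in your own version-controlled requirements file); otherwise it has the weakness described above.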

https://en.xdnf.cn/q/70832.html
