PyPI is slow. How do I run my own server?

2024/11/18 23:50:55

When a new developer joins the team, or Jenkins runs a complete build, I need to create a fresh virtualenv. Setting up a virtualenv with Pip and a large number of requirements (more than 10) often takes a very long time, because everything is installed from PyPI. Often it fails altogether with:

Downloading/unpacking Django==1.4.5 (from -r requirements.pip (line 1))
Exception:
Traceback (most recent call last):
  File "/var/lib/jenkins/jobs/hermes-web/workspace/web/.venv/lib/python2.6/site-packages/pip-1.2.1-py2.6.egg/pip/basecommand.py", line 107, in main
    status = self.run(options, args)
  File "/var/lib/jenkins/jobs/hermes-web/workspace/web/.venv/lib/python2.6/site-packages/pip-1.2.1-py2.6.egg/pip/commands/install.py", line 256, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/var/lib/jenkins/jobs/hermes-web/workspace/web/.venv/lib/python2.6/site-packages/pip-1.2.1-py2.6.egg/pip/req.py", line 1018, in prepare_files
    self.unpack_url(url, location, self.is_download)
  File "/var/lib/jenkins/jobs/hermes-web/workspace/web/.venv/lib/python2.6/site-packages/pip-1.2.1-py2.6.egg/pip/req.py", line 1142, in unpack_url
    retval = unpack_http_url(link, location, self.download_cache, self.download_dir)
  File "/var/lib/jenkins/jobs/hermes-web/workspace/web/.venv/lib/python2.6/site-packages/pip-1.2.1-py2.6.egg/pip/download.py", line 463, in unpack_http_url
    download_hash = _download_url(resp, link, temp_location)
  File "/var/lib/jenkins/jobs/hermes-web/workspace/web/.venv/lib/python2.6/site-packages/pip-1.2.1-py2.6.egg/pip/download.py", line 380, in _download_url
    chunk = resp.read(4096)
  File "/usr/lib64/python2.6/socket.py", line 353, in read
    data = self._sock.recv(left)
  File "/usr/lib64/python2.6/httplib.py", line 538, in read
    s = self.fp.read(amt)
  File "/usr/lib64/python2.6/socket.py", line 353, in read
    data = self._sock.recv(left)
timeout: timed out

I'm aware of Pip's --use-mirrors flag, and sometimes people on my team have worked around this by using --index-url http://f.pypi.python.org/simple (or another mirror) until they find a mirror that responds in a timely fashion. We're in the UK, but there's a PyPI mirror in Germany, and we don't have issues downloading data from other sites.

So, I'm looking at ways to mirror PyPI internally for our team.

The options I've looked at are:

  1. Running my own PyPI instance. There's the official PyPI implementation, CheeseShop, as well as several third-party implementations such as djangopypi and pypiserver (see footnotes).

    The problem with this approach is that I'm not interested in full PyPI functionality with file upload; I just want to mirror the content it provides.

  2. Running a PyPI mirror with pep381client or pypi-mirror.

    This looks like it could work, but it requires my mirror to download everything from PyPI first. I've set up a test instance of pep381client, but my download speed varies between 5 Kb/s and 200 Kb/s (bits, not bytes). Unless there's a copy of the full PyPI archive somewhere, it will take me weeks to have a useful mirror.

  3. Using a PyPI round-robin proxy such as yopypi.

    This is irrelevant now that http://pypi.python.org itself consists of several geographically distinct servers.

  4. Copying around a virtualenv between developers, or hosting a folder of the current project's dependencies.

    This doesn't scale: we have several different Python projects whose dependencies change (slowly) over time. As soon as the dependencies of any project change, this central folder must be updated to add the new dependencies. Copying the virtualenv is worse than copying the packages though, since any Python packages with C modules need to be compiled for the target system. Our team has both Linux and OS X users.

    (This still looks like the best option of a bad bunch.)

  5. Using an intelligent PyPI caching proxy: collective.eggproxy

    This seems like it would be a very good solution, but the last version on PyPI is dated 2009 and discusses mod_python.

What do other large Python teams do? What's the best way to quickly install the same set of Python packages?

Footnotes:

  • I've seen the question How to roll my own PyPI?, but that question relates to hosting private code.
  • The Python wiki lists alternative PyPI implementations
  • I've also recently discovered Crate.io but I don't believe that helps me when using Pip.
  • There's a website monitoring PyPI mirror status
  • Some packages on PyPI have their files hosted elsewhere so even a perfect mirror won't help all dependencies

Answer

Do you have a shared filesystem?

If so, I would use pip's download cache setting. It's pretty simple: make a folder called pip-cache, for example in /mnt.

mkdir /mnt/pip-cache

Then each developer would put the following lines into their pip config (unix = $HOME/.pip/pip.conf, win = %HOME%\pip\pip.ini):

[global]
download-cache = /mnt/pip-cache
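
If you would rather script this step than ask every developer to edit the file by hand, here is a minimal Python sketch. The helper name write_pip_config is hypothetical, and the demo writes to a throwaway directory rather than the real $HOME/.pip/pip.conf. It also assumes a pip release old enough to honour download-cache, like the pip 1.2.1 in the question; as far as I know, newer releases dropped that option in favour of a built-in cache configured with cache-dir.

```python
import configparser
import os
import tempfile


def write_pip_config(config_path, cache_dir):
    """Set [global] download-cache in a pip config file, keeping other settings.

    Hypothetical helper for illustration; it is not part of pip.
    """
    # interpolation=None so values containing '%' are stored literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path)  # a missing file is silently skipped
    if not parser.has_section("global"):
        parser.add_section("global")
    parser.set("global", "download-cache", cache_dir)

    parent = os.path.dirname(config_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(config_path, "w") as handle:
        parser.write(handle)


if __name__ == "__main__":
    # Demo against a temporary directory, not the developer's real config.
    demo_path = os.path.join(tempfile.mkdtemp(), ".pip", "pip.conf")
    write_pip_config(demo_path, "/mnt/pip-cache")
    with open(demo_path) as handle:
        print(handle.read())
```

Pointing config_path at the real config location would update it in place; any existing sections and keys are preserved, although comments in the file are not.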

With this setting, pip still checks PyPI for the latest version of each package, then checks whether that version is already in the cache. If it is, pip installs it from there; if not, pip downloads it, stores it in the cache, and installs it. So each package is downloaded only once per new version.
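
To make the "download once per version" behaviour concrete, here is a rough Python sketch of that lookup logic. It is an illustration, not pip's actual code: install_from_cache, latest_version and download are hypothetical names, and the name-version.tar.gz file naming is an assumption that does not match how pip names its cache entries.

```python
import os
import tempfile


def install_from_cache(name, cache_dir, latest_version, download):
    """Return (archive_path, was_downloaded) for the newest version of `name`.

    latest_version(name) stands in for the index lookup, which always happens.
    download(name, version) stands in for the slow fetch and returns bytes.
    """
    version = latest_version(name)  # the index is consulted on every install
    path = os.path.join(cache_dir, "%s-%s.tar.gz" % (name, version))
    if os.path.exists(path):
        return path, False  # cache hit: nothing is downloaded
    data = download(name, version)  # cache miss: fetch and store
    with open(path, "wb") as handle:
        handle.write(data)
    return path, True


if __name__ == "__main__":
    cache = tempfile.mkdtemp()  # stands in for the shared /mnt/pip-cache
    downloads = []

    def fake_latest(name):
        return "1.4.5"

    def fake_download(name, version):
        downloads.append((name, version))
        return b"archive bytes"

    # Two "developers" install the same package against the same cache.
    print(install_from_cache("Django", cache, fake_latest, fake_download))
    print(install_from_cache("Django", cache, fake_latest, fake_download))
    print("downloads:", len(downloads))
```

The second call is served from the cache, so the slow download runs only for the first install of a given version. The index lookup still runs every time, which is why this approach does not help if PyPI's index itself is unreachable.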
