Speed up Metropolis-Hastings algorithm in Python


I have some code that samples a posterior distribution using MCMC, specifically Metropolis-Hastings. I use scipy to generate random samples:

import numpy as np
from scipy import statsdef get_samples(n):"""Generate and return a randomly sampled posterior.For simplicity, Prior is fixed as Beta(a=2,b=5), Likelihood is fixed as Normal(0,2):type n: int:param n: number of iterations:rtype: numpy.ndarray"""x_t = stats.uniform(0,1).rvs() # initial valueposterior = np.zeros((n,))for t in range(n):x_prime = stats.norm(loc=x_t).rvs() # candidatep1 = stats.beta(a=2,b=5).pdf(x_prime)*stats.norm(loc=0,scale=2).pdf(x_prime) # prior * likelihood p2 = stats.beta(a=2,b=5).pdf(x_t)*stats.norm(loc=0,scale=2).pdf(x_t) # prior * likelihood alpha = p1/p2 # ratiou = stats.uniform(0,1).rvs() # random uniformif u <= alpha:x_t = x_prime # acceptposterior[t] = x_telif u > alpha:x_t = x_t # rejectposterior = posterior[np.where(posterior > 0)] # get rid of initial zeros that don't contribute to distributionreturn posterior

Generally, I try to avoid explicit for loops in Python and generate everything with pure numpy. For this algorithm, however, a for loop with if statements is unavoidable, so the code is quite slow. When I profile it, most of the time is spent inside the for loop (obviously), and more specifically, the slowest parts are generating the random numbers and evaluating stats.beta().pdf() and stats.norm().pdf().

Sometimes I use numba to speed up my code for matrix operations. While numba is compatible with some numpy operations, generating random numbers is not one of them. Numba has a CUDA RNG, but it is limited to normal and uniform distributions.

My question is: is there a way to significantly speed up the code above using some sort of random sampling from various distributions that is compatible with numba?

We don't have to limit ourselves to numba; it is just the only easy-to-use optimizer that I know of. More generally, I am looking for ways to speed up random sampling from various distributions (beta, gamma, Poisson) within a for loop in Python (see the sketch below for the kind of pattern I mean).
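Just to illustrate the pattern (a sketch with made-up parameter values; numpy's Generator can draw beta, gamma and Poisson samples in vectorised batches):

import numpy as np

rng = np.random.default_rng()
n = 10_000

# Pre-generate vectorised draws outside the loop; this only works when the
# distribution parameters don't depend on the loop/chain state.
beta_draws = rng.beta(2.0, 5.0, size=n)
gamma_draws = rng.gamma(shape=2.0, scale=1.0, size=n)
poisson_draws = rng.poisson(lam=3.0, size=n)

for t in range(n):
    b, g, p = beta_draws[t], gamma_draws[t], poisson_draws[t]
    ...  # loop body would consume the t-th draws here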

Answer

There are lots of optimisations you can make to this code before you start thinking about numba et al. (I managed a 25x speed-up on this code just by being smarter about the algorithm's implementation.)

Firstly, there's an error in your implementation of the Metropolis-Hastings algorithm. You need to keep every iteration of the scheme, regardless of whether your chain moves or not. That is, you need to remove posterior = posterior[np.where(posterior > 0)] from your code and, at the end of each loop iteration, set posterior[t] = x_t.
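Concretely, the end of the loop body should look something like this (on rejection the chain simply stays put, but the current state is still recorded):

if u <= alpha:
    x_t = x_prime       # accept: the chain moves
# on rejection, x_t is left unchanged
posterior[t] = x_t      # record the state on every iteration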

Secondly, this example seems odd. Typically, with these kinds of inference problems we're looking to infer the parameters of a distribution given some observations. Here, though, the parameters of the distribution are known and instead you're sampling observations. Regardless, I'm happy to roll with your example and show you how to speed it up.

Speed-up

To get started, move anything that does not depend on the value of t out of the main for loop. Begin with the generation of the random-walk innovations:

    x_t = stats.uniform(0,1).rvs()
    innov = stats.norm(loc=0).rvs(size=n)
    for t in range(n):
        x_prime = x_t + innov[t]

Of course, it is also possible to move the generation of the random uniforms u out of the for loop:

    x_t = stats.uniform(0,1).rvs()
    innov = stats.norm(loc=0).rvs(size=n)
    u = np.random.uniform(size=n)
    for t in range(n):
        x_prime = x_t + innov[t]
        ...
        if u[t] <= alpha:

Another issue is that you're recomputing the current posterior density p2 on every iteration, which isn't necessary. On each iteration you only need to calculate the proposed posterior density p1; whenever a proposal is accepted, you can update p2 to equal p1:

    x_t = stats.uniform(0,1).rvs()
    innov = stats.norm(loc=0).rvs(size=n)
    u = np.random.uniform(size=n)
    p2 = stats.beta(a=2,b=5).pdf(x_t)*stats.norm(loc=0,scale=2).pdf(x_t)
    for t in range(n):
        x_prime = x_t + innov[t]
        p1 = stats.beta(a=2,b=5).pdf(x_prime)*stats.norm(loc=0,scale=2).pdf(x_prime)
        ...
        if u[t] <= alpha:
            x_t = x_prime # accept
            p2 = p1
        posterior[t] = x_t

A very minor improvement comes from importing the scipy stats functions directly into the namespace:

from scipy.stats import norm, beta

Another very minor improvement comes from noticing that the elif branch in your code doesn't do anything and so can be removed.

Putting this all together and using more sensible variable names, I came up with:

from scipy.stats import norm, beta
import numpy as np

def my_get_samples(n, sigma=1):
    x_cur = np.random.uniform()
    innov = norm.rvs(size=n, scale=sigma)
    u = np.random.uniform(size=n)
    post_cur = beta.pdf(x_cur, a=2, b=5) * norm.pdf(x_cur, loc=0, scale=2)
    posterior = np.zeros(n)
    for t in range(n):
        x_prop = x_cur + innov[t]
        post_prop = beta.pdf(x_prop, a=2, b=5) * norm.pdf(x_prop, loc=0, scale=2)
        alpha = post_prop / post_cur
        if u[t] <= alpha:
            x_cur = x_prop
            post_cur = post_prop
        posterior[t] = x_cur
    return posterior

Now, for a speed comparison:

%timeit get_samples(1000)
3.19 s ± 5.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit my_get_samples(1000)
127 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

That's a speed-up of 25x.

ESS

It's worth noting that brute speed isn't everything when it comes to MCMC algorithms. Really, what you're interested in is the number of independent(ish) draws you can make from the posterior per second. Typically, this is assessed with the ESS (effective sample size). You can improve the efficiency of your algorithm (and hence increase your effective samples drawn per second) by tuning your random walk.
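As a rough illustration, here's a crude ESS estimator (a minimal sketch, not a production implementation: it truncates the autocorrelation sum at the first negative lag, and the function name is my own):

import numpy as np

def ess(samples):
    # ESS ~= n / (1 + 2 * sum of positive-lag autocorrelations)
    n = len(samples)
    x = samples - samples.mean()
    acf = np.correlate(x, x, mode='full')[n - 1:]   # lags 0..n-1
    acf = acf / acf[0]                              # normalise so acf[0] == 1
    rho_sum = 0.0
    for rho in acf[1:]:
        if rho < 0:                                 # truncate at first negative lag
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

Comparing ess(samples) across runs, rather than raw wall-clock time, is the fairer way to judge the tuning described next.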

To do so, it is typical to make an initial trial run, e.g. samples = my_get_samples(1000). From this output, calculate sigma = 2.38**2 * np.var(samples). This value should then be used to tune the random walk in your scheme, as innov = norm.rvs(size=n, scale=sigma). The seemingly arbitrary occurrence of 2.38^2 has its origin in:

Weak convergence and optimal scaling of random walk Metropolis algorithms (1997). A. Gelman, W. R. Gilks, and G. O. Roberts.
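In code, the recipe above amounts to this (a sketch, reusing my_get_samples from earlier):

pilot = my_get_samples(1000)                   # short trial run
sigma = 2.38**2 * np.var(pilot)                # scaling rule quoted above
samples = my_get_samples(10000, sigma=sigma)   # tuned production run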

To illustrate the improvement tuning can make, let's do two runs of my algorithm, one tuned and the other untuned, both using 10000 iterations:

import matplotlib.pyplot as plt

x = my_get_samples(10000)
y = my_get_samples(10000, sigma=0.12)
fig, ax = plt.subplots(1, 2)
ax[0].hist(x, density=True, bins=25, label='Untuned algorithm', color='C0')
ax[1].hist(y, density=True, bins=25, label='Tuned algorithm', color='C1')
ax[0].set_ylabel('density')
ax[0].set_xlabel('x'), ax[1].set_xlabel('x')
fig.legend()

You can immediately see the improvements tuning has made to our algorithm's efficiency. Remember, both runs were made for the same number of iterations.

[Figure: side-by-side histograms of samples from the untuned and tuned runs]

Final thoughts

If your algorithm is taking a very long time to converge, or if your samples have large amounts of autocorrelation, I'd consider looking into Cython to squeeze out further speed optimisations.
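Since the question asked specifically about numba: scipy.stats can't be called from jitted code, but the Beta(2,5) and Normal(0,2) log-densities are simple enough to write by hand with math.lgamma, and numba does support np.random.normal and np.random.uniform inside an @njit loop. Here's a minimal sketch (it works in log space, which also sidesteps underflow from multiplying densities; get_samples_numba and log_target are names I've made up):

import numpy as np
from numba import njit
from math import lgamma, pi

@njit
def log_target(x):
    # log Beta(2,5) pdf + log Normal(0,2) pdf, written out by hand
    if x <= 0.0 or x >= 1.0:
        return -np.inf   # Beta density is zero outside (0, 1)
    log_beta = lgamma(7.0) - lgamma(2.0) - lgamma(5.0) + np.log(x) + 4.0 * np.log(1.0 - x)
    log_norm = -0.5 * np.log(2.0 * pi * 4.0) - x * x / 8.0
    return log_beta + log_norm

@njit
def get_samples_numba(n, sigma=1.0):
    x_cur = np.random.uniform(0.0, 1.0)
    lp_cur = log_target(x_cur)
    posterior = np.empty(n)
    for t in range(n):
        x_prop = x_cur + sigma * np.random.normal()
        lp_prop = log_target(x_prop)
        # accept with probability min(1, exp(lp_prop - lp_cur))
        if np.log(np.random.uniform(0.0, 1.0)) <= lp_prop - lp_cur:
            x_cur = x_prop
            lp_cur = lp_prop
        posterior[t] = x_cur
    return posterior

The first call pays a one-off compilation cost; subsequent calls run the whole loop as machine code.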

I'd also recommend checking out the PyStan project. It takes a bit of getting used to, but its NUTS HMC algorithm will likely outperform any Metropolis-Hastings algorithm you can write by hand.
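For reference, the model above is only a few lines of Stan. A sketch using the PyStan 2 interface (pystan 3's API differs):

import pystan

model_code = """
parameters {
  real<lower=0, upper=1> x;   // support of the Beta prior
}
model {
  x ~ beta(2, 5);             // prior
  x ~ normal(0, 2);           // likelihood term from the question
}
"""

sm = pystan.StanModel(model_code=model_code)  # compiled once (slow)
fit = sm.sampling(iter=2000, chains=4)        # NUTS by default
posterior = fit.extract()['x']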

