Using scipy sparse matrices to solve system of equations


This is a follow-up to How to set up and solve simultaneous equations in python, but I feel it deserves its own reputation points for any answer.

For a fixed integer n, I have a set of 2(n-1) simultaneous equations as follows.

M(p) = 1 + ((n-p-1)/n)*M(n-1) + (2/n)*N(p-1) + ((p-1)/n)*M(p-1)
N(p) = 1 + ((n-p-1)/n)*M(n-1) + (p/n)*N(p-1)
M(1) = 1 + ((n-2)/n)*M(n-1) + (2/n)*N(0)
N(0) = 1 + ((n-1)/n)*M(n-1)

M(p) is defined for 1 <= p <= n-1. N(p) is defined for 0 <= p <= n-2. Notice also that p is just a constant integer in every equation, so the whole system is linear.

Some very nice answers were given for how to set up a system of equations in python. However, the system is sparse and I would like to solve it for large n. How can I use scipy's sparse matrix representation and, for example, scipy.sparse.linalg (http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html) instead?
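For concreteness, the kind of workflow I have in mind looks roughly like this (a toy 3x3 system, nothing to do with the equations above): assemble the matrix from (data, (row, col)) triplets in COO format, convert to CSC, and call spsolve.

import numpy as np
import scipy.sparse
import scipy.sparse.linalg

# Toy example only: build a small sparse matrix from COO triplets,
# convert to CSC, and solve A x = b with spsolve.
row = [0, 0, 1, 2, 2]
col = [0, 1, 1, 1, 2]
data = [4.0, -1.0, 3.0, -1.0, 5.0]
A = scipy.sparse.coo_matrix((data, (row, col)), shape=(3, 3)).tocsc()
b = np.array([1.0, 2.0, 3.0])
x = scipy.sparse.linalg.spsolve(A, b)
print(x)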

Answer

I wouldn't normally keep beating a dead horse, but it happens that my non-vectorized approach to solving your other question has some merit when things get big. Because I was actually filling the coefficient matrix one item at a time, it is very easy to translate into COO sparse matrix format, which can efficiently be transformed to CSC and solved. The following does it:

import numpy as np
import scipy.sparse
import scipy.sparse.linalg

def sps_solve(n):
    # Solution vector is [N[0], N[1], ..., N[n - 2], M[1], M[2], ..., M[n - 1]]
    n_pos = lambda p: p
    m_pos = lambda p: p + n - 2
    data = []
    row = []
    col = []
    # p = 0
    # n * N[0] + (1 - n) * M[n - 1] = n
    row += [n_pos(0), n_pos(0)]
    col += [n_pos(0), m_pos(n - 1)]
    data += [n, 1 - n]
    for p in range(1, n - 1):
        # n * M[p] + (1 + p - n) * M[n - 1] - 2 * N[p - 1] +
        #   (1 - p) * M[p - 1] = n
        row += [m_pos(p)] * (4 if p > 1 else 3)
        col += ([m_pos(p), m_pos(n - 1), n_pos(p - 1)] +
                ([m_pos(p - 1)] if p > 1 else []))
        data += [n, 1 + p - n, -2] + ([1 - p] if p > 1 else [])
        # n * N[p] + (1 + p - n) * M[n - 1] - p * N[p - 1] = n
        row += [n_pos(p)] * 3
        col += [n_pos(p), m_pos(n - 1), n_pos(p - 1)]
        data += [n, 1 + p - n, -p]
    if n > 2:
        # p = n - 1
        # n * M[n - 1] - 2 * N[n - 2] + (2 - n) * M[n - 2] = n
        row += [m_pos(n - 1)] * 3
        col += [m_pos(n - 1), n_pos(n - 2), m_pos(n - 2)]
        data += [n, -2, 2 - n]
    else:
        # p = 1
        # n * M[1] - 2 * N[0] = n
        row += [m_pos(n - 1)] * 2
        col += [m_pos(n - 1), n_pos(n - 2)]
        data += [n, -2]
    coeff_mat = scipy.sparse.coo_matrix((data, (row, col))).tocsc()
    return scipy.sparse.linalg.spsolve(coeff_mat, np.ones(2 * (n - 1)) * n)
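As a quick sanity check (not part of the original answer), the returned vector can be plugged back into a couple of the original equations; with the layout [N[0], ..., N[n - 2], M[1], ..., M[n - 1]] that might look like this:

import numpy as np

# Sanity check for a small n: the solution should satisfy the original
# recurrences for N(0) and M(1).
n = 6
x = sps_solve(n)
N = x[:n - 1]          # N[0] ... N[n - 2]
M = x[n - 1:]          # M[1] ... M[n - 1]

# N(0) = 1 + ((n - 1) / n) * M(n - 1)
assert np.isclose(N[0], 1 + ((n - 1) / n) * M[-1])
# M(1) = 1 + ((n - 2) / n) * M(n - 1) + (2 / n) * N(0)
assert np.isclose(M[0], 1 + ((n - 2) / n) * M[-1] + (2 / n) * N[0])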

It is of course much more verbose than building it from vectorized blocks, as TheodorosZelleke does, but an interesting thing happens when you time both approaches:

[Figure: timing comparison of the two sparse approaches for increasing n]

First, and this is (very) nice, the time scales linearly in both solutions, as one would expect from using a sparse approach. But the solution given in this answer is always faster, and more so for larger n. Just for the fun of it, I also timed TheodorosZelleke's dense approach from the other question, which gives this nice graph showing the different scaling of both types of solutions, and how, very early on (somewhere around n = 75), the solution here should be your choice:
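The exact benchmarking code is not shown here; a rough sketch of how one might reproduce the scaling measurement for sps_solve alone (using timeit, which is an assumption on my part) would be:

import timeit

# Average a few runs of sps_solve for increasing n to observe the
# (roughly linear) scaling discussed above.
for n in (100, 1000, 10000):
    t = timeit.timeit(lambda: sps_solve(n), number=5) / 5
    print(n, t)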

[Figure: timing of the sparse approaches versus the dense approach, showing a crossover around n = 75]

I don't know enough about scipy.sparse to really figure out why there are differences between the two sparse approaches, although I strongly suspect the use of LIL format sparse matrices is to blame. There may be a very marginal performance gain, plus a good deal of code compactness, to be had by turning TheodorosZelleke's answer into COO format. But that is left as an exercise for the OP!
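To make the LIL-versus-COO point concrete, here is a generic illustration (a toy tridiagonal-ish matrix, not TheodorosZelleke's actual code) of incremental LIL assembly versus one-shot COO assembly of the same matrix:

import numpy as np
import scipy.sparse

m = 5

# Incremental assembly: LIL supports efficient item-by-item writes.
lil = scipy.sparse.lil_matrix((m, m))
for i in range(m):
    lil[i, i] = 2
    if i > 0:
        lil[i, i - 1] = -1
A_lil = lil.tocsc()

# Batch assembly: collect (data, row, col) triplets, build COO once.
row = list(range(m)) + list(range(1, m))
col = list(range(m)) + list(range(m - 1))
data = [2] * m + [-1] * (m - 1)
A_coo = scipy.sparse.coo_matrix((data, (row, col)), shape=(m, m)).tocsc()

# Both routes produce the same matrix; COO avoids per-item overhead.
assert np.allclose(A_lil.toarray(), A_coo.toarray())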
