What's the fastest way to find eigenvalues/vectors in Python?

2024/11/18 23:37:34

Currently I'm using NumPy, which does the job. But as I'm dealing with matrices with several thousand rows/columns, and this figure will later go up to tens of thousands, I was wondering whether there is a package that can perform these kinds of calculations faster.

Answer
  • if your matrix is sparse, instantiate it using a constructor from scipy.sparse, then use the analogous eigenvalue/eigenvector methods in scipy.sparse.linalg. From a performance point of view, this has two advantages:

    • your matrix, built from the scipy.sparse constructor, will be smaller in proportion to how sparse it is.

    • the eigenvalue/eigenvector methods for sparse matrices (eigs, eigsh) accept an optional argument, k, which is the number of eigenvalue/eigenvector pairs you want returned. Nearly always, the number required to account for more than 99% of the variance is far less than the number of columns, which you can verify ex post. In other words, you can tell the method not to calculate and return all of the eigenvalue/eigenvector pairs: beyond the (usually) small subset required to account for the variance, it's unlikely you need the rest.

  • use the linear algebra library in SciPy, scipy.linalg, instead of the NumPy library of the same name. The two libraries share a name and use the same method names, yet there can be a difference in performance. The difference arises because numpy.linalg is a less faithful wrapper on the analogous LAPACK routines: it sacrifices some performance for portability and convenience (i.e., to comply with the NumPy design goal that the entire NumPy library should be buildable without a Fortran compiler). linalg in SciPy, on the other hand, is a much more complete wrapper on LAPACK and uses f2py.

  • select the function appropriate for your use case; in other words, don't use a function that does more than you need. In scipy.linalg there are several functions to calculate eigenvalues; the differences are not large, though by careful choice of function you should see a performance boost. For instance:

    • scipy.linalg.eig returns both the eigenvalues and eigenvectors
    • scipy.linalg.eigvals returns only the eigenvalues. So if you only need the eigenvalues of a matrix, do not use linalg.eig; use linalg.eigvals instead.
    • if you have a real-valued, square, symmetric matrix (equal to its transpose), then use scipy.linalg.eigh (eigsh is its counterpart in scipy.sparse.linalg)
  • optimize your SciPy build. Preparing your SciPy build environment is done largely in SciPy's setup.py script. Perhaps the most significant option performance-wise is identifying any optimized LAPACK libraries, such as ATLAS or the Accelerate/vecLib framework (OS X only?), so that SciPy can detect them and build against them. Depending on the rig you have at the moment, optimizing your SciPy build and then re-installing can give you a substantial performance increase. Additional notes from the SciPy core team are here.
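
As a rough illustration of the sparse route described in the first point, here is a minimal sketch. The test matrix is made up for the example (a random sparse B, with A = B.T @ B so that A is symmetric and positive semidefinite); the sizes, density and k = 10 are arbitrary assumptions, not recommendations. It asks eigsh for only k eigenpairs and then checks, after the fact, what share of the trace (the sum of all eigenvalues) those k eigenvalues cover.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

# Hypothetical test data: a random sparse rectangular matrix B (about 0.2% non-zeros).
B = sparse.random(2000, 400, density=0.002, format="csr", random_state=0)

# A = B^T B is symmetric positive semidefinite and still sparse.
A = (B.T @ B).tocsr()

# Ask only for the k eigenpairs of largest magnitude instead of all 400.
k = 10
vals, vecs = eigsh(A, k=k, which="LM")

# Ex post check: the trace equals the sum of all eigenvalues, so this ratio
# is the share of the total accounted for by the k eigenvalues returned.
explained = vals.sum() / A.diagonal().sum()

print(vals.shape, vecs.shape)
print(f"{k} eigenpairs account for {explained:.1%} of the trace")
```

On random data like this the share will be modest; on real data with a few dominant directions it is typically much higher, which is the situation the answer has in mind.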

Will these functions work for large matrices?

I should think so. These are industrial-strength matrix decomposition methods, and they are just thin wrappers over the analogous Fortran LAPACK routines.

I have used most of the methods in the linalg library to decompose matrices in which the number of columns is usually between about 5 and 50, and in which the number of rows usually exceeds 500,000. Neither the SVD nor the eigenvalue methods seem to have any problem handling matrices of this size.
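
For a tall, narrow matrix like the ones described above, a sketch along these lines may help. The data is random and the shape (50,000 x 20) is an assumption chosen to run quickly, smaller than the sizes mentioned in the answer. It computes a thin SVD and compares the squared singular values with the eigenvalues of the small 20 x 20 matrix X.T @ X, which should agree.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)

# Hypothetical tall, narrow data matrix: many rows, few columns.
X = rng.standard_normal((50_000, 20))

# Thin SVD: full_matrices=False keeps U at 50,000 x 20 rather than
# 50,000 x 50,000, which would not fit in memory.
U, s, Vt = linalg.svd(X, full_matrices=False)

# Eigenvalues of the small symmetric matrix X^T X (returned in ascending order).
w = linalg.eigh(X.T @ X, eigvals_only=True)

print(U.shape, s.shape, Vt.shape)
print(np.allclose(np.sort(s**2), w))
```

The point of the comparison is that with few columns, the eigenvalue problem itself is tiny; most of the work is a single pass over the rows.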

Using the SciPy library linalg, you can calculate eigenvectors and eigenvalues with a single call, using any of several methods from this library: eig, eigvalsh, and eigh.

>>> import numpy as NP
>>> from scipy import linalg as LA
>>> A = NP.random.randint(0, 10, 25).reshape(5, 5)
>>> A
array([[9, 5, 4, 3, 7],
       [3, 3, 2, 9, 7],
       [6, 5, 3, 4, 0],
       [7, 3, 5, 5, 5],
       [2, 5, 4, 7, 8]])
>>> e_vals, e_vecs = LA.eig(A)
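
To make the choice between these functions concrete, here is a small sketch on made-up random matrices (the 200 x 200 size and the symmetrisation step are assumptions for the example). It uses eigvals when only eigenvalues of a general matrix are needed, eigvalsh and eigh for a symmetric matrix, and then checks that the eigenpairs returned by eigh satisfy S v = λ v.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)

# Hypothetical inputs: a general square matrix M and a symmetric matrix S.
M = rng.standard_normal((200, 200))
S = (M + M.T) / 2

# General matrix, eigenvalues only (no eigenvectors are computed).
w_general = linalg.eigvals(M)

# Symmetric matrix: eigenvalues only, real and in ascending order.
w_sym = linalg.eigvalsh(S)

# Symmetric matrix: eigenvalues and eigenvectors (columns of V).
w, V = linalg.eigh(S)

# Sanity check: S @ V should equal V scaled column-wise by the eigenvalues.
residual = np.linalg.norm(S @ V - V * w)

print(w_general.shape, w_sym.shape, V.shape)
print(f"residual norm: {residual:.2e}")
```

Timing these calls on your own matrices (for example with timeit) is the most reliable way to see how much the choice matters for your sizes.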