scipy sparse matrix: remove rows whose elements are all zero

2024/10/12 2:25:41

I have a sparse matrix produced by sklearn's TfidfVectorizer. I believe some of its rows are all zero, and I want to remove them. However, as far as I know, the existing built-in functions, e.g. nonzero() and eliminate_zeros(), operate on individual zero entries rather than on rows.

Is there any easy way to remove all-zero rows from a sparse matrix?

Example: What I have now (actually in sparse format):

[[0, 0, 0], [1, 0, 2], [0, 0, 1]]

What I want to get:

[[1, 0, 2], [0, 0, 1]]

Answer

Slicing + getnnz() does the trick:

M = M[M.getnnz(1)>0]

Works directly on csr_array. You can also remove all-zero columns without changing formats:

M = M[:,M.getnnz(0)>0]
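
As a quick check, here is a minimal sketch that applies both slices to the matrix from the question. It assumes a csr_matrix built from a dense array (so there are no explicitly stored zeros, which getnnz would otherwise count); the names row_mask, col_mask, rows_kept and cols_kept are only illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

# The example matrix from the question, stored in CSR format.
M = csr_matrix(np.array([[0, 0, 0],
                         [1, 0, 2],
                         [0, 0, 1]]))

# getnnz(1) counts stored entries per row, getnnz(0) per column.
row_mask = M.getnnz(1) > 0
col_mask = M.getnnz(0) > 0

rows_kept = M[row_mask]      # drops the all-zero first row
cols_kept = M[:, col_mask]   # drops the all-zero middle column

print(rows_kept.toarray())
print(cols_kept.toarray())
```

If the matrix may contain explicitly stored zeros, calling M.eliminate_zeros() before computing the masks should make the counts reflect true nonzeros.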

However, if you want to remove both, you need to chain the two slices:

M = M[M.getnnz(1)>0][:,M.getnnz(0)>0] #GOOD

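whereas passing both masks in a single index

M = M[M.getnnz(1)>0, M.getnnz(0)>0] #BAD

does not work. The likely reason is that, as with NumPy fancy indexing, two index arrays given together are paired element by element instead of being combined into a row-by-column selection.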
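
The difference between the two forms can be sketched as follows. The helper name drop_empty_rows_and_cols is hypothetical, and the pairing behaviour is demonstrated on a dense NumPy array, where it is well defined; this is an illustration under those assumptions rather than a statement about every sparse format.

```python
import numpy as np
from scipy.sparse import csr_matrix


def drop_empty_rows_and_cols(M):
    """Return M without all-zero rows and columns (hypothetical helper)."""
    row_mask = M.getnnz(1) > 0
    col_mask = M.getnnz(0) > 0
    # Two separate slicing steps: rows first, then columns.
    return M[row_mask][:, col_mask]


M = csr_matrix(np.array([[0, 0, 0],
                         [1, 0, 2],
                         [0, 0, 1]]))

trimmed = drop_empty_rows_and_cols(M)
print(trimmed.toarray())

# On a dense array, the same two masks passed together are paired
# element-wise: rows (1, 2) with columns (0, 2), giving two single
# elements instead of a 2x2 block.
dense = M.toarray()
paired = dense[M.getnnz(1) > 0, M.getnnz(0) > 0]
print(paired)
```

Computing both masks from the original matrix is safe here because removing all-zero rows does not change any column's nonzero count.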
