Is there a way to get the top k values per row of a numpy array (Python)?

2024/9/8 10:26:14

Given a numpy array of the form below:

x = [[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]]

is there a way to retain the top-3 values in each row and set the others to zero in Python (without an explicit loop)? The result for the example above would be

x = [[4.,3.,0.,0.,8.],[0.,3.1,0.,9.2,5.5],[0.0,7.0,4.4,0.0,1.3]]

Code for one example

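import numpy as np
arr = np.array([1.2, 3.1, 0., 9.2, 5.5, 3.2])
# indices of the 3 largest values
indexes = arr.argsort()[-3:]
a = list(range(6))
A = set(indexes); B = set(a)
# zero every position that is not among the top 3
arr[list(B - A)] = 0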

The output:

array([0. , 0. , 0. , 9.2, 5.5, 3.2])

Above is my sample code (with many lines) for a 1-D numpy array. Looping through each row of a numpy array and performing this same computation repeatedly would be quite expensive. Is there a simpler way?


Here is a fully vectorized solution that needs nothing beyond numpy. It uses numpy's argpartition to find the k-th largest value of each row efficiently, without sorting the whole row. See for instance this answer for other use cases.

import numpy

def truncate_top_k(x, k, inplace=False):
    m, n = x.shape
    # get (unsorted) indices of top-k values
    topk_indices = numpy.argpartition(x, -k, axis=1)[:, -k:]
    # get k-th value
    rows, _ = numpy.indices((m, k))
    kth_vals = x[rows, topk_indices].min(axis=1)
    # get boolean mask of values smaller than k-th
    is_smaller_than_kth = x < kth_vals[:, None]
    # replace masked values by 0
    if not inplace:
        return numpy.where(is_smaller_than_kth, 0, x)
    x[is_smaller_than_kth] = 0
    return x
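
As a quick check, the function can be run on the array from the question. The sketch below repeats the function so that it runs on its own. The names result, tied and tied_result are only illustrative and are not part of the original answer. The last lines show one behaviour worth knowing: since the mask is "strictly smaller than the k-th value", a row with ties at the k-th value may keep more than k entries.

```python
import numpy as np

def truncate_top_k(x, k, inplace=False):
    """Zero every entry that is smaller than its row's k-th largest value."""
    m, n = x.shape
    # (unsorted) column indices of the k largest values in each row
    topk_indices = np.argpartition(x, -k, axis=1)[:, -k:]
    # k-th largest value of each row = minimum of that row's top-k values
    rows, _ = np.indices((m, k))
    kth_vals = x[rows, topk_indices].min(axis=1)
    # True where a value is strictly smaller than its row's k-th largest
    is_smaller_than_kth = x < kth_vals[:, None]
    if not inplace:
        return np.where(is_smaller_than_kth, 0, x)
    x[is_smaller_than_kth] = 0
    return x

# the example array from the question
x = np.array([[4., 3., 2., 1., 8.],
              [1.2, 3.1, 0., 9.2, 5.5],
              [0.2, 7.0, 4.4, 0.2, 1.3]])

result = truncate_top_k(x, 3)   # x itself is left untouched
print(result)
# [[4.  3.  0.  0.  8. ]
#  [0.  3.1 0.  9.2 5.5]
#  [0.  7.  4.4 0.  1.3]]

# illustrative tie case: three values equal the 2nd-largest, so all three stay
tied = np.array([[5., 5., 5., 1.]])
tied_result = truncate_top_k(tied, 2)
print(tied_result)              # [[5. 5. 5. 0.]]
```

With inplace=True the input array is modified directly and returned, which avoids allocating a second array for large inputs.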
