Accessing NumPy record array columns in Cython

2024/10/12 12:34:45

I'm a relatively experienced Python programmer, but haven't written any C in a very long time and am attempting to understand Cython. I'm trying to write a Cython function that will operate on a column of a NumPy recarray.

The code I have so far is below.

recarray_func.pyx:

import numpy as np
cimport numpy as np

cdef packed struct rec_cell0:
    np.float32_t f0
    np.int64_t i0, i1, i2

def sum(np.ndarray[rec_cell0, ndim=1] recarray):
    cdef Py_ssize_t i
    cdef rec_cell0 *cell
    cdef np.float32_t running_sum = 0

    for i in range(recarray.shape[0]):
        cell = &recarray[i]
        running_sum += cell.f0

    return running_sum

At the interpreter prompt:

import numpy as np
import recarray_func

array = np.recarray((100,), names=['f0', 'i0', 'i1', 'i2'], formats=['f4', 'i8', 'i8', 'i8'])
recarray_func.sum(array)

This simply sums the f0 column of the recarray. It compiles and runs without a problem.

My question is: how would I modify this so that it can operate on any column? In the example above, the column to sum is hard-coded and accessed through dot notation. Is it possible to change the function so that the column to sum is passed in as a parameter?

Answer

I believe this should be possible using Cython's memoryviews. Something along these lines should work (code not tested):

import numpy as np
cimport numpy as np

cdef packed struct rec_cell0:
    np.float32_t f0
    np.int64_t i0, i1, i2

def sum(rec_cell0[:] recview):
    cdef Py_ssize_t i
    cdef np.float32_t running_sum = 0

    for i in range(recview.shape[0]):
        running_sum += recview[i].f0

    return running_sum
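For the "any column" part of the question, one option is to pick the column on the Python side and hand only that field to the compiled function. This is a suggestion rather than something I have compiled: indexing a record array by field name gives a 1-D strided view of that column, and a Cython function typed for that field's dtype (for example with a np.int64_t[:] memoryview argument) should in principle be able to accept it. The sketch below is plain Python and only shows the view itself; column_view is a hypothetical helper name.

```python
import numpy as np

# Same record layout as in the question: one float32 and three int64 fields.
array = np.recarray((100,), names=['f0', 'i0', 'i1', 'i2'],
                    formats=['f4', 'i8', 'i8', 'i8'])

# np.recarray hands back uninitialized memory, so fill every column first.
array['f0'] = 0.5
array['i0'] = np.arange(100)
array['i1'] = 1
array['i2'] = 2


def column_view(records, name):
    """Hypothetical helper: return the named field as a 1-D view, not a copy."""
    return records[name]


col = column_view(array, 'i0')
print(col.dtype)     # dtype of the chosen field
print(col.strides)   # step between elements equals the record size in bytes
print(col.sum())
```

Because the view strides over whole records, it is not contiguous, so a contiguous-only signature such as np.int64_t[::1] would not fit it without a copy.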

Speed can probably be improved by ensuring that the record array you pass to Cython is contiguous. On the Python (calling) side, you can use np.require to obtain a contiguous array, and the function signature should change to rec_cell0[::1] recview to declare that the array is C-contiguous. As always, once the code has been tested, turning off the boundscheck, wraparound and nonecheck compiler directives in Cython will likely improve speed further.
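As a rough illustration of the calling side, the sketch below takes a non-contiguous slice of a record array and passes it through np.require with the 'C' requirement. The variable names are made up for the example, and the Cython call itself is left out.

```python
import numpy as np

# Record array with the same fields as in the question.
records = np.recarray((100,), names=['f0', 'i0', 'i1', 'i2'],
                      formats=['f4', 'i8', 'i8', 'i8'])
records['f0'] = 1.0  # initialize the column we care about

# Taking every other record produces a view that skips over memory.
every_other = records[::2]

# np.require copies only if the requested layout is not already satisfied.
contiguous = np.require(every_other, requirements=['C'])

print(every_other.flags['C_CONTIGUOUS'])
print(contiguous.flags['C_CONTIGUOUS'])
```

The compiler directives can be set per function in the .pyx file with the decorators cython.boundscheck(False), cython.wraparound(False) and cython.nonecheck(False), after a cimport cython.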

