What is the fastest way in Cython to create a new array from an existing array and a variable

2024/9/20 19:42:40

Suppose I have an array

from array import array
myarr = array('l', [1, 2, 3])

and a variable: myvar = 4 what is the fastest way to create a new array:

newarray = array('l', [1, 2, 3, 4])

You can assume all elements are of 'long' type

I have tried to create a new array and use array.append() not sure if it is fastest. I was thinking of using memoryview like: malloc(4*sizeof(long)) but I don't know how to copy a shorter array into part of the memoryview. then insert last element into last position.

I am fairly new to Cython. Thanks for any help!

Update: I compared the following three methods:

Cython: [100000 loops, best of 3: 5.94 µs per loop]

from libc.stdlib cimport mallocdef cappend(long[:] arr, long var, size_t N):cdef long[:] result = <long[:(N+1)]>malloc((N+1)*sizeof(long))result.base[:N] = arrresult.base[N] = varreturn result

array: [1000000 loops, best of 3: 1.21 µs per loop]

from array import array
import copy
def pyappend(arr, x):result = copy.copy(arr)result.append(x)return result

list append: [1000000 loops, best of 3: 480 ns per loop]

def pylistappend(lst, x):result = lst[:]result.append(x)return result

is there hope to improve the cython part and beat the array one?

Answer

Cython gives us more access to the internals of array.array than the "normal" python, so we can utilize it to speed up the code:

  1. almost by factor 7 for your small example (by eliminating most of the overhead).
  2. by factor 2 for larger inputs by eliminating an unnecessary array-copy.

Read on for more details.


It's a little bit unusual to try to optimize a function for such small input, but not without (at least theoretical) interest.

So let's start with your functions as baseline:

a=array('l', [1,2,3])
%timeit pyappend(a, 8)
1.03 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)lst=[1,2,3]
%timeit pylistappend(lst, 8)
279 ns ± 6.03 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

We must be aware: what we are measuring is not the cost of copying but the cost of overhead (python interpreter, calling functions and so on), for example there is no difference whether a has 3 or 5 elements:

a=array('l', range(5))
%timeit pyappend(a, 8)
1.03 µs ± 6.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In the array-version, we have more overhead because we have indirection via copy module, we can try to eliminate that:

def pyappend2(arr, x): result = array('l',arr)                   result.append(x)                               return result%timeit pyappend2(a, 8)
496 ns ± 5.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

That is faster. Now let's use cython - this would eliminate the interpreter costs:

 %%cythondef cylistappend(lst, x):      result = lst[:]                                  result.append(x)                            return result%%cythonfrom cpython cimport arraydef cyappend(array.array arr, long long int x):cdef array.array res = array.array('l', arr)res.append(x)return res   %timeit cylistappend(lst, 8)193 ns ± 12.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)%%timeit cyappend(a, 8)421 ns ± 8.08 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

The cython versions is about 33% faster for list and about 10% faster for array. The constructor array.array() expects an iterable, but we already have an array.array, so we use the functionality from cpython to get access to internals of the array.array object and improve the situation a little:

 %%cythonfrom cpython cimport arraydef cyappend2(array.array arr, long long int x):cdef array.array res = array.copy(arr)res.append(x)return res%timeit cyappend2(a, 8)305 ns ± 7.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For the next step we need to know how array.array appends elements: Normally, it over-allocates, so append() has amortized cost O(1), however after array.copy the new array is exactly the needed number of elements and the next append invokes reallocation. We need to change that (see here for the description of the used functions):

  %%cythonfrom cpython cimport arrayfrom libc.string cimport memcpydef cyappend3(array.array arr, long long int x):cdef Py_ssize_t n=len(arr)cdef array.array res = array.clone(arr,n+1,False)memcpy(res.data.as_voidptr, arr.data.as_voidptr, 8*n)#that is pretty sloppy..res.data.as_longlongs[n]=xreturn res%timeit cyappend3(a, 8)154 ns ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Similar to your function, the memory is over-allocated, so we don't need to call resize() any longer. Now we are faster than the list and almost 7 times faster than the original python version.


Let's compare timings for bigger array sizes (a=array('l',range(1000)), lst=list(range(1000)), where copying of the data makes most of the running time:

  pyappend             1.84 µs  #copy-module is slow!pyappend2            1.02 µscyappend             0.94 µs  #cython no big help - we are copying twicecyappend2            0.90 µs  #still copying twicecyappend3            0.43 µs  #copying only once -> twice as fast!pylistappend         4.09 µs # needs to increment refs of integerscylistappend         3.85 µs # the same as above

Now, eliminating the unnecessary copy for array.array gives us the expected factor 2.


For even bigger arrays (10000 elements), we see the following:

  pyappend             6.9 µs  #copy-module is slow!pyappend2            4.8 µscyappend2            4.4 µs  cyappend3            4.4 µs  

There is no longer a difference between the versions (if one discards the slow copy-module). The reason for this is the changed behavior of the array.array for such big number of elements: when copying it over-allocates thus avoiding the reallocation after the first append().

We can easily check it:

b=array('l', array('l', range(10**3)))#emulate our functions
b.buffer_info()
[] (94481422849232, 1000)
b.append(1)
b.buffer_info()
[] (94481422860352, 1001) # another pointer address -> reallocated
...
b=array('l', array('l', range(10**4)))
b.buffer_info()
[](94481426290064, 10000)
b.append(33)
b.buffer_info()
[](94481426290064, 10001) # the same pointer address -> no reallocation!
https://en.xdnf.cn/q/72466.html

Related Q&A

Subclassing and built-in methods in Python

For convenience, I wanted to subclass socket to create an ICMP socket:class ICMPSocket(socket.socket):def __init__(self):socket.socket.__init__(self, socket.AF_INET,socket.SOCK_RAW,socket.getprotobynam…

How to load Rs .rdata files into Python?

I am trying to convert one part of R code in to Python. In this process I am facing some problems.I have a R code as shown below. Here I am saving my R output in .rdata format.nms <- names(mtcars) s…

how to set cookie in python mechanize

After sending request to the serverbr.open(http://xxxx)br.select_form(nr=0) br.form[MESSAGE] = 1 2 3 4 5br.submit()I get the response title, which has set-cookieSet-Cookie: PON=xxx.xxx.xxx.111; expir…

How can I deal with a massive delete from Django Admin?

Im working with Django 2.2.10.I have a model called Site, and a model called Record. Each record is associated with a single site (Foreign Key).After my app runs for a few days/weeks/months, each site …

What is colocate_with used for in tensorflow?

Here is the link of the official docs. https://www.tensorflow.org/versions/r1.3/api_docs/python/tf/colocate_with

Getting tkinter to work with python 3.x on macos with asdf [duplicate]

This question already has answers here:Why does tkinter (or turtle) seem to be missing or broken? Shouldnt it be part of the standard library?(4 answers)Closed 8 months ago.So Im stumped. How do I ge…

Flask server sent events socket exception

I am thinking of using SSE to push new data to the client and using Flot(javascript charting library) display "live" updates. My server runs on python Flask framework and I have figured out h…

Pandas adding scalar value to numeric column?

Given a dataframe like thisImageId | Width | Height | lb0 | x0 | y0 | lb1 | x1 | y1 | lb2 | x2 | y2 0 abc | 200 | 500 | ijk | 115| 8 | zyx | 15 | 16 | www | 23 | 42 1 def | 300 | 800 …

Sklearn Pipeline all the input array dimensions for the concatenation axis must match exactly

import pandas as pd from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer from sklearn.pipeline import Pipeline from sklearn.svm import LinearSVC from sklearn.preprocessing impo…

Python Pandas read_excel returns empty Dataframe

Reading a simple xls returning empty dataframe, cant figure it out for the life of me:path = (c:/Users/Desktop/Stuff/Ready) files = os.listdir(path) print(files)files_xlsx = [f for f in files if f[-3:]…