Cython: size attribute of memoryviews

2024/10/13 12:20:39

I'm using a lot of 3D memoryviews in Cython, e.g.

cython.declare(a='double[:, :, ::1]')
a = np.empty((10, 20, 30), dtype='double')

I often want to loop over all elements of a. I can do this using a triple loop like

for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        for k in range(a.shape[2]):
            a[i, j, k] = ...

If I do not care about the indices i, j and k, it is more efficient to do a flat loop, like

cython.declare(a_ptr='double*')
a_ptr = cython.address(a[0, 0, 0])
for i in range(size):
    a_ptr[i] = ...

Here I need to know the number of elements (size) in the array. This is given by the product of the elements in the shape attribute, i.e. size = a.shape[0]*a.shape[1]*a.shape[2], or more generally size = np.prod(np.asarray(a).shape). I find both of these ugly to write, and the (albeit small) computational overhead bothers me. The nice way to do it is to use the built-in size attribute of memoryviews, size = a.size. However, for reasons I cannot fathom, this leads to unoptimized C code, as is evident from the annotation HTML file generated by Cython. Specifically, the C code generated by size = a.shape[0]*a.shape[1]*a.shape[2] is simply

__pyx_v_size = (((__pyx_v_a.shape[0]) * (__pyx_v_a.shape[1])) * (__pyx_v_a.shape[2]));

whereas the C code generated from size = a.size is

__pyx_t_10 = __pyx_memoryview_fromslice(__pyx_v_a, 3, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0); if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 2238, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_10);
__pyx_t_14 = __Pyx_PyObject_GetAttrStr(__pyx_t_10, __pyx_n_s_size); if (unlikely(!__pyx_t_14)) __PYX_ERR(0, 2238, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_14);
__Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
__pyx_t_7 = __Pyx_PyIndex_AsSsize_t(__pyx_t_14); if (unlikely((__pyx_t_7 == (Py_ssize_t)-1) && PyErr_Occurred())) __PYX_ERR(0, 2238, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_14); __pyx_t_14 = 0;
__pyx_v_size = __pyx_t_7;

To generate the above code, I have enabled all possible optimizations through compiler directives, meaning that the unwieldy C code generated by a.size cannot be optimized away. It looks to me as though the size "attribute" is not really a pre-computed attribute, but actually carries out a computation upon lookup. Furthermore, this computation is quite a bit more involved than simply taking the product over the shape attribute. I cannot find any hint of an explanation in the docs.

What is the explanation of this behavior, and do I have a better choice than writing out a.shape[0]*a.shape[1]*a.shape[2], if I really care about this micro optimization?
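For concreteness, the whole flat-loop pattern above can be written as a single function, e.g. this sketch in pure-Python-mode Cython (the name fill is purely illustrative; the pointer loop relies on the C-contiguity guaranteed by ::1 in the declaration):

import cython

@cython.boundscheck(False)
@cython.wraparound(False)
def fill(a: cython.double[:, :, ::1], value: cython.double):
    # Total number of elements, written out as the product over the shape
    size: cython.Py_ssize_t = a.shape[0] * a.shape[1] * a.shape[2]
    # Flat loop over all elements; valid only for C-contiguous memoryviews
    a_ptr: cython.p_double = cython.address(a[0, 0, 0])
    i: cython.Py_ssize_t
    for i in range(size):
        a_ptr[i] = value

Called as fill(np.empty((10, 20, 30), dtype='double'), 1.0), this should compile down to a single plain C loop.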

Answer

Just by looking at the produced C code, you can already see that size is a property and not a simple C member. Here is the original Cython code for memoryviews:

@cname('__pyx_memoryview')
cdef class memoryview(object):
    ...
    cdef object _size
    ...
    @property
    def size(self):
        if self._size is None:
            result = 1
            for length in self.view.shape[:self.view.ndim]:
                result *= length
            self._size = result
        return self._size

It is easy to see that the product is calculated only once and then cached. Clearly this doesn't play a big role for 3-dimensional arrays, but for a higher number of dimensions caching could become pretty important (as we will see, there are at most 8 dimensions, so it is not that clear-cut whether this caching is really worth it).

One can understand the decision to calculate the size lazily - after all, the size is not always needed, and one doesn't want to pay for it. Clearly, there is a price to pay for this laziness if you use the size a lot - that is the trade-off Cython makes.
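If you do use the size a lot inside one function, you can of course pay the property lookup just once and keep the result in a typed local for the hot loop - a sketch in pure-Python-mode syntax (the function total is purely illustrative):

import cython

@cython.boundscheck(False)
@cython.wraparound(False)
def total(a: cython.double[:, :, ::1]) -> cython.double:
    # Pay the Python-level property lookup for a.size once, outside the hot loop...
    size: cython.Py_ssize_t = a.size
    # ...so that the loop itself stays pure C
    a_ptr: cython.p_double = cython.address(a[0, 0, 0])
    s: cython.double = 0.0
    i: cython.Py_ssize_t
    for i in range(size):
        s += a_ptr[i]
    return s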

I would not dwell too long on the overhead of calling a.size - it is nothing compared to the overhead of calling a Cython function from Python.

For example, the measurements of @danny measure only this Python-call overhead and not the actual performance of the different approaches. To show this, I throw a third function into the mix:

%%cython
...
def both():
    a.size + a.shape[0]*a.shape[1]*a.shape[2]

which does double the amount of work, but

>>> %timeit mv_size
22.5 ns ± 0.0864 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit mv_product
20.7 ns ± 0.087 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit both
21 ns ± 0.39 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

is just as fast. On the other hand:

%%cython
...
def nothing():
    pass

isn't faster:

%timeit nothing
24.3 ns ± 0.854 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
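(For reference: the mv_size and mv_product functions from @danny's measurements are not shown in this post; presumably they are thin wrappers along the lines of the following sketch, with a module-level array a assumed just as in the both example above - which is why all of the timings here are dominated by the Python call overhead rather than by the work done inside.)

%%cython
import numpy as np

cdef double[:, :, ::1] a = np.empty((10, 20, 30), dtype='double')

def mv_size():
    return a.size

def mv_product():
    return a.shape[0] * a.shape[1] * a.shape[2]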

In a nutshell: I would use a.size for readability and assume that optimizing it away would not speed up my application, unless profiling proves otherwise.


The whole story: the variable a is of type __Pyx_memviewslice and not of type __pyx_memoryview, as one might think. The struct __Pyx_memviewslice has the following definition:

struct __pyx_memoryview_obj;
typedef struct {
    struct __pyx_memoryview_obj *memview;
    char *data;
    Py_ssize_t shape[8];
    Py_ssize_t strides[8];
    Py_ssize_t suboffsets[8];
} __Pyx_memviewslice;

That means shape can be accessed very efficiently by the Cython code, as it is a simple C array (by the way, I asked myself what happens if there are more than 8 dimensions - the answer is: you cannot have more than 8 dimensions).

The member memview is where the memory is held, and __pyx_memoryview_obj is the C extension type which is produced from the Cython code we saw above and looks as follows:

/* "View.MemoryView":328* * @cname('__pyx_memoryview')* cdef class memoryview(object):             # <<<<<<<<<<<<<<* *     cdef object obj*/
struct __pyx_memoryview_obj {PyObject_HEADstruct __pyx_vtabstruct_memoryview *__pyx_vtab;PyObject *obj;PyObject *_size;PyObject *_array_interface;PyThread_type_lock lock;__pyx_atomic_int acquisition_count[2];__pyx_atomic_int *acquisition_count_aligned_p;Py_buffer view;int flags;int dtype_is_object;__Pyx_TypeInfo *typeinfo;
};

So, __Pyx_memviewslice is not really a Python object - it is a kind of convenience wrapper which caches important data, like shape and strides, so this information can be accessed quickly and cheaply.

What happens when we call a.size? First, __pyx_memoryview_fromslice is called, which does some additional reference counting and some further stuff, and returns the member memview from the __Pyx_memviewslice object.

Then the property size is called on this returned memoryview, which accesses the cached value in _size, as has been shown in the Cython code above.

It looks as if the Cython programmers introduced a shortcut for such important information as shape, strides and suboffsets, but not for the size, which is probably not as important - this is the reason for the cleaner C code in the case of shape.
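If writing out the product by hand still bothers you, one further option (not something this answer depends on, just a sketch in pure-Python-mode Cython) is to hide it behind a small C-level helper; the call stays at the C level and avoids the Python attribute lookup, but whether that is worth it over simply writing the product out is exactly the kind of micro-optimization you would want to profile:

import cython

@cython.cfunc
def mv_size3(a: cython.double[:, :, ::1]) -> cython.Py_ssize_t:
    # Product over the cached C shape array of the memoryview slice
    return a.shape[0] * a.shape[1] * a.shape[2]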
