cython.parallel: cannot see any difference in speed

2024/10/12 16:29:32

I tried to use cython.parallel prange, but I can only see two cores being used, each at about 50%. How can I make use of all the cores, i.e. send the loops to the cores simultaneously while they share the arrays volume and mc_vol?

EDIT: I also added a purely sequential for-loop version, which is about 30 seconds faster than the cython.parallel prange version. Both of them use only one core. Is there a way to parallelize this?

cimport cython
from cython.parallel import prange, parallel, threadid
from libc.stdio cimport sprintf
from libc.stdlib cimport malloc, free
cimport numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef MC_Surface(np.ndarray[np.int_t, ndim=3] volume, np.ndarray[np.float32_t, ndim=3] mc_vol):
    cdef int vol_len = len(volume) - 1
    cdef int k, j, i
    cdef char* pattern  # a string pointer - allocate later
    Perm_area = {"00000000": 0.000000, ..., "00011101": 1.515500}
    try:
        pattern = <char*>malloc(sizeof(char) * 260)
        for k in range(vol_len):
            for j in range(vol_len):
                for i in range(vol_len):
                    sprintf(pattern, "%i%i%i%i%i%i%i%i",
                            volume[i, j, k], volume[i, j + 1, k],
                            volume[i + 1, j, k], volume[i + 1, j + 1, k],
                            volume[i, j, k + 1], volume[i, j + 1, k + 1],
                            volume[i + 1, j, k + 1], volume[i + 1, j + 1, k + 1])
                    mc_vol[i, j, k] = Perm_area[pattern]
                    # if Perm_area[pattern] > 0:
                    #     print pattern, 'Area: ', Perm_area[pattern]
                    # total_area += Perm_area[pattern]
    finally:
        free(pattern)
    return mc_vol

EDIT: following DavidW's suggestion I tried this, but the prange version is considerably slower:

cpdef MC_Surface(np.ndarray[np.int_t, ndim=3] volume, np.ndarray[np.float32_t, ndim=3] mc_vol):
    cdef int vol_len = len(volume) - 1
    cdef int k, j, i
    cdef char* pattern  # a string pointer - allocate later
    Perm_area = {"00000000": 0.000000, ..., "00011101": 1.515500}
    with nogil, parallel():
        try:
            pattern = <char*>malloc(sizeof(char) * 260)
            for k in prange(vol_len):
                for j in range(vol_len):
                    for i in range(vol_len):
                        sprintf(pattern, "%i%i%i%i%i%i%i%i",
                                volume[i, j, k], volume[i, j + 1, k],
                                volume[i + 1, j, k], volume[i + 1, j + 1, k],
                                volume[i, j, k + 1], volume[i, j + 1, k + 1],
                                volume[i + 1, j, k + 1], volume[i + 1, j + 1, k + 1])
                        with gil:
                            mc_vol[i, j, k] = Perm_area[pattern]
                        # if Perm_area[pattern] > 0:
                        #     print pattern, 'Area: ', Perm_area[pattern]
                        #     total_area += Perm_area[pattern]
        finally:
            free(pattern)
    return mc_vol

My setup file looks like:

# imports assumed (not shown in the question)
from setuptools import setup, Extension
from Cython.Distutils import build_ext
import numpy

setup(name='SurfaceArea',
      ext_modules=[Extension('c_marchSurf', ['c_marchSurf.pyx'],
                             include_dirs=[numpy.get_include()],
                             extra_compile_args=['-fopenmp'],
                             extra_link_args=['-fopenmp'],
                             language="c++")],
      cmdclass={'build_ext': build_ext},
      requires=['Cython', 'numpy', 'matplotlib', 'pathos', 'scipy', 'cython.parallel']
)
Answer

The problem is the with gil: block, which can only run on one core at a time. Apart from the sprintf call you aren't doing anything else inside the loop, so you shouldn't really expect any speed-up.
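As a rough analogy (this is not how Cython implements it), a single threading.Lock shared by every thread behaves like the with gil: block: however many threads you start, only one can be inside it at any moment. The sketch below counts the maximum number of threads that are ever inside the locked region simultaneously; the function name run_workers and its parameters are hypothetical.

```python
import threading


def run_workers(n_threads, iterations):
    """Return the maximum number of threads ever inside the shared lock at once."""
    big_lock = threading.Lock()    # plays the role of the GIL in `with gil:`
    stats_lock = threading.Lock()  # only protects the bookkeeping counters below
    active = 0
    max_active = 0

    def worker():
        nonlocal active, max_active
        for _ in range(iterations):
            with big_lock:         # analogous to `with gil:` around the dict lookup
                with stats_lock:
                    active += 1
                    max_active = max(max_active, active)
                # the dictionary lookup and array write would happen here
                with stats_lock:
                    active -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active


print(run_workers(n_threads=8, iterations=1000))
```

The result is always 1: adding threads only adds waiting, which is why the prange loop with the lookup inside with gil: runs no faster than the sequential one.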

In order to avoid using the GIL you need to avoid Python features wherever possible. You already avoid them in the string-formatting part by using C's sprintf to create your string. For the dictionary lookup, the easiest option is probably the C++ standard library, which contains a map class with similar behaviour. (Note that you'll now need to compile the module in Cython's C++ mode, i.e. with language="c++" in the Extension, which your setup file already sets.)
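If volume only ever contains 0s and 1s (an assumption, suggested by keys such as "00011101"), there is also a way to avoid strings altogether, which is a different technique from the C++ map: read the eight corner values as the bits of an integer between 0 and 255 and use that integer to index a flat array of areas built once from Perm_area. The Python sketch below shows the idea; the helper names build_area_table and corner_index are hypothetical. In the .pyx file the same shift-and-or arithmetic on C ints, with the table held in a typed float32 array, should need no GIL inside prange.

```python
import numpy as np


def build_area_table(perm_area):
    """Turn {"00011101": 1.5155, ...} into a 256-entry float32 array."""
    table = np.zeros(256, dtype=np.float32)  # patterns absent from the dict map to 0.0
    for key, area in perm_area.items():
        table[int(key, 2)] = area            # "00011101" is read as binary, i.e. 29
    return table


def corner_index(volume, i, j, k):
    """Pack the 8 corners, in the same order as the sprintf call, into one int."""
    corners = (
        volume[i, j, k], volume[i, j + 1, k],
        volume[i + 1, j, k], volume[i + 1, j + 1, k],
        volume[i, j, k + 1], volume[i, j + 1, k + 1],
        volume[i + 1, j, k + 1], volume[i + 1, j + 1, k + 1],
    )
    idx = 0
    for c in corners:
        idx = (idx << 1) | int(c)  # first corner becomes the most significant bit
    return idx


# Usage: a single 2x2x2 cube whose corners spell the pattern "00011101".
table = build_area_table({"00000000": 0.0, "00011101": 1.5155})
vol = np.zeros((2, 2, 2), dtype=np.int64)
vol[1, 1, 0] = vol[0, 0, 1] = vol[0, 1, 1] = vol[1, 1, 1] = 1
print(corner_index(vol, 0, 0, 0), table[corner_index(vol, 0, 0, 0)])
```

Because the first corner is shifted into the most significant bit, the integer matches the binary reading of the sprintf string, so the existing Perm_area keys can be reused unchanged.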

# at the top of your file
from cython.parallel import parallel, prange
from libc.stdio cimport sprintf
from libc.stdlib cimport malloc, free
from libcpp.map cimport map
from libcpp.string cimport string
import numpy as np
cimport numpy as np

# ... code omitted ...

cpdef MC_Surface(np.ndarray[np.int_t, ndim=3] volume, np.ndarray[np.float32_t, ndim=3] mc_vol):
    # note above I've defined volume as a numpy array so that
    # I can do fast, GIL-less direct array lookup
    cdef char* pattern  # a string pointer - allocate later
    Perm_area = {}  # some dictionary, as before
    # depending on the size of Perm_area, this conversion to
    # a C++ object is potentially quite slow (it involves a lot
    # of string copies)
    cdef map[string, float] Perm_area_m = Perm_area

    # ... code omitted ...

    with nogil, parallel():
        try:
            # assigning pattern here makes it thread local
            # it's assigned once per thread which isn't too bad
            pattern = <char*>malloc(sizeof(char) * 50)
            # when you allocate pattern you need to make it big enough
            # either by calculating a size, or by just making it overly big

            # ... more code omitted ...

            # then later, inside your loops
            sprintf(pattern, "%i%i%i%i%i%i%i%i",
                    volume[i, j, k], volume[i, j + 1, k],
                    volume[i + 1, j, k], volume[i + 1, j + 1, k],
                    volume[i, j, k + 1], volume[i, j + 1, k + 1],
                    volume[i + 1, j, k + 1], volume[i + 1, j + 1, k + 1])
            # and now do the dictionary lookup without the GIL
            # because we're using the C++ class instead.
            # Unfortunately, we also need to do a string copy (which might slow things down)
            mc_vol[i, j, k] = Perm_area_m[string(pattern)]
            # be aware that, unlike a Python dict, map's operator[] does not
            # raise if the pattern is missing: it silently inserts a default
            # value (0) into the shared map, which is also a data race between
            # threads, so make sure every possible pattern is present.
        finally:
            free(pattern)

I've also had to change volume to be a typed numpy array, since if it were just a Python object I'd need the GIL to index its elements.

(Edit: changed to take the dictionary lookup out of the GIL block as well, by using a C++ map.)

https://en.xdnf.cn/q/118184.html
