CUDA Function Won't Execute For Loop on Python with Numba

2024/9/20 9:39:34

I'm trying to run a simple update loop of a simulation on the GPU. Basically, there are a bunch of "creatures" represented by circles. In each update loop they move, and then there is a check of whether any of them intersect.

import numpy as np
import math
from numba import cuda


@cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
def update(p_x, p_y, radii, types, velocities, max_velocities, acceleration, num_creatures, cycles):
    for c in range(cycles):
        for i in range(num_creatures):
            velocities[i] = velocities[i] + acceleration
            if velocities[i] > max_velocities[i]:
                velocities[i] = max_velocities[i]
            p_x[i] = p_x[i] + (math.cos(1.0) * velocities[i])
            p_y[i] = p_y[i] + (math.sin(1.0) * velocities[i])
        for i in range(num_creatures):
            for j in range(i, num_creatures):
                delta_x = p_x[j] - p_x[i]
                delta_y = p_y[j] - p_y[i]
                distance_squared = (delta_x * delta_x) + (delta_y * delta_y)
                sum_of_radii = radii[types[i]] + radii[types[i]]
                if distance_squared < sum_of_radii * sum_of_radii:
                    pass


acceleration = .1
creature_radius = 10
spacing = 20
food_radius = 3

max_num_creatures = 1500
num_creatures = 0
max_num_food = 500
num_food = 0
max_num_entities = max_num_creatures + max_num_food
num_entities = 0
cycles = 1

p_x = np.empty((max_num_entities, 1), dtype=np.float32)
p_y = np.empty((max_num_entities, 1), dtype=np.float32)
radii = np.array([creature_radius, creature_radius, food_radius], dtype=np.float32)
types = np.empty((max_num_entities, 1), dtype=np.uint8)

velocities = np.empty((max_num_creatures, 1), dtype=np.float32)
max_velocities = np.empty((max_num_creatures, 1), dtype=np.float32)
# types:
# male - 0
# female - 1
# food - 2
for x in range(1, 800 // spacing):
    for y in range(1, 600 // spacing):
        if num_creatures % 2 == 0:
            types[num_creatures] = 0
        else:
            types[num_creatures] = 1
        p_x[num_creatures] = x * spacing
        p_y[num_creatures] = y * spacing
        max_velocities[num_creatures] = 5
        num_creatures += 1

device_p_x = cuda.to_device(p_x)
device_p_y = cuda.to_device(p_y)
device_radii = cuda.to_device(radii)
device_types = cuda.to_device(types)
device_velocities = cuda.to_device(velocities)
device_max_velocities = cuda.to_device(max_velocities)
update(device_p_x, device_p_y, device_radii, device_types, device_velocities, device_max_velocities,
       acceleration, num_creatures, cycles)
print(device_p_x.copy_to_host()[0])

The 1.0 in math.cos and math.sin is just a placeholder for the directions of the individual creatures. The whole update is wrapped in a surrounding loop that executes cycles times. If I remove that loop and leave only the block of code that moves the creatures, none of p_x, p_y, or velocities have changed, even if I just add a constant to them. Why not?

Answer

There are at least two problems:

  1. You aren't initializing velocities:

    velocities = np.empty((max_num_creatures, 1), dtype=np.float32)
    

    we can fix that for a trivial test with:

    velocities = np.ones((max_num_creatures, 1), dtype=np.float32)
    
  2. This isn't the correct array shape:

    p_x = np.empty((max_num_entities, 1), dtype=np.float32)
                   ^^^^^^^^^^^^^^^^^^^^^
    

    to match your kernel signature:

    @cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
                    ^^^^^^^^^^
    

    we can fix that with:

    p_x = np.empty(max_num_entities, dtype=np.float32)
    

    and likewise for p_y, types, velocities, and max_velocities. (I imagine some change may possibly be in order also for radii, but it's not entirely clear what you intend with that, since it appears you want a multi-dimensional array, but are accessing it in-kernel as a single-dimensional array, AFAICT. Furthermore, that section of your kernel code is a do-nothing, so it is more or less irrelevant for the problem at hand).
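
The two points above can be checked on the host with NumPy alone, without touching the GPU. The following is a minimal sketch, assuming a small hypothetical size n chosen only for illustration. It shows that a shape of (n, 1) produces a 2-D array, whereas a float32[:] parameter in the kernel signature describes a 1-D array, and that an initialized 1-D array (or a 1-D view of an existing one) has the expected shape.

```python
import numpy as np

n = 4  # hypothetical small size, for illustration only

# Shape (n, 1) gives a 2-D array with uninitialized (arbitrary) contents.
column = np.empty((n, 1), dtype=np.float32)

# A plain integer shape gives a 1-D array; np.zeros also initializes it.
flat = np.zeros(n, dtype=np.float32)

print(column.ndim, column.shape)  # 2 (4, 1)
print(flat.ndim, flat.shape)      # 1 (4,)

# An existing (n, 1) array can be turned into a 1-D array with reshape(-1).
as_1d = column.reshape(-1)
print(as_1d.ndim, as_1d.shape)    # 1 (4,)
```

Whether to allocate the arrays as 1-D from the start or to reshape them before calling cuda.to_device is a matter of taste. Allocating them as 1-D keeps the host code consistent with the kernel signature.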

When I make those changes, I get what appears to be reasonable output:

$ cat t9.py
import numpy as np
import math
from numba import cuda


@cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
def update(p_x, p_y, radii, types, velocities, max_velocities, acceleration, num_creatures, cycles):
    for c in range(cycles):
        for i in range(num_creatures):
            velocities[i] = velocities[i] + acceleration
            if velocities[i] > max_velocities[i]:
                velocities[i] = max_velocities[i]
            p_x[i] = p_x[i] + (math.cos(1.0) * velocities[i])
            p_y[i] = p_y[i] + (math.sin(1.0) * velocities[i])
        for i in range(num_creatures):
            for j in range(i, num_creatures):
                delta_x = p_x[j] - p_x[i]
                delta_y = p_y[j] - p_y[i]
                distance_squared = (delta_x * delta_x) + (delta_y * delta_y)
                sum_of_radii = radii[types[i]] + radii[types[i]]
                if distance_squared < sum_of_radii * sum_of_radii:
                    pass


acceleration = .1
creature_radius = 10
spacing = 20
food_radius = 3

max_num_creatures = 1500
num_creatures = 0
max_num_food = 500
num_food = 0
max_num_entities = max_num_creatures + max_num_food
num_entities = 0
cycles = 1

p_x = np.empty(max_num_entities, dtype=np.float32)
p_y = np.empty(max_num_entities, dtype=np.float32)
radii = np.array([creature_radius, creature_radius, food_radius], dtype=np.float32)
types = np.empty(max_num_entities, dtype=np.uint8)

velocities = np.ones(max_num_creatures, dtype=np.float32)
max_velocities = np.empty(max_num_creatures, dtype=np.float32)
# types:
# male - 0
# female - 1
# food - 2
for x in range(1, 800 // spacing):
    for y in range(1, 600 // spacing):
        if num_creatures % 2 == 0:
            types[num_creatures] = 0
        else:
            types[num_creatures] = 1
        p_x[num_creatures] = x * spacing
        p_y[num_creatures] = y * spacing
        max_velocities[num_creatures] = 5
        num_creatures += 1

device_p_x = cuda.to_device(p_x)
device_p_y = cuda.to_device(p_y)
device_radii = cuda.to_device(radii)
device_types = cuda.to_device(types)
device_velocities = cuda.to_device(velocities)
device_max_velocities = cuda.to_device(max_velocities)
update(device_p_x, device_p_y, device_radii, device_types, device_velocities, device_max_velocities,
       acceleration, num_creatures, cycles)
print(device_p_x.copy_to_host())
$ python t9.py
[  2.05943317e+01   2.05943317e+01   2.05943317e+01 ...,   3.64769361e-11
   1.52645868e-19   1.80563260e+28]
$
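
The leading values in that output can be sanity-checked by hand. The sketch below redoes one cycle of the movement step for the first creature on the CPU, using the values from the listing above (starting x of 1 * spacing = 20, initial velocity 1.0 from np.ones, acceleration 0.1, maximum velocity 5). The variable names are illustrative only. The trailing values in the printed array are different because p_x was allocated with np.empty and only the first num_creatures slots are ever written.

```python
import math

import numpy as np

# Values taken from the listing above
start_x = np.float32(20.0)       # x * spacing for the first creature (1 * 20)
velocity = np.float32(1.0)       # initial velocity from np.ones(...)
acceleration = np.float32(0.1)
max_velocity = np.float32(5.0)

# One cycle of the movement step for creature 0
velocity = min(velocity + acceleration, max_velocity)
expected_x = start_x + np.float32(math.cos(1.0)) * velocity

print(expected_x)  # approximately 20.5943, in line with 2.05943317e+01 above
```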

Also note that you are currently launching only one block of one thread, so the kernel runs serially on the GPU. I assume that is not pertinent to your question for now.
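
If you later move to one thread per creature, the launch configuration has to cover all creatures. The following is a rough sketch of that arithmetic only. The helper name launch_config and the block size of 128 are assumptions made for illustration, not something from the original code. The comments show how the result would typically be used with Numba's kernel[blocks, threads](...) launch syntax and cuda.grid(1).

```python
import math


def launch_config(num_items, threads_per_block=128):
    """Return (blocks_per_grid, threads_per_block) with at least num_items threads."""
    blocks_per_grid = math.ceil(num_items / threads_per_block)
    return blocks_per_grid, threads_per_block


# The setup loop places 39 * 29 creatures (x in 1..39, y in 1..29).
num_creatures = 39 * 29
blocks, threads = launch_config(num_creatures)
print(blocks, threads)  # 9 128

# Inside the kernel, each thread would then pick up one creature:
#     i = cuda.grid(1)
#     if i < num_creatures:      # guard: the last block has spare threads
#         velocities[i] = velocities[i] + acceleration
# and the launch would be written as:
#     update[blocks, threads](device_p_x, device_p_y, device_radii, device_types,
#                             device_velocities, device_max_velocities,
#                             acceleration, num_creatures, cycles)
```

The movement step is independent per creature, so it maps naturally onto this scheme. The pairwise intersection check reads positions written by other threads, so it would need separate thought, for example a second kernel launch after the movement step.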
