Almost all of the Numba, CuPy, etc. examples available online are simple array additions showing the speedup from moving from a single CPU core/thread to a GPU, and the command documentation mostly lacks good examples. This post is intended to provide a more comprehensive example.
The initial code is provided below. It's a simple implementation of the classic cellular automaton. Originally, it doesn't even use numpy, just plain Python and the Pyglet module for visualization.
My goal is to extend this code to a specific problem (one that will be very large), but first I thought it best to optimize it for GPU usage.
This is game_of_life.py:
import random as rnd
import pyglet
#import numpy as np
#from numba import vectorize, cuda, jit

class GameOfLife:
    def __init__(self, window_width, window_height, cell_size, percent_fill):
        self.grid_width = int(window_width / cell_size)
        self.grid_height = int(window_height / cell_size)
        self.cell_size = cell_size
        self.percent_fill = percent_fill
        self.cells = []
        self.generate_cells()

    def generate_cells(self):
        for row in range(0, self.grid_height):
            self.cells.append([])
            for col in range(0, self.grid_width):
                if rnd.random() < self.percent_fill:
                    self.cells[row].append(1)
                else:
                    self.cells[row].append(0)

    def run_rules(self):
        temp = []
        for row in range(0, self.grid_height):
            temp.append([])
            for col in range(0, self.grid_width):
                cell_sum = sum([self.get_cell_value(row - 1, col),
                                self.get_cell_value(row - 1, col - 1),
                                self.get_cell_value(row, col - 1),
                                self.get_cell_value(row + 1, col - 1),
                                self.get_cell_value(row + 1, col),
                                self.get_cell_value(row + 1, col + 1),
                                self.get_cell_value(row, col + 1),
                                self.get_cell_value(row - 1, col + 1)])

                if self.cells[row][col] == 0 and cell_sum == 3:
                    temp[row].append(1)
                elif self.cells[row][col] == 1 and (cell_sum == 3 or cell_sum == 2):
                    temp[row].append(1)
                else:
                    temp[row].append(0)

        self.cells = temp

    def get_cell_value(self, row, col):
        if row >= 0 and row < self.grid_height and col >= 0 and col < self.grid_width:
            return self.cells[row][col]
        return 0

    def draw(self):
        for row in range(0, self.grid_height):
            for col in range(0, self.grid_width):
                if self.cells[row][col] == 1:
                    # (0, 0) (0, 20) (20, 0) (20, 20)
                    square_coords = (row * self.cell_size, col * self.cell_size,
                                     row * self.cell_size, col * self.cell_size + self.cell_size,
                                     row * self.cell_size + self.cell_size, col * self.cell_size,
                                     row * self.cell_size + self.cell_size, col * self.cell_size + self.cell_size)
                    pyglet.graphics.draw_indexed(4, pyglet.gl.GL_TRIANGLES,
                                                 [0, 1, 2, 1, 2, 3],
                                                 ('v2i', square_coords))
First, I could switch to numpy by adding self.cells = np.asarray(self.cells) at the end of generate_cells and self.cells = np.asarray(temp) at the end of run_rules, since doing this any earlier wouldn't bring a speedup, as presented here. (In practice, changing to numpy didn't produce a noticeable speedup.)
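Note that simply wrapping the nested lists in an array doesn't change the per-cell Python loops in run_rules, which is presumably why nothing got faster. For illustration only (this is not code from above), here is a minimal sketch of what a fully vectorized update could look like in pure numpy; note that np.roll wraps around the edges (toroidal grid), whereas get_cell_value treats out-of-bounds neighbors as dead:

import numpy as np

def step(cells):
    # Sum the 8 neighbors of every cell at once by shifting the whole grid.
    neighbors = sum(np.roll(np.roll(cells, dr, axis=0), dc, axis=1)
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return ((neighbors == 3) | ((cells == 1) & (neighbors == 2))).astype(cells.dtype)

cells = (np.random.random((512, 512)) < 0.5).astype(np.uint8)
cells = step(cells)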
Regarding GPUs: for example, I added @jit before every function, and the code became very slow.
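(Most likely this is because numba can't compile methods that take self in nopython mode, so it falls back to slow object mode. The pattern in the numba examples is a free function over plain arrays; here is a sketch of that idea, where step_numba is just an illustrative name and not part of the code above:)

import numpy as np
from numba import jit

@jit(nopython=True)                 # nopython mode: no Python objects, hence no self
def step_numba(cells):
    h, w = cells.shape
    out = np.zeros_like(cells)
    for r in range(h):
        for c in range(w):
            s = 0
            for dr in range(-1, 2):
                for dc in range(-1, 2):
                    if dr == 0 and dc == 0:
                        continue
                    rr = r + dr
                    cc = c + dc
                    if 0 <= rr and rr < h and 0 <= cc and cc < w:
                        s += cells[rr, cc]
            if s == 3 or (cells[r, c] == 1 and s == 2):
                out[r, c] = 1
    return out

cells = (np.random.random((512, 512)) < 0.5).astype(np.uint8)
cells = step_numba(cells)           # first call compiles; subsequent calls are fast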
I also tried using @vectorize(['float32(float32, float32)'], target='cuda'), but this raised a question: how do you use @vectorize on functions that only take self as an input argument?
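(For context, the usage shown in the numba documentation decorates a free elementwise function of scalars, which numba then broadcasts over whole arrays, so a method whose only argument is self doesn't fit the model:)

import numpy as np
from numba import vectorize

@vectorize(['float32(float32, float32)'], target='cuda')
def add(a, b):      # written as scalar in, scalar out
    return a + b

x = np.arange(9, dtype=np.float32)
print(add(x, x))    # executed as a CUDA ufunc over the whole array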
I also tried replacing numpy with CuPy, e.g. self.cells = cupy.asarray(self.cells), but this also became very slow.
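(My guess is that this is because run_rules still reads the grid one element at a time; with CuPy the data lives on the GPU, so every scalar access inside the Python loops involves the device, for example:)

import cupy as cp

cells = cp.asarray([[0, 1], [1, 0]])   # the data now lives on the GPU
v = cells[0, 1]                        # scalar indexing still returns a device array
print(int(v))                          # converting it forces a GPU -> host copy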
Following the initial idea of an extended example of GPU usage, what would be the proper approach to this problem? Where is the right place to put the modifications/vectorizations/parallelizations/numba/cupy, etc.? And, most importantly, why?
Additional info: besides the code provided, here's the main.py file:
import pyglet
from game_of_life import GameOfLife

class Window(pyglet.window.Window):
    def __init__(self):
        super().__init__(800, 800)
        self.gameOfLife = GameOfLife(self.get_size()[0],
                                     self.get_size()[1],
                                     15,   # the smaller this value, the more computation-intensive it gets
                                     0.5)
        pyglet.clock.schedule_interval(self.update, 1.0 / 24.0)  # 24 frames per second

    def on_draw(self):
        self.clear()
        self.gameOfLife.draw()

    def update(self, dt):
        self.gameOfLife.run_rules()

if __name__ == '__main__':
    window = Window()
    pyglet.app.run()