Source code for mctsnc

"""
This module contains the core algorithmic functionalities of the project, embodied by the class ``MCTSNC``. 

With CUDA computational model in mind, we have proposed and implemented in class ``MCTSNC`` four, fast operating and thoroughly parallel, variants of Monte Carlo Tree Search algorithm. 
The provided implementation takes advantage of `Numba <https://numba.pydata.org>`_, a just-in-time Python compiler, and its ``numba.cuda`` package (hence the "NC" suffix in the name). 
By `thoroughly parallel` we understand an algorithmic design that applies to both: (1) the structural elements of trees - leaf-/root-/tree-level parallelization 
(all those three are combined), and (2) the stages of MCTS - each stage in itself (selection, expansion, playouts, backup) employs multiple GPU threads. 
We apply suitable `reduction` patterns to carry out summations or max / argmax operations. Cooperation of threads helps to transfer information between global and shared memory. 
The implementation uses: no atomic operations, no mutexes (lock-free), and very few device-host memory transfers. 

Example usage 1 (Connect 4)
---------------------------
Assume the mechanics of the Connect 4 game have been defined to MCTS-NC in ``mctsnc_game_mechanics.py`` (via device functions ``is_action_legal``, ``take_action``, etc.), 
and that ``c4`` - instance of ``C4(State)`` - represents a state of an ongoing Connect 4 game shown below.

.. code-block:: console

    |.|.|●|○|.|.|.|
    |.|.|●|○|.|.|○|
    |.|.|●|●|.|●|●|
    |.|●|○|●|.|○|●|
    |.|○|●|○|.|●|○|
    |○|○|○|●|●|○|○|
     0 1 2 3 4 5 6      

Then, running the following code

.. code-block:: python

    ai = MCTSNC(C4.get_board_shape(), C4.get_extra_info_memory(), C4.get_max_actions())
    ai.init_device_side_arrays()
    best_action = ai.run(c4.get_board(), c4.get_extra_info(), c4.get_turn())
    print(f"BEST ACTION: {best_action}")

results in finding the best action for black - move 4 (winning in two plies), and the following printout:

.. code-block:: console

    [MCTSNC._init_device_side_arrays()... for MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
    [MCTSNC._init_device_side_arrays() done; time: 0.5193691253662109 s, per_state_memory: 95 B,  calculated max_tree_size: 2825549]
    MCTSNC RUN... [MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
    [actions info:
    {
      0: {'name': '0', 'n_root': 7474304, 'win_flag': False, 'n': 2182400, 'n_wins': 2100454, 'q': 0.9624514296187683, 'ucb': 0.9678373740384631},
      1: {'name': '1', 'n_root': 7474304, 'win_flag': False, 'n': 185344, 'n_wins': 164757, 'q': 0.8889254575276243, 'ucb': 0.9074070665330406},
      4: {'name': '4', 'n_root': 7474304, 'win_flag': False, 'n': 4921472, 'n_wins': 4885924, 'q': 0.9927769577882389, 'ucb': 0.9963635461474457},
      5: {'name': '5', 'n_root': 7474304, 'win_flag': False, 'n': 105472, 'n_wins': 91863, 'q': 0.8709704945388349, 'ucb': 0.8954701768685893},
      6: {'name': '6', 'n_root': 7474304, 'win_flag': False, 'n': 79616, 'n_wins': 68403, 'q': 0.8591614750803859, 'ucb': 0.8873601607647162},
      best: {'index': 4, 'name': '4', 'n_root': 7474304, 'win_flag': False, 'n': 4921472, 'n_wins': 4885924, 'q': 0.9927769577882389, 'ucb': 0.9963635461474457}
    }]
    [performance info:
    {
      steps: 6373,
      steps_per_second: 1274.0076324260813,
      playouts: 7474304,
      playouts_per_second: 1494166.0666990099,
      times_[ms]: {'total': 5002.324819564819, 'loop': 5000.642776489258, 'reduce_over_trees': 0.29015541076660156, 'reduce_over_actions': 0.4520416259765625, 'mean_loop': 0.7846607212441955, 'mean_select': 0.11222893376562147, 'mean_expand': 0.2786097114284054, 'mean_playout': 0.17186361935680036, 'mean_backup': 0.2193056618645448},
      trees: {'count': 8, 'mean_depth': 5.176703163017032, 'max_depth': 12, 'mean_size': 1233.0, 'max_size': 2736}
    }]
    MCTSNC RUN DONE. [time: 5.002324819564819 s; best action: 4, best win_flag: False, best n: 4921472, best n_wins: 4885924, best q: 0.9927769577882389]
    BEST ACTION: 4

Example usage 2 (Gomoku)
------------------------
Assume the mechanics of the Gomoku game have been defined to MCTS-NC in ``mctsnc_game_mechanics.py`` (via device functions ``is_action_legal``, ``take_action``, etc.), 
and that ``g`` - instance of ``Gomoku(State)`` - represents a state of an ongoing Gomoku game shown below.

.. code-block:: console

      ABCDEFGHIJKLMNO
    15+++++++++++++++15
    14+++++++++++++++14
    13+++++++++++++++13
    12++++++++●++++++12
    11++++++++○++++++11
    10++++++++○++++++10
     9++++++○+○++++++9
     8+++++++●○++++++8
     7+++++++●●●○++++7
     6++++++++●●○++++6
     5+++++++●+++++++5
     4+++++++++++++++4
     3+++++++++++++++3
     2+++++++++++++++2
     1+++++++++++++++1
      ABCDEFGHIJKLMNO
  
Then, running the following code

.. code-block:: python

    ai = MCTSNC(Gomoku.get_board_shape(), Gomoku.get_extra_info_memory(), Gomoku.get_max_actions(), action_index_to_name_function=Gomoku.action_index_to_name)
    ai.init_device_side_arrays()
    best_action = ai.run(g.get_board(), g.get_extra_info(), g.get_turn())
    print(f"BEST ACTION: {best_action}")

results in finding the defensive action for white - move K8 (indexed as 115) that prevents black from winning in three plies, and the following printout:

.. code-block:: console

    [MCTSNC._init_device_side_arrays()... for MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
    [MCTSNC._init_device_side_arrays() done; time: 0.5558419227600098 s, per_state_memory: 1144 B,  calculated max_tree_size: 234637]
    MCTSNC RUN... [MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
    [actions info:
    {
      0: {'name': 'A1', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 148906, 'q': 0.3478852048444976, 'ucb': 0.36098484108863044},
      1: {'name': 'B1', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 149000, 'q': 0.34810481459330145, 'ucb': 0.3612044508374343},
      2: {'name': 'C1', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 144339, 'q': 0.3372154418361244, 'ucb': 0.35031507808025725},
      ...
      115: {'name': 'K8', 'n_root': 94359552, 'win_flag': False, 'n': 1093632, 'n_wins': 452284, 'q': 0.41356141736891383, 'ucb': 0.4217566587685248},
      ...
      222: {'name': 'M15', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 148009, 'q': 0.34578956713516745, 'ucb': 0.3588892033793003},
      223: {'name': 'N15', 'n_root': 94359552, 'win_flag': False, 'n': 401408, 'n_wins': 148802, 'q': 0.37070013552295916, 'ucb': 0.38422722440183954},
      224: {'name': 'O15', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 145329, 'q': 0.3395283530203349, 'ucb': 0.35262798926446776},
      best: {'index': 115, 'name': 'K8', 'n_root': 94359552, 'win_flag': False, 'n': 1093632, 'n_wins': 452284, 'q': 0.41356141736891383, 'ucb': 0.4217566587685248}
    }]
    [performance info:
    {
      steps: 442,
      steps_per_second: 88.25552729358726,
      playouts: 94359552,
      playouts_per_second: 18841067.91164404,
      times_[ms]: {'total': 5008.184909820557, 'loop': 5006.503105163574, 'reduce_over_trees': 0.20575523376464844, 'reduce_over_actions': 0.5161762237548828, 'mean_loop': 11.326930102180032, 'mean_select': 0.10066766005295974, 'mean_expand': 0.3082833139065704, 'mean_playout': 10.688265524298897, 'mean_backup': 0.226746317488036},
      trees: {'count': 8, 'mean_depth': 2.519115779878241, 'max_depth': 3, 'mean_size': 92149.0, 'max_size': 92149}
    }]
    MCTSNC RUN DONE. [time: 5.008184909820557 s; best action: 115 (K8), best win_flag: False, best n: 1093632, best n_wins: 452284, best q: 0.41356141736891383]
    BEST ACTION: 115
     
Dependencies
------------
- ``numpy``, ``math``: required for mathematical computations.

- ``numba``: required for just-in-time compilation of CUDA kernels (decorated by ``@cuda.jit``).

- ``mctsnc_game_mechanics``: required to define the mechanics of a wanted game or search problem via a set of five device-side functions - ``is_action_legal``, ``take_action``, ``legal_actions_playout``, ``take_action_playout``, ``compute_outcome`` callable by kernel functions of ``MCTSNC`` (see :doc:`mctsnc_game_mechanics`). 

- For usage of ``MCTSNC`` class, NVIDIA CUDA drivers must be present in the operating system. 

Link to project repository
--------------------------
`https://github.com/pklesk/mcts_numba_cuda <https://github.com/pklesk/mcts_numba_cuda>`_

Notes
-----
Private functions of ``MCTSNC`` class are named with a single leading underscore (e.g.: ``_set_cuda_constants``, 
``_make_performance_info``, ``_playout_acp_prodigal``, etc.). Among them, the kernel functions are additionally 
described by ``@cuda.jit`` decorators coming from ``numba`` module. Exact specifications of types come along with the decorators.
For public methods full docstrings are provided (with arguments and returns described). For private functions short docstrings are provided.    

"""

import numpy as np
from numpy import inf
from numba import cuda
from numba import void, int8, int16, int32, int64, float32, boolean
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32, xoroshiro128p_type 
import time
import math
from numba.core.errors import NumbaPerformanceWarning
import warnings
from mctsnc_game_mechanics import is_action_legal, take_action, legal_actions_playout, take_action_playout, compute_outcome
from utils import dict_to_str
import json

__version__ = "1.0.0"
__author__ = "Przemysław Klęsk"
__email__ = "pklesk@zut.edu.pl" 

warnings.simplefilter("ignore", category=NumbaPerformanceWarning)

# the class
[docs] class MCTSNC: """ Monte Carlo Tree Search implemented via ``numba.cuda`` meant for multi-threaded executions on GPU involving multiple concurrent trees and playouts (four algorithmic variants available). """ # constants VARIANTS = ["ocp_thrifty", "ocp_prodigal", "acp_thrifty", "acp_prodigal"] # ocp - one child playouts, acp - all children playouts; thrifty/prodigal - accurate/overhead usage of cuda blocks (pertains to expanded actions) DEFAULT_SEARCH_TIME_LIMIT = 5.0 # [s], np.inf possible DEFAULT_SEARCH_STEPS_LIMIT = np.inf # integer, np.inf possible DEFAULT_N_TREES = 8 DEFAULT_N_PLAYOUTS = 128 DEFAULT_VARIANT = VARIANTS[-1] DEFAULT_DEVICE_MEMORY = 2.0 DEFAULT_UCB_C = 2.0 DEFAULT_SEED = 0 DEFAULT_VERBOSE_DEBUG = False DEFAULT_VERBOSE_INFO = True MAX_STATE_BOARD_SHAPE = (32, 32) MAX_STATE_EXTRA_INFO_MEMORY = 4096 MAX_STATE_MAX_ACTIONS = 512 MAX_TREE_SIZE = 2**24 MAX_N_TREES = 512 MAX_N_PLAYOUTS = 512 MAX_TREE_DEPTH = 2048 # to memorize paths at select stage
[docs] def __init__(self, state_board_shape, state_extra_info_memory, state_max_actions, search_time_limit=DEFAULT_SEARCH_TIME_LIMIT, search_steps_limit=DEFAULT_SEARCH_STEPS_LIMIT, n_trees=DEFAULT_N_TREES, n_playouts=DEFAULT_N_PLAYOUTS, variant=DEFAULT_VARIANT, device_memory=DEFAULT_DEVICE_MEMORY, ucb_c=DEFAULT_UCB_C, seed=DEFAULT_SEED, verbose_debug=DEFAULT_VERBOSE_DEBUG, verbose_info=DEFAULT_VERBOSE_INFO, action_index_to_name_function=None): """ Constructor of ``MCTSNC`` instances. Args: state_board_shape (tuple(int, int)): shape of board for states in a given game, at most ``(32, 32)``. state_extra_info_memory (int): number of bytes for extra information on states, at most ``4096``. state_max_actions (int): maximum branching factor, at most ``512``. search_time_limit (float): time limit in seconds (computational budget), ``np.inf`` if no limit, defaults to ``5.0``. search_steps_limit (float): steps limit (computational budget), ``np.inf`` if no limit, defaults to ``np.inf``. n_trees (int): number of independent trees, defaults to ``8``. n_playouts (int): number of independent playouts from an expanded child, must be a power of 2, defaults to ``128``. variant (str): choice of algorithmic variant from {``"ocp_thrifty"``, ``"ocp_prodigal"``, ``"acp_thrifty``, ``"acp_prodigal``}, defaults to ``"acp_prodigal"``. device_memory (float): GPU memory in GiBs (gibibytes) to be available for this instance, defaults to ``2.0``. ucb_c (float): value of C constant, influencing exploration tendency, appearing in UCT formula (upper confidence bounds for trees), defaults to ``2.0``. verbose_debug (bool): debug verbosity flag, if ``True`` then detailed information about each kernel invocation are printed to console (in each iteration), defaults to ``False``. verbose_info (bool): verbosity flag, if ``True`` then standard information on actions and performance are printed to console (after a full run), defaults to ``True``. action_index_to_name_function (callable): pointer to user-provided function converting action indexes to a human-friendly names (e.g. ``"e2:e4"`` for chess), defaults to ``None``. """ self._set_cuda_constants() if not self.cuda_available: sys.exit(f"[MCTSNC.__init__(): exiting due to cuda computations not available]") self.state_board_shape = state_board_shape if self.state_board_shape[0] > self.MAX_STATE_BOARD_SHAPE[0] or self.state_board_shape[1] > self.MAX_STATE_BOARD_SHAPE[1]: sys.exit(f"[MCTSNC.__init__(): exiting due to allowed state board shape exceeded]") self.state_extra_info_memory = max(state_extra_info_memory, 1) if self.state_extra_info_memory > self.MAX_STATE_EXTRA_INFO_MEMORY: sys.exit(f"[MCTSNC.__init__(): exiting due to allowed state extra info memory exceeded]") self.state_max_actions = state_max_actions if self.state_max_actions > self.MAX_STATE_MAX_ACTIONS: sys.exit(f"[MCTSNC.__init__(): exiting due to allowed state max actions memory exceeded]") if self.state_max_actions > self.cuda_tpb_default: sys.exit(f"[MCTSNC.__init__(): exiting due to state max actions exceeding half of cuda default tpb]") self.search_time_limit = search_time_limit self._validate_param("search_time_limit", float, True, 0.0, False, np.inf, self.DEFAULT_SEARCH_TIME_LIMIT) self.search_steps_limit = float(search_steps_limit) self._validate_param("search_steps_limit", float, True, 0.0, False, np.inf, self.DEFAULT_SEARCH_STEPS_LIMIT) # purposely float, so that np.inf possible self.n_trees = n_trees self._validate_param("n_trees", int, False, 1, False, self.MAX_N_TREES, self.DEFAULT_N_TREES) if 2**np.round(np.log2(n_playouts)) != n_playouts: invalid_n_playouts = n_playouts n_playouts = self.DEFAULT_N_PLAYOUTS print(f"[n_playouts: {invalid_n_playouts} is not a power of 2; changed to default: {n_playouts}]") self.n_playouts = n_playouts self._validate_param("n_playouts", int, False, 1, False, self.MAX_N_PLAYOUTS, self.DEFAULT_N_PLAYOUTS) if not variant in self.VARIANTS: invalid_variant = variant variant = self.DEFAULT_VARIANT print(f"[invalid variant: '{invalid_variant}' changed to default: '{variant}'; possible variants: {self.VARIANTS}]") self.variant = variant self.ucb_c = ucb_c self._validate_param("ucb_c", float, False, 0.0, False, np.inf, self.DEFAULT_UCB_C) self.device_memory = device_memory * 1024**3 # gibibytes (GiB) to bytes (B) self._validate_param("device_memory", float, True, 0.0, False, np.inf, self.DEFAULT_DEVICE_MEMORY) self.seed = seed self.verbose_debug = verbose_debug self._validate_param("verbose_debug", bool, False, False, False, True, self.DEFAULT_VERBOSE_DEBUG) self.verbose_info = verbose_info self._validate_param("verbose_info", bool, False, False, False, True, self.DEFAULT_VERBOSE_INFO) self.action_index_to_name_function = action_index_to_name_function
[docs] def _set_cuda_constants(self): """Investigates (via ``numba`` module) if CUDA-based computations are available and, if so, sets suitable constants.""" self.cuda_available = cuda.is_available() self.cuda_tpb_default = cuda.get_current_device().MAX_THREADS_PER_BLOCK // 2 if self.cuda_available else None
[docs] def _validate_param(self, name, ptype, leq, low, geq, high, default): """Validates a parameter - is it of correct type and within given range (either end of the range can be open or closed).""" value = getattr(self, name) invalid = value <= low if leq else value < low if not invalid: invalid = value >= high if geq else value > high if not invalid: invalid = not isinstance(value, ptype) if invalid: low_str = str(low) high_str = str(high) correct_range_str = ("(" if leq else "[") + f"{low_str}, {high_str}" + (")" if geq else "]") setattr(self, name, default) print(f"[invalid param {name}: {value} changed to default: {default}; correct range: {correct_range_str}, correct type: {ptype}]")
[docs] def __str__(self): """ Returns a string representation of this ``MCTSNC`` instance. Returns: str: string representation of this ``MCTSNC`` instance. """ return f"MCTSNC(search_time_limit={self.search_time_limit}, search_steps_limit={self.search_steps_limit}, n_trees={self.n_trees}, n_playouts={self.n_playouts}, variant='{self.variant}', device_memory={np.round(self.device_memory / 1024**3, 2)}, ucb_c={self.ucb_c}, seed: {self.seed})"
[docs] def __repr__(self): """ Returns a detailed string representation of this ``MCTSNC`` instance. Returns: str: detailed string representation of this ``MCTSNC`` instance. """ repr_str = f"{str(self)}, " repr_str += f"state_board_shape={self.state_board_shape}, state_extra_info_memory={self.state_extra_info_memory}, state_max_actions={self.state_max_actions})" return repr_str
[docs] def init_device_side_arrays(self): """ Allocates all the necessary device arrays based on relevant constants and available memory. """ if self.verbose_info: print(f"[MCTSNC._init_device_side_arrays()... for {self}]") t1_dev_arrays = time.time() # dtypes node_index_dtype = np.int32 node_index_bytes = node_index_dtype().itemsize # 4 B action_index_dtype = np.int16 action_index_bytes = action_index_dtype().itemsize # 2 B board_element_dtype = np.int8 board_element_bytes = board_element_dtype().itemsize # 1 B extra_info_element_dtype = np.int8 extra_info_element_bytes = extra_info_element_dtype().itemsize # 1 B depth_dtype = np.int16 depth_bytes = depth_dtype().itemsize # 2 B size_dtype = np.int32 size_bytes = size_dtype().itemsize # 4 B turn_dtype = np.int8 turn_bytes = turn_dtype().itemsize # 1 B flag_dtype = bool flag_bytes = 1 # 1 B outcome_dtype = np.int8 outcome_bytes = outcome_dtype().itemsize # 1 B playout_outcomes_dtype = np.int32 playout_outcomes_bytes = playout_outcomes_dtype().itemsize # 4 B ns_dtype = np.int32 ns_bytes = ns_dtype().itemsize # 4 B ns_extended_dtype = np.int64 # memory related calculations per_state_additional_memory = depth_bytes + turn_bytes + 2 * flag_bytes + outcome_bytes + 2 * ns_bytes # depth, turn, leaf, terminal, ouctome, ns, ns_wins per_tree_additional_memory = size_bytes + node_index_bytes + action_index_bytes * (self.state_max_actions + 2) + playout_outcomes_bytes * 2 \ + node_index_bytes * (self.MAX_TREE_DEPTH + 2) # tree size, tree node selected, tree actions expanded * (self.state_max_actions + 2), playout outcomes * 2, selected path if "acp" in self.variant: # playout all children per_tree_additional_memory += playout_outcomes_bytes * self.state_max_actions * 2 # playout children outcomes per_state_memory = board_element_bytes * np.prod(self.state_board_shape) + extra_info_element_bytes * self.state_extra_info_memory \ + node_index_bytes * (1 + self.state_max_actions) + per_state_additional_memory # board, extra info, tree array entry (parent, children nodes), additional memory self.max_tree_size = (int(self.device_memory) - self.n_trees * per_tree_additional_memory) // (per_state_memory * self.n_trees) self.max_tree_size = min(self.max_tree_size, self.MAX_TREE_SIZE) # tpb tpb_board = int(2**np.ceil(np.log2(np.prod(self.state_board_shape)))) tpb_extra_info = int(2**np.ceil(np.log2(self.state_extra_info_memory))) if self.state_extra_info_memory > 0 else 1 tpb_max_actions = int(2**np.ceil(np.log2(self.state_max_actions))) self.tpb_r = min(max(tpb_board, tpb_extra_info), self.cuda_tpb_default) self.tpb_s = self.cuda_tpb_default self.tpb_e1 = min(max(self.tpb_r, tpb_max_actions), self.cuda_tpb_default) self.tpb_e2 = self.tpb_r self.tpb_b1 = tpb_max_actions self.tpb_b2 = self.cuda_tpb_default self.tpb_rot = int(2**np.ceil(np.log2(self.n_trees))) # rot - reduce over trees self.tpb_roa = tpb_max_actions # roa - reduce over actions # device arrays self.dev_trees = cuda.device_array((self.n_trees, self.max_tree_size, 1 + self.state_max_actions), dtype=node_index_dtype) # each row of a tree represents a node consisting of: parent indexes and indexes of all children (associated with actions), -1 index for none parent or child self.dev_trees_sizes = cuda.device_array(self.n_trees, dtype=size_dtype) self.dev_trees_depths = cuda.device_array((self.n_trees, self.max_tree_size), dtype=depth_dtype) self.dev_trees_turns = cuda.device_array((self.n_trees, self.max_tree_size), dtype=turn_dtype) self.dev_trees_leaves = cuda.device_array((self.n_trees, self.max_tree_size), dtype=flag_dtype) self.dev_trees_terminals = cuda.device_array((self.n_trees, self.max_tree_size), dtype=flag_dtype) self.dev_trees_outcomes = cuda.device_array((self.n_trees, self.max_tree_size), dtype=outcome_dtype) self.dev_trees_ns = cuda.device_array((self.n_trees, self.max_tree_size), dtype=ns_dtype) self.dev_trees_ns_wins = cuda.device_array((self.n_trees, self.max_tree_size), dtype=ns_dtype) self.dev_trees_boards = cuda.device_array((self.n_trees, self.max_tree_size, self.state_board_shape[0], self.state_board_shape[1]), dtype=board_element_dtype) self.dev_trees_extra_infos = cuda.device_array((self.n_trees, self.max_tree_size, self.state_extra_info_memory), dtype=extra_info_element_dtype) self.dev_trees_nodes_selected = cuda.device_array(self.n_trees, dtype=node_index_dtype) self.dev_trees_selected_paths = cuda.device_array((self.n_trees, self.MAX_TREE_DEPTH + 2), dtype=node_index_dtype) self.dev_trees_actions_expanded = cuda.device_array((self.n_trees, self.state_max_actions + 2), dtype=action_index_dtype) # +2 because 2 last entries inform about: child picked randomly for playouts, number of actions (children) expanded self.dev_trees_playout_outcomes = cuda.device_array((self.n_trees, 2), dtype=playout_outcomes_dtype) # each row stores counts of: -1 wins and +1 wins, respectively (for given tree) self.dev_trees_playout_outcomes_children = None self.dev_random_generators_expand_1 = None self.dev_random_generators_playout = None if "ocp" in self.variant: self.dev_random_generators_expand_1 = create_xoroshiro128p_states(self.n_trees * self.tpb_e1, seed=self.seed) self.dev_random_generators_playout = create_xoroshiro128p_states(self.n_trees * self.n_playouts, seed=self.seed) else: # "acp" self.dev_random_generators_playout = create_xoroshiro128p_states(self.n_trees * self.state_max_actions * self.n_playouts, seed=self.seed) self.dev_trees_playout_outcomes_children = cuda.device_array((self.n_trees, self.state_max_actions, 2), dtype=playout_outcomes_dtype) # for each (playable) action, each row stores counts of: -1 wins and +1 wins, respectively (for given tree) self.dev_root_actions_expanded = cuda.device_array(self.state_max_actions + 2, dtype=action_index_dtype) self.dev_root_ns = cuda.device_array(self.state_max_actions, dtype=ns_extended_dtype) # all entries the same regardless of root action (overhead for convenience) self.dev_actions_win_flags = cuda.device_array(self.state_max_actions, dtype=flag_dtype) self.dev_actions_ns = cuda.device_array(self.state_max_actions, dtype=ns_extended_dtype) self.dev_actions_ns_wins = cuda.device_array(self.state_max_actions, dtype=ns_extended_dtype) self.dev_best_action = cuda.device_array(1, dtype=action_index_dtype) self.dev_best_win_flag = cuda.device_array(1, dtype=flag_dtype) self.dev_best_n = cuda.device_array(1, dtype=ns_extended_dtype) self.dev_best_n_wins = cuda.device_array(1, dtype=ns_extended_dtype) t2_dev_arrays = time.time() if self.verbose_info: print(f"[MCTSNC._init_device_side_arrays() done; time: {t2_dev_arrays - t1_dev_arrays} s, per_state_memory: {per_state_memory} B, calculated max_tree_size: {self.max_tree_size}]")
[docs] def run(self, root_board, root_extra_info, root_turn, forced_search_steps_limit=np.inf): """ Runs the Monte Carlo Tree Search on GPU involving multiple concurrent trees and playouts. Computations are carried out according to the formerly chosen algorithmic variant, i.e. one of {``"ocp_thrifty"``, ``"ocp_prodigal"``, ``"acp_thrifty``, ``"acp_prodigal``}, defaults to ``"acp_prodigal"``}. Args: root_board (ndarray): two-dimensional array with board (or other representation) of root state from which the search starts. root_extra_info (ndarray): any additional information of root state not implied by the contents of the board itself (e.g. possibilities of castling or en-passant captures in chess, the contract in double dummy bridge, etc.), or technical information useful to generate legal actions faster. root_turn {-1, 1}: indicator of the player, minimizing or maximizing, to act first at root state. forced_search_steps_limit (int): steps limit used only when reproducing results of a previous experiment; if less than``np.inf`` then has a priority over the standard computational budget given by ``search_time_limit`` and ``search_steps_limit``. Returns: self.best_action (int): best action resulting from search. """ print(f"MCTSNC RUN... [{self}]") run_method = getattr(self, "_run_" + self.variant) run_method(root_board, root_extra_info, root_turn, forced_search_steps_limit) best_action_label = str(self.best_action) if self.action_index_to_name_function is not None: best_action_label += f" ({self.action_index_to_name_function(self.best_action)})" print(f"MCTSNC RUN DONE. [time: {self.time_total} s; best action: {best_action_label}, best win_flag: {self.best_win_flag}, best n: {self.best_n}, best n_wins: {self.best_n_wins}, best q: {self.best_q}]") return self.best_action
[docs] def _flatten_trees_actions_expanded_thrifty(self, trees_actions_expanded): """Uses information from array ``trees_actions_expanded`` of shape ``(self.n_trees, self.state_max_actions + 2)`` and converts it to another array where the number of rows corresponds to the total of expanded legal actions in all trees. Each row contains a pair of indexes for: action and tree. The approach allows to allocate exact number of needed CUDA blocks for further operations.""" actions_expanded_cumsum = np.cumsum(trees_actions_expanded[:, -1]) trees_actions_expanded_flat = -np.ones((actions_expanded_cumsum[-1], 2), dtype=np.int16) shift = 0 for ti in range(self.n_trees): s = slice(shift, actions_expanded_cumsum[ti]) trees_actions_expanded_flat[s, 0] = ti trees_actions_expanded_flat[s, 1] = trees_actions_expanded[ti, :trees_actions_expanded[ti, -1]] shift = actions_expanded_cumsum[ti] return trees_actions_expanded_flat
[docs] def _make_performance_info(self): """ Prepares and returns a dictionary with information on performance during the last run. After the call, available via ``performance_info`` attribute. """ performance_info = {} performance_info["steps"] = int(self.steps) performance_info["steps_per_second"] = self.steps / self.time_total root_ns = self.dev_root_ns.copy_to_host() playouts = root_ns[root_ns > 0][0] performance_info["playouts"] = int(playouts) performance_info["playouts_per_second"] = performance_info["playouts"] / self.time_total ms_factor = 10.0**3 times_info = {} times_info["total"] = ms_factor * self.time_total times_info["loop"] = ms_factor * self.time_loop times_info["reduce_over_trees"] = ms_factor * self.time_reduce_over_trees times_info["reduce_over_actions"] = ms_factor * self.time_reduce_over_actions times_info["mean_loop"] = times_info["loop"] / self.steps times_info["mean_select"] = ms_factor * self.time_select / self.steps times_info["mean_expand"] = ms_factor * self.time_expand / self.steps times_info["mean_playout"] = ms_factor * self.time_playout / self.steps times_info["mean_backup"] = ms_factor * self.time_backup / self.steps performance_info["times_[ms]"] = times_info trees_depths = np.empty_like(self.dev_trees_depths) trees_sizes = np.empty_like(self.dev_trees_sizes) self.dev_trees_depths.copy_to_host(ary=trees_depths) self.dev_trees_sizes.copy_to_host(ary=trees_sizes) mean_depth = 0 max_depth = -1 for i in range(self.n_trees): depths_up_to_size = trees_depths[i, :trees_sizes[i]] mean_depth += np.sum(depths_up_to_size) max_depth = max(max_depth, np.max(depths_up_to_size)) total_size = np.sum(trees_sizes) max_size = np.max(trees_sizes) mean_size = total_size / self.n_trees mean_depth /= total_size trees_info = {} trees_info["count"] = int(self.n_trees) trees_info["mean_depth"] = mean_depth trees_info["max_depth"] = int(max_depth) trees_info["mean_size"] = mean_size trees_info["max_size"] = int(max_size) performance_info["trees"] = trees_info self.performance_info = performance_info return performance_info
[docs] def _make_actions_info_thrifty(self): """ Prepares and returns a dictionary with information on root actions (using thrifty indexing) implied by the last run, in particular: estimates of action values, their UCBs, counts of times actions were taken, etc. After the call, available via ``actions_info`` attribute. """ root_actions_expanded = np.empty_like(self.dev_root_actions_expanded) root_ns_thrifty = np.empty_like(self.dev_root_ns) actions_win_flags_thrifty = np.empty_like(self.dev_actions_win_flags) actions_ns_thrifty = np.empty_like(self.dev_actions_ns) actions_ns_wins_thrifty = np.empty_like(self.dev_actions_ns_wins) self.dev_root_actions_expanded.copy_to_host(ary=root_actions_expanded) self.dev_root_ns.copy_to_host(ary=root_ns_thrifty) self.dev_actions_win_flags.copy_to_host(ary=actions_win_flags_thrifty) self.dev_actions_ns.copy_to_host(ary=actions_ns_thrifty) self.dev_actions_ns_wins.copy_to_host(ary=actions_ns_wins_thrifty) actions_info = {} best_entry = None n_root_actions = root_actions_expanded[-1] for i in range(n_root_actions): entry = {} a = root_actions_expanded[i] # for thrifty variants entry["name"] = self.action_index_to_name_function(a) if self.action_index_to_name_function else str(a) entry["n_root"] = int(root_ns_thrifty[i]) entry["win_flag"] = bool(actions_win_flags_thrifty[i]) entry["n"] = int(actions_ns_thrifty[i]) entry["n_wins"] = int(actions_ns_wins_thrifty[i]) entry["q"] = entry["n_wins"] / entry["n"] if entry["n"] > 0 else np.nan entry["ucb"] = entry["q"] + self.ucb_c * np.sqrt(np.log(entry["n_root"]) / entry["n"]) if entry["n"] > 0 else np.inf actions_info[a] = entry if a == self.best_action: best_entry = {"index": int(a), **entry} actions_info["best"] = best_entry self.actions_info = actions_info return actions_info
[docs] def _make_actions_info_prodigal(self): """ Prepares and returns a dictionary with information on root actions (using prodigal indexing) implied by the last run, in particular: estimates of action values, their UCBs, counts of times actions were taken, etc. After the call, available via ``actions_info`` attribute. """ root_ns_prodigal = np.empty_like(self.dev_root_ns) actions_win_flags_prodigal = np.empty_like(self.dev_actions_win_flags) actions_ns_prodigal = np.empty_like(self.dev_actions_ns) actions_ns_wins_prodigal = np.empty_like(self.dev_actions_ns_wins) self.dev_root_ns.copy_to_host(ary=root_ns_prodigal) self.dev_actions_win_flags.copy_to_host(ary=actions_win_flags_prodigal) self.dev_actions_ns.copy_to_host(ary=actions_ns_prodigal) self.dev_actions_ns_wins.copy_to_host(ary=actions_ns_wins_prodigal) actions_info = {} best_entry = None for i in range(self.state_max_actions): if root_ns_prodigal[i] == 0: continue a = i # for prodigal variants entry = {} entry["name"] = self.action_index_to_name_function(a) if self.action_index_to_name_function else str(a) entry["n_root"] = int(root_ns_prodigal[i]) entry["win_flag"] = bool(actions_win_flags_prodigal[i]) entry["n"] = int(actions_ns_prodigal[i]) entry["n_wins"] = int(actions_ns_wins_prodigal[i]) entry["q"] = entry["n_wins"] / entry["n"] if entry["n"] > 0 else np.nan entry["ucb"] = entry["q"] + self.ucb_c * np.sqrt(np.log(entry["n_root"]) / entry["n"]) if entry["n"] > 0 else np.inf actions_info[a] = entry if a == self.best_action: best_entry = {"index": int(a), **entry} actions_info["best"] = best_entry self.actions_info = actions_info return actions_info
[docs] def _run_ocp_thrifty(self, root_board, root_extra_info, root_turn, forced_search_steps_limit=np.inf): """Runs computations for algorithmic variant: ``"ocp_thrifty"``.""" t1 = time.time() # reset t1_reset = time.time() bpg = self.n_trees tpb = self.tpb_r dev_root_board = cuda.to_device(root_board) if root_extra_info is None: root_extra_info = np.zeros(1, dtype=np.int8) # fake extra info array dev_root_extra_info = cuda.to_device(root_extra_info) if self.verbose_debug: print(f"[MCTSNC._reset()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._reset[bpg, tpb](dev_root_board, dev_root_extra_info, root_turn, self.dev_trees, self.dev_trees_sizes, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_boards, self.dev_trees_extra_infos) cuda.synchronize() t2_reset = time.time() if self.verbose_debug: print(f"[MCTSNC._reset() done; time: {t2_reset - t1_reset} s]") self.time_select = 0.0 self.time_expand = 0.0 self.time_playout = 0.0 self.time_backup = 0.0 self.steps = 0 trees_actions_expanded = np.empty((self.n_trees, self.state_max_actions + 2), dtype=np.int16) # needed at host side for thrifty variants t1_loop = time.time() while True: t2_loop = time.time() if forced_search_steps_limit < np.inf: if self.steps >= forced_search_steps_limit: break elif self.steps >= self.search_steps_limit or t2_loop - t1_loop >= self.search_time_limit: break if self.verbose_debug: print(f"[step: {self.steps + 1} starting, time used so far: {t2_loop - t1_loop} s]") # selections t1_select = time.time() bpg = self.n_trees tpb = self.tpb_s if self.verbose_debug: print(f"[MCTSNC._select()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._select[bpg, tpb](self.ucb_c, self.dev_trees, self.dev_trees_leaves, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_nodes_selected, self.dev_trees_selected_paths) cuda.synchronize() t2_select = time.time() if self.verbose_debug: print(f"[MCTSNC._select() done; time: {t2_select - t1_select} s]") self.time_select += t2_select - t1_select # expansions t1_expand = time.time() t1_expand_1 = time.time() bpg = self.n_trees tpb = self.tpb_e1 if self.verbose_debug: print(f"[MCTSNC._expand_1_ocp_thrifty()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._expand_1_ocp_thrifty[bpg, tpb](self.max_tree_size, self.dev_trees, self.dev_trees_sizes, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_boards, self.dev_trees_extra_infos, self.dev_trees_nodes_selected, self.dev_random_generators_expand_1, self.dev_trees_actions_expanded) self.dev_trees_actions_expanded.copy_to_host(ary=trees_actions_expanded) cuda.synchronize() if self.steps == 0: MCTSNC._memorize_root_actions_expanded[1, self.state_max_actions + 2](self.dev_trees_actions_expanded, self.dev_root_actions_expanded) cuda.synchronize() t2_expand_1 = time.time() if self.verbose_debug: print(f"[MCTSNC._expand_1_ocp_thrifty() done; time: {t2_expand_1 - t1_expand_1} s]") t1_expand_2 = time.time() trees_actions_expanded_flat = self._flatten_trees_actions_expanded_thrifty(trees_actions_expanded) bpg = trees_actions_expanded_flat.shape[0] # thrifty number of blocks tpb = self.tpb_e2 if self.verbose_debug: print(f"[MCTSNC._expand_2_thrifty()...; bpg: {bpg}, tpb: {tpb}]") dev_trees_actions_expanded_flat = cuda.to_device(trees_actions_expanded_flat) MCTSNC._expand_2_thrifty[bpg, tpb](self.dev_trees, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_boards, self.dev_trees_extra_infos, self.dev_trees_nodes_selected, dev_trees_actions_expanded_flat) cuda.synchronize() t2_expand_2 = time.time() if self.verbose_debug: print(f"[MCTSNC._expand_2_thrifty() done; time: {t2_expand_2 - t1_expand_2} s]") t2_expand = time.time() self.time_expand += t2_expand - t1_expand # playouts t1_playout = time.time() bpg = self.n_trees tpb = self.n_playouts if self.verbose_debug: print(f"[MCTSNC._playout_ocp()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._playout_ocp[bpg, tpb](self.dev_trees, self.dev_trees_turns, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_boards, self.dev_trees_extra_infos, self.dev_trees_nodes_selected, self.dev_trees_actions_expanded, self.dev_random_generators_playout, self.dev_trees_playout_outcomes) cuda.synchronize() t2_playout = time.time() if self.verbose_debug: print(f"[MCTSNC._playout_ocp() done; time: {t2_playout - t1_playout} s]") self.time_playout += t2_playout - t1_playout # backups t1_backup = time.time() bpg = self.n_trees tpb = self.tpb_b2 if self.verbose_debug: print(f"[MCTSNC._backup_ocp()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._backup_ocp[bpg, tpb](self.n_playouts, self.dev_trees, self.dev_trees_turns, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_nodes_selected, self.dev_trees_selected_paths, self.dev_trees_actions_expanded, self.dev_trees_playout_outcomes) cuda.synchronize() t2_backup = time.time() if self.verbose_debug: print(f"[MCTSNC._backup() done; time: {t2_backup - t1_backup} s]") self.time_backup += t2_backup - t1_backup self.steps += 1 self.time_loop = time.time() - t1_loop # sum reduction over trees for each root action t1_reduce_over_trees = time.time() root_actions_expanded = np.empty_like(self.dev_root_actions_expanded) self.dev_root_actions_expanded.copy_to_host(ary=root_actions_expanded) n_root_actions = int(root_actions_expanded[-1]) bpg = n_root_actions tpb = self.tpb_rot if self.verbose_debug: print(f"[MCTSNC._reduce_over_trees_thrifty()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._reduce_over_trees_thrifty[bpg, tpb](self.dev_trees, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_root_actions_expanded, root_turn, self.dev_root_ns, self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins) cuda.synchronize() t2_reduce_over_trees = time.time() self.time_reduce_over_trees = t2_reduce_over_trees - t1_reduce_over_trees if self.verbose_debug: print(f"[MCTSNC._reduce_over_trees_thrifty() done; time: {self.time_reduce_over_trees} s]") # max-argmax reduction over root actions t1_reduce_over_actions = time.time() bpg = 1 tpb = self.tpb_roa if self.verbose_debug: print(f"[MCTSNC._reduce_over_actions_thrifty()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._reduce_over_actions_thrifty[bpg, tpb](n_root_actions, self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins, self.dev_best_action, self.dev_best_win_flag, self.dev_best_n, self.dev_best_n_wins) self.best_action = self.dev_best_action.copy_to_host()[0] self.best_win_flag = self.dev_best_win_flag.copy_to_host()[0] self.best_n = self.dev_best_n.copy_to_host()[0] self.best_n_wins = self.dev_best_n_wins.copy_to_host()[0] self.best_q = self.best_n_wins / self.best_n if self.best_n > 0 else np.nan cuda.synchronize() self.best_action = root_actions_expanded[self.best_action] t2_reduce_over_actions = time.time() self.time_reduce_over_actions = t2_reduce_over_actions - t1_reduce_over_actions if self.verbose_debug: print(f"[MCTSNC._reduce_over_actions_thrifty() done; time: {self.time_reduce_over_actions} s]") t2 = time.time() self.time_total = t2 - t1 if self.verbose_info: print(f"[actions info:\n{dict_to_str(self._make_actions_info_thrifty())}]") print(f"[performance info:\n{dict_to_str(self._make_performance_info())}]")
[docs] def _run_ocp_prodigal(self, root_board, root_extra_info, root_turn, forced_search_steps_limit=np.inf): """Runs computations for algorithmic variant: ``"ocp_prodigal"``.""" t1 = time.time() # reset t1_reset = time.time() bpg = self.n_trees tpb = self.tpb_r dev_root_board = cuda.to_device(root_board) if root_extra_info is None: root_extra_info = np.zeros(1, dtype=np.int8) # fake extra info array dev_root_extra_info = cuda.to_device(root_extra_info) if self.verbose_debug: print(f"[MCTSNC._reset()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._reset[bpg, tpb](dev_root_board, dev_root_extra_info, root_turn, self.dev_trees, self.dev_trees_sizes, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_boards, self.dev_trees_extra_infos) cuda.synchronize() t2_reset = time.time() if self.verbose_debug: print(f"[MCTSNC._reset() done; time: {t2_reset - t1_reset} s]") self.time_select = 0.0 self.time_expand = 0.0 self.time_playout = 0.0 self.time_backup = 0.0 self.steps = 0 t1_loop = time.time() while True: t2_loop = time.time() if forced_search_steps_limit < np.inf: if self.steps >= forced_search_steps_limit: break elif self.steps >= self.search_steps_limit or t2_loop - t1_loop >= self.search_time_limit: break if self.verbose_debug: print(f"[step: {self.steps + 1} starting, time used so far: {t2_loop - t1_loop} s]") # selections t1_select = time.time() bpg = self.n_trees tpb = self.tpb_s if self.verbose_debug: print(f"[MCTSNC._select()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._select[bpg, tpb](self.ucb_c, self.dev_trees, self.dev_trees_leaves, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_nodes_selected, self.dev_trees_selected_paths) cuda.synchronize() t2_select = time.time() if self.verbose_debug: print(f"[MCTSNC._select() done; time: {t2_select - t1_select} s]") self.time_select += t2_select - t1_select # expansions t1_expand = time.time() t1_expand_1 = time.time() bpg = self.n_trees tpb = self.tpb_e1 if self.verbose_debug: print(f"[MCTSNC._expand_1_ocp_prodigal()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._expand_1_ocp_prodigal[bpg, tpb](self.max_tree_size, self.dev_trees, self.dev_trees_sizes, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_boards, self.dev_trees_extra_infos, self.dev_trees_nodes_selected, self.dev_random_generators_expand_1, self.dev_trees_actions_expanded) cuda.synchronize() if self.steps == 0: MCTSNC._memorize_root_actions_expanded[1, self.state_max_actions + 2](self.dev_trees_actions_expanded, self.dev_root_actions_expanded) cuda.synchronize() t2_expand_1 = time.time() if self.verbose_debug: print(f"[MCTSNC._expand_1_ocp_progial() done; time: {t2_expand_1 - t1_expand_1} s]") t1_expand_2 = time.time() bpg = (self.n_trees, self.state_max_actions) # prodigal number of blocks tpb = self.tpb_e2 if self.verbose_debug: print(f"[MCTSNC._expand_2_prodigal()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._expand_2_prodigal[bpg, tpb](self.dev_trees, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_boards, self.dev_trees_extra_infos, self.dev_trees_nodes_selected, self.dev_trees_actions_expanded) cuda.synchronize() t2_expand_2 = time.time() if self.verbose_debug: print(f"[MCTSNC._expand_2_prodigal() done; time: {t2_expand_2 - t1_expand_2} s]") t2_expand = time.time() self.time_expand += t2_expand - t1_expand # playouts t1_playout = time.time() bpg = self.n_trees tpb = self.n_playouts if self.verbose_debug: print(f"[MCTSNC._playout_ocp()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._playout_ocp[bpg, tpb](self.dev_trees, self.dev_trees_turns, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_boards, self.dev_trees_extra_infos, self.dev_trees_nodes_selected, self.dev_trees_actions_expanded, self.dev_random_generators_playout, self.dev_trees_playout_outcomes) cuda.synchronize() t2_playout = time.time() if self.verbose_debug: print(f"[MCTSNC._playout_ocp() done; time: {t2_playout - t1_playout} s]") self.time_playout += t2_playout - t1_playout # backups t1_backup = time.time() bpg = self.n_trees tpb = self.tpb_b2 if self.verbose_debug: print(f"[MCTSNC._backup_ocp()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._backup_ocp[bpg, tpb](self.n_playouts, self.dev_trees, self.dev_trees_turns, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_nodes_selected, self.dev_trees_selected_paths, self.dev_trees_actions_expanded, self.dev_trees_playout_outcomes) cuda.synchronize() t2_backup = time.time() if self.verbose_debug: print(f"[MCTSNC._backup() done; time: {t2_backup - t1_backup} s]") self.time_backup += t2_backup - t1_backup self.steps += 1 self.time_loop = time.time() - t1_loop # sum reduction over trees for each root action t1_reduce_over_trees = time.time() bpg = self.state_max_actions tpb = self.tpb_rot if self.verbose_debug: print(f"[MCTSNC._reduce_over_trees_prodigal()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._reduce_over_trees_prodigal[bpg, tpb](self.dev_trees, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_root_actions_expanded, root_turn, self.dev_root_ns, self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins) cuda.synchronize() t2_reduce_over_trees = time.time() self.time_reduce_over_trees = t2_reduce_over_trees - t1_reduce_over_trees if self.verbose_debug: print(f"[MCTSNC._reduce_over_trees_prodigal() done; time: {self.time_reduce_over_trees} s]") # max-argmax reduction over root actions t1_reduce_over_actions = time.time() bpg = 1 tpb = self.tpb_roa if self.verbose_debug: print(f"[MCTSNC._reduce_over_actions_prodigal()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._reduce_over_actions_prodigal[bpg, tpb](self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins, self.dev_best_action, self.dev_best_win_flag, self.dev_best_n, self.dev_best_n_wins) self.best_action = self.dev_best_action.copy_to_host()[0] self.best_win_flag = self.dev_best_win_flag.copy_to_host()[0] self.best_n = self.dev_best_n.copy_to_host()[0] self.best_n_wins = self.dev_best_n_wins.copy_to_host()[0] self.best_q = self.best_n_wins / self.best_n if self.best_n > 0 else np.nan cuda.synchronize() t2_reduce_over_actions = time.time() self.time_reduce_over_actions = t2_reduce_over_actions - t1_reduce_over_actions if self.verbose_debug: print(f"[MCTSNC._reduce_over_actions_prodigal() done; time: {self.time_reduce_over_actions} s]") t2 = time.time() self.time_total = t2 - t1 if self.verbose_info: print(f"[actions info:\n{dict_to_str(self._make_actions_info_prodigal())}]") print(f"[performance info:\n{dict_to_str(self._make_performance_info())}]")
[docs] def _run_acp_thrifty(self, root_board, root_extra_info, root_turn, forced_search_steps_limit=np.inf): """Runs computations for algorithmic variant: ``"acp_thrifty"``.""" t1 = time.time() # reset t1_reset = time.time() bpg = self.n_trees tpb = self.tpb_r dev_root_board = cuda.to_device(root_board) if root_extra_info is None: root_extra_info = np.zeros(1, dtype=np.int8) # fake extra info array dev_root_extra_info = cuda.to_device(root_extra_info) if self.verbose_debug: print(f"[MCTSNC._reset()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._reset[bpg, tpb](dev_root_board, dev_root_extra_info, root_turn, self.dev_trees, self.dev_trees_sizes, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_boards, self.dev_trees_extra_infos) cuda.synchronize() t2_reset = time.time() if self.verbose_debug: print(f"[MCTSNC._reset() done; time: {t2_reset - t1_reset} s]") self.time_select = 0.0 self.time_expand = 0.0 self.time_playout = 0.0 self.time_backup = 0.0 self.steps = 0 trees_actions_expanded = np.empty((self.n_trees, self.state_max_actions + 2), dtype=np.int16) t1_loop = time.time() while True: t2_loop = time.time() if forced_search_steps_limit < np.inf: if self.steps >= forced_search_steps_limit: break elif self.steps >= self.search_steps_limit or t2_loop - t1_loop >= self.search_time_limit: break if self.verbose_debug: print(f"[step: {self.steps + 1} starting, time used so far: {t2_loop - t1_loop} s]") # selections t1_select = time.time() bpg = self.n_trees tpb = self.tpb_s if self.verbose_debug: print(f"[MCTSNC._select()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._select[bpg, tpb](self.ucb_c, self.dev_trees, self.dev_trees_leaves, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_nodes_selected, self.dev_trees_selected_paths) cuda.synchronize() t2_select = time.time() if self.verbose_debug: print(f"[MCTSNC._select() done; time: {t2_select - t1_select} s]") self.time_select += t2_select - t1_select # expansions t1_expand = time.time() t1_expand_1 = time.time() bpg = self.n_trees tpb = self.tpb_e1 if self.verbose_debug: print(f"[MCTSNC._expand_1_acp_thrifty()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._expand_1_acp_thrifty[bpg, tpb](self.max_tree_size, self.dev_trees, self.dev_trees_sizes, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_boards, self.dev_trees_extra_infos, self.dev_trees_nodes_selected, self.dev_trees_actions_expanded) self.dev_trees_actions_expanded.copy_to_host(ary=trees_actions_expanded) cuda.synchronize() if self.steps == 0: MCTSNC._memorize_root_actions_expanded[1, self.state_max_actions + 2](self.dev_trees_actions_expanded, self.dev_root_actions_expanded) cuda.synchronize() t2_expand_1 = time.time() if self.verbose_debug: print(f"[MCTSNC._expand_1_acp_thrifty() done; time: {t2_expand_1 - t1_expand_1} s]") t1_expand_2 = time.time() trees_actions_expanded_flat = self._flatten_trees_actions_expanded_thrifty(trees_actions_expanded) bpg = trees_actions_expanded_flat.shape[0] # thrifty number of blocks tpb = self.tpb_e2 if self.verbose_debug: print(f"[MCTSNC._expand_2_thrifty()...; bpg: {bpg}, tpb: {tpb}]") dev_trees_actions_expanded_flat = cuda.to_device(trees_actions_expanded_flat) MCTSNC._expand_2_thrifty[bpg, tpb](self.dev_trees, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_boards, self.dev_trees_extra_infos, self.dev_trees_nodes_selected, dev_trees_actions_expanded_flat) cuda.synchronize() t2_expand_2 = time.time() if self.verbose_debug: print(f"[MCTSNC._expand_2_thrifty() done; time: {t2_expand_2 - t1_expand_2} s]") t2_expand = time.time() self.time_expand += t2_expand - t1_expand # playouts t1_playout = time.time() bpg = trees_actions_expanded_flat.shape[0] # thrifty number of blocks tpb = self.n_playouts if self.verbose_debug: print(f"[MCTSNC._playout_acp_thrifty()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._playout_acp_thrifty[bpg, tpb](self.dev_trees, self.dev_trees_turns, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_boards, self.dev_trees_extra_infos, self.dev_trees_nodes_selected, self.dev_trees_actions_expanded, dev_trees_actions_expanded_flat, self.dev_random_generators_playout, self.dev_trees_playout_outcomes, self.dev_trees_playout_outcomes_children) cuda.synchronize() t2_playout = time.time() if self.verbose_debug: print(f"[MCTSNC._playout_acp_thrifty() done; time: {t2_playout - t1_playout} s]") self.time_playout += t2_playout - t1_playout # backups t1_backup = time.time() t1_backup_1 = time.time() bpg = self.n_trees tpb = self.tpb_b1 if self.verbose_debug: print(f"[MCTSNC._backup_acp_thrifty()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._backup_1_acp_thrifty[bpg, tpb](self.n_playouts, self.dev_trees, self.dev_trees_turns, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_nodes_selected, self.dev_trees_actions_expanded, self.dev_trees_playout_outcomes, self.dev_trees_playout_outcomes_children) cuda.synchronize() t2_backup_1 = time.time() if self.verbose_debug: print(f"[MCTSNC._backup_1_acp_thrifty() done; time: {t2_backup_1 - t1_backup_1} s]") t1_backup_2 = time.time() bpg = self.n_trees tpb = self.tpb_b2 if self.verbose_debug: print(f"[MCTSNC._backup_2_acp()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._backup_2_acp[bpg, tpb](self.n_playouts, self.dev_trees_turns, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_selected_paths, self.dev_trees_actions_expanded, self.dev_trees_playout_outcomes) cuda.synchronize() t2_backup_2 = time.time() if self.verbose_debug: print(f"[MCTSNC._backup_2_acp() done; time: {t2_backup_2 - t1_backup_2} s]") t2_backup = time.time() self.time_backup += t2_backup - t1_backup self.steps += 1 self.time_loop = time.time() - t1_loop # sum reduction over trees for each root action t1_reduce_over_trees = time.time() root_actions_expanded = np.empty_like(self.dev_root_actions_expanded) self.dev_root_actions_expanded.copy_to_host(ary=root_actions_expanded) n_root_actions = int(root_actions_expanded[-1]) bpg = n_root_actions tpb = self.tpb_rot if self.verbose_debug: print(f"[MCTSNC._reduce_over_trees_thrifty()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._reduce_over_trees_thrifty[bpg, tpb](self.dev_trees, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_root_actions_expanded, root_turn, self.dev_root_ns, self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins) cuda.synchronize() t2_reduce_over_trees = time.time() self.time_reduce_over_trees = t2_reduce_over_trees - t1_reduce_over_trees if self.verbose_debug: print(f"[MCTSNC._reduce_over_trees_thrifty() done; time: {self.time_reduce_over_trees} s]") # max-argmax reduction over root actions t1_reduce_over_actions = time.time() bpg = 1 tpb = self.tpb_roa if self.verbose_debug: print(f"[MCTSNC._reduce_over_actions_thrifty()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._reduce_over_actions_thrifty[bpg, tpb](n_root_actions, self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins, self.dev_best_action, self.dev_best_win_flag, self.dev_best_n, self.dev_best_n_wins) self.best_action = self.dev_best_action.copy_to_host()[0] self.best_win_flag = self.dev_best_win_flag.copy_to_host()[0] self.best_n = self.dev_best_n.copy_to_host()[0] self.best_n_wins = self.dev_best_n_wins.copy_to_host()[0] self.best_q = self.best_n_wins / self.best_n if self.best_n > 0 else np.nan cuda.synchronize() self.best_action = root_actions_expanded[self.best_action] t2_reduce_over_actions = time.time() self.time_reduce_over_actions = t2_reduce_over_actions - t1_reduce_over_actions if self.verbose_debug: print(f"[MCTSNC._reduce_over_actions_thrifty() done; time: {self.time_reduce_over_actions} s]") t2 = time.time() self.time_total = t2 - t1 if self.verbose_info: print(f"[actions info:\n{dict_to_str(self._make_actions_info_thrifty())}]") print(f"[performance info:\n{dict_to_str(self._make_performance_info())}]")
[docs] def _run_acp_prodigal(self, root_board, root_extra_info, root_turn, forced_search_steps_limit=np.inf): """Runs computations for algorithmic variant: ``"acp_prodigal"``.""" t1 = time.time() # reset t1_reset = time.time() bpg = self.n_trees tpb = self.tpb_r dev_root_board = cuda.to_device(root_board) if root_extra_info is None: root_extra_info = np.zeros(1, dtype=np.int8) # fake extra info array dev_root_extra_info = cuda.to_device(root_extra_info) if self.verbose_debug: print(f"[MCTSNC._reset()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._reset[bpg, tpb](dev_root_board, dev_root_extra_info, root_turn, self.dev_trees, self.dev_trees_sizes, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_boards, self.dev_trees_extra_infos) cuda.synchronize() t2_reset = time.time() if self.verbose_debug: print(f"[MCTSNC._reset() done; time: {t2_reset - t1_reset} s]") self.time_select = 0.0 self.time_expand = 0.0 self.time_playout = 0.0 self.time_backup = 0.0 self.steps = 0 t1_loop = time.time() while True: t2_loop = time.time() if forced_search_steps_limit < np.inf: if self.steps >= forced_search_steps_limit: break elif self.steps >= self.search_steps_limit or t2_loop - t1_loop >= self.search_time_limit: break if self.verbose_debug: print(f"[step: {self.steps + 1} starting, time used so far: {t2_loop - t1_loop} s]") # selections t1_select = time.time() bpg = self.n_trees tpb = self.tpb_s if self.verbose_debug: print(f"[MCTSNC._select()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._select[bpg, tpb](self.ucb_c, self.dev_trees, self.dev_trees_leaves, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_nodes_selected, self.dev_trees_selected_paths) cuda.synchronize() t2_select = time.time() if self.verbose_debug: print(f"[MCTSNC._select() done; time: {t2_select - t1_select} s]") self.time_select += t2_select - t1_select # expansions t1_expand = time.time() t1_expand_1 = time.time() bpg = self.n_trees tpb = self.tpb_e1 if self.verbose_debug: print(f"[MCTSNC._expand_1_acp_prodigal()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._expand_1_acp_prodigal[bpg, tpb](self.max_tree_size, self.dev_trees, self.dev_trees_sizes, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_boards, self.dev_trees_extra_infos, self.dev_trees_nodes_selected, self.dev_trees_actions_expanded) cuda.synchronize() if self.steps == 0: MCTSNC._memorize_root_actions_expanded[1, self.state_max_actions + 2](self.dev_trees_actions_expanded, self.dev_root_actions_expanded) cuda.synchronize() t2_expand_1 = time.time() if self.verbose_debug: print(f"[MCTSNC._expand_1_acp_prodigal() done; time: {t2_expand_1 - t1_expand_1} s]") t1_expand_2 = time.time() bpg = (self.n_trees, self.state_max_actions) # prodigal number of blocks tpb = self.tpb_e2 if self.verbose_debug: print(f"[MCTSNC._expand_2_prodigal()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._expand_2_prodigal[bpg, tpb](self.dev_trees, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_boards, self.dev_trees_extra_infos, self.dev_trees_nodes_selected, self.dev_trees_actions_expanded) cuda.synchronize() t2_expand_2 = time.time() if self.verbose_debug: print(f"[MCTSNC._expand_2_prodigal() done; time: {t2_expand_2 - t1_expand_2} s]") t2_expand = time.time() self.time_expand += t2_expand - t1_expand # playouts t1_playout = time.time() bpg = (self.n_trees, self.state_max_actions) # prodigal number of blocks tpb = self.n_playouts if self.verbose_debug: print(f"[MCTSNC._playout_acp_prodigal()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._playout_acp_prodigal[bpg, tpb](self.dev_trees, self.dev_trees_turns, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_boards, self.dev_trees_extra_infos, self.dev_trees_nodes_selected, self.dev_trees_actions_expanded, self.dev_random_generators_playout, self.dev_trees_playout_outcomes, self.dev_trees_playout_outcomes_children) cuda.synchronize() t2_playout = time.time() if self.verbose_debug: print(f"[MCTSNC._playout_acp_prodigal() done; time: {t2_playout - t1_playout} s]") self.time_playout += t2_playout - t1_playout # backups t1_backup = time.time() t1_backup_1 = time.time() bpg = self.n_trees tpb = self.tpb_b1 if self.verbose_debug: print(f"[MCTSNC._backup_1_acp_prodigal()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._backup_1_acp_prodigal[bpg, tpb](self.n_playouts, self.dev_trees, self.dev_trees_turns, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_nodes_selected, self.dev_trees_actions_expanded, self.dev_trees_playout_outcomes, self.dev_trees_playout_outcomes_children) cuda.synchronize() t2_backup_1 = time.time() if self.verbose_debug: print(f"[MCTSNC._backup_1_acp_prodigal() done; time: {t2_backup_1 - t1_backup_1} s]") t1_backup_2 = time.time() bpg = self.n_trees tpb = self.tpb_b2 if self.verbose_debug: print(f"[MCTSNC._backup_2_acp()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._backup_2_acp[bpg, tpb](self.n_playouts, self.dev_trees_turns, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_trees_selected_paths, self.dev_trees_actions_expanded, self.dev_trees_playout_outcomes) cuda.synchronize() t2_backup_2 = time.time() if self.verbose_debug: print(f"[MCTSNC._backup_2_acp() done; time: {t2_backup_2 - t1_backup_2} s]") t2_backup = time.time() self.time_backup += t2_backup - t1_backup self.steps += 1 self.time_loop = time.time() - t1_loop # sum reduction over trees t1_reduce_over_trees = time.time() bpg = self.state_max_actions tpb = self.tpb_rot if self.verbose_debug: print(f"[MCTSNC._reduce_over_trees_prodigal()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._reduce_over_trees_prodigal[bpg, tpb](self.dev_trees, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_ns, self.dev_trees_ns_wins, self.dev_root_actions_expanded, root_turn, self.dev_root_ns, self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins) cuda.synchronize() t2_reduce_over_trees = time.time() self.time_reduce_over_trees = t2_reduce_over_trees - t1_reduce_over_trees if self.verbose_debug: print(f"[MCTSNC._reduce_over_trees_prodigal() done; time: {self.time_reduce_over_trees} s]") # max-argmax reduction over root actions t1_reduce_over_actions = time.time() bpg = 1 tpb = self.tpb_roa if self.verbose_debug: print(f"[MCTSNC._reduce_over_actions_prodigal()...; bpg: {bpg}, tpb: {tpb}]") MCTSNC._reduce_over_actions_prodigal[bpg, tpb](self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins, self.dev_best_action, self.dev_best_win_flag, self.dev_best_n, self.dev_best_n_wins) self.best_action = self.dev_best_action.copy_to_host()[0] self.best_win_flag = self.dev_best_win_flag.copy_to_host()[0] self.best_n = self.dev_best_n.copy_to_host()[0] self.best_n_wins = self.dev_best_n_wins.copy_to_host()[0] self.best_q = self.best_n_wins / self.best_n if self.best_n > 0 else np.nan cuda.synchronize() t2_reduce_over_actions = time.time() self.time_reduce_over_actions = t2_reduce_over_actions - t1_reduce_over_actions if self.verbose_debug: print(f"[MCTSNC._reduce_over_actions_prodigal() done; time: {self.time_reduce_over_actions} s]") t2 = time.time() self.time_total = t2 - t1 if self.verbose_info: print(f"[actions info:\n{dict_to_str(self._make_actions_info_prodigal())}]") print(f"[performance info:\n{dict_to_str(self._make_performance_info())}]")
[docs] @staticmethod @cuda.jit(void(int8[:, :], int8[:], int8, int32[:, :, :], int32[:], int16[:, :], int8[:, :], boolean[:, :], boolean[:, :], int32[:, :], int32[:, :], int8[:, :, :, :], int8[:, :, :])) def _reset(root_board, root_extra_info, root_turn, trees, trees_sizes, trees_depths, trees_turns, trees_leaves, trees_terminals, trees_ns, trees_ns_wins, trees_boards, trees_extra_infos): """CUDA kernel responsible for reseting root nodes of trees to new root state.""" ti = cuda.blockIdx.x # tree index tpb = cuda.blockDim.x t = cuda.threadIdx.x if t == 0: trees[ti, 0, 0] = int32(-1) trees_sizes[ti] = int32(1) trees_depths[ti, 0] = int32(0) trees_turns[ti, 0] = int8(root_turn) trees_leaves[ti, 0] = True trees_terminals[ti, 0] = False trees_ns[ti, 0] = int32(0) trees_ns_wins[ti, 0] = int32(0) m, n = root_board.shape m_n = m * n bept = (m_n + tpb - 1) // tpb # board elements per thread e = t # board element flat index for _ in range(bept): if e < m_n: i = e // n j = e % n trees_boards[ti, 0, i, j] = root_board[i, j] e += tpb extra_info_memory = root_extra_info.size eipt = (extra_info_memory + tpb - 1) // tpb e = t for _ in range(eipt): if e < extra_info_memory: trees_extra_infos[ti, 0, e] = root_extra_info[e]
[docs] @staticmethod @cuda.jit(void(float32, int32[:, :, :], boolean[:, :], int32[:, :], int32[:, :], int32[:], int32[:, :])) def _select(ucb_c, trees, trees_leaves, trees_ns, trees_ns_wins, trees_nodes_selected, trees_selected_paths): """CUDA kernel responsible for computations of stage: selections.""" shared_ucbs = cuda.shared.array(512, dtype=float32) # 512 - assumed limit on max actions shared_best_child = cuda.shared.array(512, dtype=int32) # 512 - assumed limit on max actions (array instead of one index due to max-argmax reduction pattern) shared_selected_path = cuda.shared.array(2048 + 2, dtype=int32) # 2048 - assumed equal to MAX_TREE_DEPTH ti = cuda.blockIdx.x # tree index tpb = cuda.blockDim.x t = cuda.threadIdx.x state_max_actions = int16(trees.shape[2] - 1) node = int32(0) depth = int16(0) if t == 0: shared_selected_path[0] = int32(0) # path always starting from root while not trees_leaves[ti, node]: if t < state_max_actions: child = trees[ti, node, 1 + t] shared_best_child[t] = child if child == int32(-1): shared_ucbs[t] = -float32(inf) else: child_n = trees_ns[ti, child] if child_n == int32(0): shared_ucbs[t] = float32(inf) else: shared_ucbs[t] = trees_ns_wins[ti, child] / float32(child_n) + ucb_c * math.sqrt(math.log(trees_ns[ti, node]) / child_n) else: shared_ucbs[t] = -float32(inf) cuda.syncthreads() stride = tpb >> 1 # half of tpb while stride > 0: # max-argmax reduction pattern if t < stride: t_stride = t + stride if shared_ucbs[t] < shared_ucbs[t_stride]: shared_ucbs[t] = shared_ucbs[t_stride] shared_best_child[t] = shared_best_child[t_stride] cuda.syncthreads() stride >>= 1 node = shared_best_child[0] depth += int16(1) if t == 0: shared_selected_path[depth] = node path_length = depth + 1 pept = (path_length + tpb - 1) // tpb # path elements per thread e = t for _ in range(pept): if e < path_length: trees_selected_paths[ti, e] = shared_selected_path[e] e += tpb if t == 0: trees_nodes_selected[ti] = node trees_selected_paths[ti, -1] = path_length
[docs] @staticmethod @cuda.jit(void(int32, int32[:, :, :], int32[:], int8[:, :], boolean[:, :], boolean[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], xoroshiro128p_type[:], int16[:, :])) def _expand_1_ocp_thrifty(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos, trees_nodes_selected, random_generators_expand_1, trees_actions_expanded): """CUDA kernel responsible for computations of stage: expansions (substage 1, variant ``"ocp_thrifty"``).""" shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block) shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info shared_legal_actions = cuda.shared.array(512, dtype=boolean) # 512 - assumed limit on max actions shared_legal_actions_child_shifts = cuda.shared.array(512, dtype=int16) # 512 - assumed limit on max actions ti = cuda.blockIdx.x # tree index tpb = cuda.blockDim.x t = cuda.threadIdx.x t_global = cuda.grid(1) state_max_actions = int16(trees.shape[2] - 1) _, _, m, n = trees_boards.shape m_n = m * n bept = (m_n + tpb - 1) // tpb # board elements per thread e = t # board element flat index selected = trees_nodes_selected[ti] # node selected for _ in range(bept): if e < m_n: i = e // n j = e % n shared_board[i, j] = trees_boards[ti, selected, i, j] e += tpb _, _, extra_info_memory = trees_extra_infos.shape eipt = (extra_info_memory + tpb - 1) // tpb e = t for _ in range(eipt): if e < extra_info_memory: shared_extra_info[e] = trees_extra_infos[ti, selected, e] e += tpb cuda.syncthreads() selected_is_terminal = trees_terminals[ti, selected] if selected_is_terminal: shared_legal_actions[t] = False elif t < state_max_actions: is_action_legal(m, n, shared_board, shared_extra_info, trees_turns[ti, selected], t, shared_legal_actions) cuda.syncthreads() size_so_far = trees_sizes[ti] child_shift = int16(-1) rand_child_for_playout = int16(-3) # remains like this when tree cannot grow due to memory exhausted if t < state_max_actions: shared_legal_actions_child_shifts[t] = int16(-1) if t == 0: if not selected_is_terminal: for i in range(state_max_actions): if shared_legal_actions[i] and size_so_far + child_shift + 1 < max_tree_size: child_shift += 1 shared_legal_actions_child_shifts[i] = child_shift if child_shift >= int16(0): trees_actions_expanded[ti, -1] = child_shift + 1 # information how many children expanded (as last entry) trees_leaves[ti, selected] = False rand_child_for_playout = int16(xoroshiro128p_uniform_float32(random_generators_expand_1, t_global) * (child_shift + 1)) else: trees_actions_expanded[ti, -1] = int16(1) # tree not grown due to memory exhausted, but selected shall be played out (hence 1 needed) trees_actions_expanded[ti, -2] = rand_child_for_playout else: trees_actions_expanded[ti, -1] = int16(1) # terminal in fact not expanded, but shall be played out (hence 1 needed) trees_actions_expanded[ti, -2] = int16(-1) # fake child for playouts indicating that selected is terminal (and playouts computed from outcome) cuda.syncthreads() if t < state_max_actions: child_index = int32(-1) if shared_legal_actions[t]: child_shift = shared_legal_actions_child_shifts[t] if child_shift >= int16(0): child_index = size_so_far + child_shift trees_actions_expanded[ti, child_shift] = t # for thrifty variants trees[ti, selected, 1 + t] = child_index # parent gets to know where child is if t == 0: trees_sizes[ti] += shared_legal_actions_child_shifts[state_max_actions - 1] + 1 # updating tree size
[docs] @staticmethod @cuda.jit(void(int32, int32[:, :, :], int32[:], int8[:, :], boolean[:, :], boolean[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], xoroshiro128p_type[:], int16[:, :])) def _expand_1_ocp_prodigal(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos, trees_nodes_selected, random_generators_expand_1, trees_actions_expanded): """CUDA kernel responsible for computations of stage: expansions (substage 1, variant ``"ocp_prodigal"``).""" shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block) shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info shared_legal_actions = cuda.shared.array(512, dtype=boolean) # 512 - assumed limit on max actions shared_legal_actions_child_shifts = cuda.shared.array(512, dtype=int16) # 512 - assumed limit on max actions shared_map_child_shifts_to_action = cuda.shared.array(512, dtype=int16) # 512 - assumed limit on max actions ti = cuda.blockIdx.x # tree index tpb = cuda.blockDim.x t = cuda.threadIdx.x t_global = cuda.grid(1) state_max_actions = int16(trees.shape[2] - 1) _, _, m, n = trees_boards.shape m_n = m * n bept = (m_n + tpb - 1) // tpb # board elements per thread e = t # board element flat index selected = trees_nodes_selected[ti] # node selected for _ in range(bept): if e < m_n: i = e // n j = e % n shared_board[i, j] = trees_boards[ti, selected, i, j] e += tpb _, _, extra_info_memory = trees_extra_infos.shape eipt = (extra_info_memory + tpb - 1) // tpb e = t for _ in range(eipt): if e < extra_info_memory: shared_extra_info[e] = trees_extra_infos[ti, selected, e] e += tpb cuda.syncthreads() selected_is_terminal = trees_terminals[ti, selected] if selected_is_terminal: shared_legal_actions[t] = False elif t < state_max_actions: is_action_legal(m, n, shared_board, shared_extra_info, trees_turns[ti, selected], t, shared_legal_actions) cuda.syncthreads() size_so_far = trees_sizes[ti] child_shift = int16(-1) rand_child_for_playout = int16(-3) # remains like this when tree cannot grow due to memory exhausted if t < state_max_actions: shared_legal_actions_child_shifts[t] = int16(-1) if t == 0: if not selected_is_terminal: for i in range(state_max_actions): if shared_legal_actions[i] and size_so_far + child_shift + 1 < max_tree_size: child_shift += 1 shared_map_child_shifts_to_action[child_shift] = i shared_legal_actions_child_shifts[i] = child_shift if child_shift >= int16(0): trees_actions_expanded[ti, -1] = child_shift + 1 # information how many children expanded (as last entry) trees_leaves[ti, selected] = False rand_child_for_playout = int16(xoroshiro128p_uniform_float32(random_generators_expand_1, t_global) * (child_shift + 1)) rand_child_for_playout = shared_map_child_shifts_to_action[rand_child_for_playout] else: trees_actions_expanded[ti, -1] = int16(1) # tree not grown due to memory exhausted, but selected shall be played out (hence 1 needed) trees_actions_expanded[ti, -2] = rand_child_for_playout else: trees_actions_expanded[ti, -1] = int16(1) trees_actions_expanded[ti, -2] = int16(-1) # fake child for playouts indicating that selected is terminal (and playouts computed from outcome) cuda.syncthreads() if t < state_max_actions: child_index = int32(-1) if shared_legal_actions[t]: child_shift = shared_legal_actions_child_shifts[t] if child_shift >= int16(0): child_index = size_so_far + child_shift trees_actions_expanded[ti, t] = t # for prodigal variants else: trees_actions_expanded[ti, t] = int16(-1) # for prodigal variants trees[ti, selected, 1 + t] = child_index # parent gets to know where child is if t == 0: trees_sizes[ti] += shared_legal_actions_child_shifts[state_max_actions - 1] + 1 # updating tree size
[docs] @staticmethod @cuda.jit(void(int32, int32[:, :, :], int32[:], int8[:, :], boolean[:, :], boolean[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :])) def _expand_1_acp_thrifty(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded): """CUDA kernel responsible for computations of stage: expansions (substage 1, variant ``"acp_thrifty"``).""" shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block) shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info shared_legal_actions = cuda.shared.array(512, dtype=boolean) # 512 - assumed limit on max actions shared_legal_actions_child_shifts = cuda.shared.array(512, dtype=int16) # 512 - assumed limit on max actions ti = cuda.blockIdx.x # tree index tpb = cuda.blockDim.x t = cuda.threadIdx.x state_max_actions = int16(trees.shape[2] - 1) _, _, m, n = trees_boards.shape m_n = m * n bept = (m_n + tpb - 1) // tpb # board elements per thread e = t # board element flat index selected = trees_nodes_selected[ti] # node selected for _ in range(bept): if e < m_n: i = e // n j = e % n shared_board[i, j] = trees_boards[ti, selected, i, j] e += tpb _, _, extra_info_memory = trees_extra_infos.shape eipt = (extra_info_memory + tpb - 1) // tpb e = t for _ in range(eipt): if e < extra_info_memory: shared_extra_info[e] = trees_extra_infos[ti, selected, e] e += tpb cuda.syncthreads() selected_is_terminal = trees_terminals[ti, selected] if selected_is_terminal: shared_legal_actions[t] = False elif t < state_max_actions: is_action_legal(m, n, shared_board, shared_extra_info, trees_turns[ti, selected], t, shared_legal_actions) cuda.syncthreads() size_so_far = trees_sizes[ti] child_shift = int16(-1) fake_child_for_playout = int16(-3) # remains like this when tree cannot grow due to memory exhausted if t < state_max_actions: shared_legal_actions_child_shifts[t] = int16(-1) if t == 0: if not selected_is_terminal: for i in range(state_max_actions): if shared_legal_actions[i] and size_so_far + child_shift + 1 < max_tree_size: child_shift += 1 shared_legal_actions_child_shifts[i] = child_shift if child_shift >= int16(0): trees_actions_expanded[ti, -1] = child_shift + 1 # information how many children expanded (as last entry) trees_leaves[ti, selected] = False fake_child_for_playout = int16(-2) # indicates all children for playouts (acp) else: trees_actions_expanded[ti, -1] = int16(1) # tree not grown due to memory exhausted, but selected shall be played out (hence 1 needed) trees_actions_expanded[ti, -2] = fake_child_for_playout else: trees_actions_expanded[ti, -1] = int16(1) trees_actions_expanded[ti, -2] = int16(-1) # fake child for playouts indicating that selected is terminal (and playouts computed from outcome) cuda.syncthreads() if t < state_max_actions: child_index = int32(-1) if shared_legal_actions[t]: child_shift = shared_legal_actions_child_shifts[t] if child_shift >= int16(0): child_index = size_so_far + child_shift trees_actions_expanded[ti, child_shift] = t # for thrifty variants trees[ti, selected, 1 + t] = child_index # parent gets to know where child is if t == 0: trees_sizes[ti] += shared_legal_actions_child_shifts[state_max_actions - 1] + 1 # updating tree size if selected_is_terminal or fake_child_for_playout == int16(-3): trees_actions_expanded[ti, 0] = int16(0) # fake legal action for playout (so that exactly one block becomes executed in full body)
[docs] @staticmethod @cuda.jit(void(int32, int32[:, :, :], int32[:], int8[:, :], boolean[:, :], boolean[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :])) def _expand_1_acp_prodigal(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded): """CUDA kernel responsible for computations of stage: expansions (substage 1, variant ``"acp_prodigal"``).""" shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block) shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info shared_legal_actions = cuda.shared.array(512, dtype=boolean) # 512 - assumed limit on max actions shared_legal_actions_child_shifts = cuda.shared.array(512, dtype=int16) # 512 - assumed limit on max actions ti = cuda.blockIdx.x # tree index tpb = cuda.blockDim.x t = cuda.threadIdx.x state_max_actions = int16(trees.shape[2] - 1) _, _, m, n = trees_boards.shape m_n = m * n bept = (m_n + tpb - 1) // tpb # board elements per thread e = t # board element flat index selected = trees_nodes_selected[ti] # node selected for _ in range(bept): if e < m_n: i = e // n j = e % n shared_board[i, j] = trees_boards[ti, selected, i, j] e += tpb _, _, extra_info_memory = trees_extra_infos.shape eipt = (extra_info_memory + tpb - 1) // tpb e = t for _ in range(eipt): if e < extra_info_memory: shared_extra_info[e] = trees_extra_infos[ti, selected, e] e += tpb cuda.syncthreads() selected_is_terminal = trees_terminals[ti, selected] if selected_is_terminal: shared_legal_actions[t] = False elif t < state_max_actions: is_action_legal(m, n, shared_board, shared_extra_info, trees_turns[ti, selected], t, shared_legal_actions) cuda.syncthreads() size_so_far = trees_sizes[ti] child_shift = int16(-1) fake_child_for_playout = int16(-3) # remains like this when tree cannot grow due to memory exhausted if t < state_max_actions: shared_legal_actions_child_shifts[t] = int16(-1) if t == 0: if not selected_is_terminal: for i in range(state_max_actions): if shared_legal_actions[i] and size_so_far + child_shift + 1 < max_tree_size: child_shift += 1 shared_legal_actions_child_shifts[i] = child_shift if child_shift >= int16(0): trees_actions_expanded[ti, -1] = child_shift + 1 # information how many children expanded (as last entry) trees_leaves[ti, selected] = False fake_child_for_playout = int16(-2) # indicates all children for playouts (acp) else: trees_actions_expanded[ti, -1] = int16(1) # tree not grown due to memory exhausted, but selected shall be played out (hence 1 needed) trees_actions_expanded[ti, -2] = fake_child_for_playout else: trees_actions_expanded[ti, -1] = int16(1) trees_actions_expanded[ti, -2] = int16(-1) # fake child for playouts indicating that selected is terminal (and playouts computed from outcome) cuda.syncthreads() if t < state_max_actions: child_index = int32(-1) if shared_legal_actions[t]: child_shift = shared_legal_actions_child_shifts[t] if child_shift >= int16(0): child_index = size_so_far + child_shift trees_actions_expanded[ti, t] = t # for prodigal variants else: trees_actions_expanded[ti, t] = int16(-1) # tree not grown case else: trees_actions_expanded[ti, t] = int16(-1) # for prodigal variants trees[ti, selected, 1 + t] = child_index # parent gets to know where child is if t == 0: trees_sizes[ti] += shared_legal_actions_child_shifts[state_max_actions - 1] + 1 # updating tree size if selected_is_terminal or fake_child_for_playout == int16(-3): trees_actions_expanded[ti, 0] = int16(0) # fake legal action for playout (so that exactly one block becomes executed in full body)
[docs] @staticmethod @cuda.jit(void(int16[:, :], int16[:])) def _memorize_root_actions_expanded(dev_trees_actions_expanded, dev_root_actions_expanded): """CUDA kernel responsible for memorizing actions expanded at root node(s).""" t = cuda.threadIdx.x dev_root_actions_expanded[t] = dev_trees_actions_expanded[0, t]
[docs] @staticmethod @cuda.jit(void(int32[:, :, :], int16[:, :], int8[:, :], boolean[:, :], boolean[:, :], int8[:, :], int32[:, :], int32[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :])) def _expand_2_thrifty(trees, trees_depths, trees_turns, trees_leaves, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded_flat): """CUDA kernel responsible for computations of stage: expansions (substage 2, thrifty number of blocks - variant ``"ocp_thrifty"`` or ``"acp_thrifty"``).""" shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block) shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info tai = cuda.blockIdx.x # tree-action pair index ti = trees_actions_expanded_flat[tai, 0] action = trees_actions_expanded_flat[tai, 1] if action < int16(0): return # selected is terminal or tree not grown due to memory exhausted tpb = cuda.blockDim.x t = cuda.threadIdx.x _, _, m, n = trees_boards.shape m_n = m * n bept = (m_n + tpb - 1) // tpb # board elements per thread e = t # board element flat index selected = trees_nodes_selected[ti] if trees_terminals[ti, selected]: return for _ in range(bept): if e < m_n: i = e // n j = e % n shared_board[i, j] = trees_boards[ti, selected, i, j] e += tpb _, _, extra_info_memory = trees_extra_infos.shape eipt = (extra_info_memory + tpb - 1) // tpb e = t for _ in range(eipt): if e < extra_info_memory: shared_extra_info[e] = trees_extra_infos[ti, selected, e] e += tpb cuda.syncthreads() turn = 0 if t == 0: turn = trees_turns[ti, selected] take_action(m, n, shared_board, shared_extra_info, turn, action) cuda.syncthreads() child = trees[ti, selected, 1 + action] e = t for _ in range(bept): if e < m_n: i = e // n j = e % n trees_boards[ti, child, i, j] = shared_board[i, j] e += tpb e = t for _ in range(eipt): if e < extra_info_memory: trees_extra_infos[ti, child, e] = shared_extra_info[e] e += tpb if t == 0: trees[ti, child, 0] = selected trees_turns[ti, child] = -turn trees_leaves[ti, child] = True terminal_flag = False outcome = compute_outcome(m, n, shared_board, shared_extra_info, -turn, action) if outcome == int8(-1) or outcome == int8(0) or outcome == int8(1): terminal_flag = True trees_terminals[ti, child] = terminal_flag trees_outcomes[ti, child] = outcome trees_ns[ti, child] = int32(0) trees_ns_wins[ti, child] = int32(0) trees_depths[ti, child] = trees_depths[ti, selected] + 1
[docs] @staticmethod @cuda.jit(void(int32[:, :, :], int16[:, :], int8[:, :], boolean[:, :], boolean[:, :], int8[:, :], int32[:, :], int32[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :])) def _expand_2_prodigal(trees, trees_depths, trees_turns, trees_leaves, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded): """CUDA kernel responsible for computations of stage: expansions (substage 2, prodigal number of blocks - variant ``"ocp_prodigal"`` or ``"acp_prodigal"``).""" shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block) shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info ti = cuda.blockIdx.x action = cuda.blockIdx.y if trees_actions_expanded[ti, action] < int16(0): return # prodigality if trees_actions_expanded[ti, -2] == int16(-1) or trees_actions_expanded[ti, -2] == int16(-3): return # selected is terminal or tree cannot grow due to memory exhausted tpb = cuda.blockDim.x t = cuda.threadIdx.x _, _, m, n = trees_boards.shape m_n = m * n bept = (m_n + tpb - 1) // tpb # board elements per thread e = t # board element flat index selected = trees_nodes_selected[ti] if trees_terminals[ti, selected]: return for _ in range(bept): if e < m_n: i = e // n j = e % n shared_board[i, j] = trees_boards[ti, selected, i, j] e += tpb _, _, extra_info_memory = trees_extra_infos.shape eipt = (extra_info_memory + tpb - 1) // tpb e = t for _ in range(eipt): if e < extra_info_memory: shared_extra_info[e] = trees_extra_infos[ti, selected, e] e += tpb cuda.syncthreads() turn = int8(0) if t == 0: turn = trees_turns[ti, selected] take_action(m, n, shared_board, shared_extra_info, turn, action) cuda.syncthreads() child = trees[ti, selected, 1 + action] e = t for _ in range(bept): if e < m_n: i = e // n j = e % n trees_boards[ti, child, i, j] = shared_board[i, j] e += tpb e = t for _ in range(eipt): if e < extra_info_memory: trees_extra_infos[ti, child, e] = shared_extra_info[e] e += tpb if t == 0: trees[ti, child, 0] = selected trees_turns[ti, child] = -turn trees_leaves[ti, child] = True terminal_flag = False outcome = compute_outcome(m, n, shared_board, shared_extra_info, -turn, action) if outcome == int8(-1) or outcome == int8(0) or outcome == int8(1): terminal_flag = True trees_terminals[ti, child] = terminal_flag trees_outcomes[ti, child] = outcome trees_ns[ti, child] = int32(0) trees_ns_wins[ti, child] = int32(0) trees_depths[ti, child] = trees_depths[ti, selected] + 1
[docs] @staticmethod @cuda.jit(void(int32[:, :, :], int8[:, :], boolean[:, :], int8[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :], xoroshiro128p_type[:], int32[:, :])) def _playout_ocp(trees, trees_turns, trees_terminals, trees_outcomes, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded, random_generators_playout, trees_playout_outcomes): """CUDA kernel responsible for computations of stage: playouts (variant ``"ocp_thrifty"`` or ``"ocp_prodigal"``).""" shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block) shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info shared_playout_outcomes = cuda.shared.array((512, 2), dtype=int16) # 512 - assumed max tpb for playouts, two cells for a row (-1 win, +1 win), each flagged by 0 or 1 after playout local_board = cuda.local.array((32, 32), dtype=int8) local_extra_info = cuda.local.array(4096, dtype=int8) local_legal_actions_with_count = cuda.local.array(512 + 1, dtype=int16) # 512 - assumed limit on max actions ti = cuda.blockIdx.x # tree index tpb = cuda.blockDim.x t = cuda.threadIdx.x to_be_played_out = trees_nodes_selected[ti] # temporarily to_be_played_out equals selected rand_child_for_playout = trees_actions_expanded[ti, -2] last_action = int16(-1) # none yet if rand_child_for_playout >= int16(0): # check if some child picked on random for playouts last_action = trees_actions_expanded[ti, rand_child_for_playout] to_be_played_out = trees[ti, to_be_played_out, 1 + last_action] if trees_terminals[ti, to_be_played_out]: # root for playouts has been discovered terminal before (by game rules) -> taking stored outcome ("multiplied" by tpb) if t == 0: outcome = trees_outcomes[ti, to_be_played_out] trees_playout_outcomes[ti, 0] = int32(tpb) if outcome == int8(-1) else int32(0) # wins of -1 trees_playout_outcomes[ti, 1] = int32(tpb) if outcome == int8(1) else int32(0) # wins of +1 else: t = cuda.threadIdx.x t_global = cuda.grid(1) shared_playout_outcomes[t, 0] = np.int16(0) shared_playout_outcomes[t, 1] = np.int16(0) _, _, m, n = trees_boards.shape m_n = m * n bept = (m_n + tpb - 1) // tpb # board elements per thread e = t # board element flat index for _ in range(bept): if e < m_n: i = e // n j = e % n shared_board[i, j] = trees_boards[ti, to_be_played_out, i, j] e += tpb _, _, extra_info_memory = trees_extra_infos.shape eipt = (extra_info_memory + tpb - 1) // tpb e = t for _ in range(eipt): if e < extra_info_memory: shared_extra_info[e] = trees_extra_infos[ti, to_be_played_out, e] e += tpb cuda.syncthreads() for i in range(m): for j in range(n): local_board[i, j] = shared_board[i, j] for i in range(extra_info_memory): local_extra_info[i] = shared_extra_info[i] local_legal_actions_with_count[-1] = 0 turn = trees_turns[ti, to_be_played_out] outcome = compute_outcome(m, n, local_board, local_extra_info, turn, last_action) if last_action != int16(-1) else int8(2) # else case only when trees not grown due to memory limit (then selected played out) while True: # playout loop if not (outcome == int8(-1) or outcome == int8(0) or outcome == int8(1)): # indecisive, game ongoing legal_actions_playout(m, n, local_board, local_extra_info, turn, local_legal_actions_with_count) count = local_legal_actions_with_count[-1] action_ord = int16(xoroshiro128p_uniform_float32(random_generators_playout, t_global) * count) last_action = local_legal_actions_with_count[action_ord] take_action_playout(m, n, local_board, local_extra_info, turn, last_action, action_ord, local_legal_actions_with_count) turn = -turn else: if outcome != int8(0): shared_playout_outcomes[t, (outcome + 1) // 2] = int8(1) break outcome = compute_outcome(m, n, local_board, local_extra_info, turn, last_action) cuda.syncthreads() stride = tpb >> 1 # half of tpb while stride > 0: # sum reduction pattern if t < stride: t_stride = t + stride shared_playout_outcomes[t, 0] += shared_playout_outcomes[t_stride, 0] shared_playout_outcomes[t, 1] += shared_playout_outcomes[t_stride, 1] cuda.syncthreads() stride >>= 1 if t == 0: trees_playout_outcomes[ti, 0] = shared_playout_outcomes[0, 0] trees_playout_outcomes[ti, 1] = shared_playout_outcomes[0, 1]
[docs] @staticmethod @cuda.jit(void(int32[:, :, :], int8[:, :], boolean[:, :], int8[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :], int16[:, :], xoroshiro128p_type[:], int32[:, :], int32[:, :, :])) def _playout_acp_thrifty(trees, trees_turns, trees_terminals, trees_outcomes, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded, trees_actions_expanded_flat, random_generators_playout, trees_playout_outcomes, trees_playout_outcomes_children): """CUDA kernel responsible for computations of stage: playouts (variant ``"acp_thrifty"``).""" shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block) shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info shared_playout_outcomes = cuda.shared.array((512, 2), dtype=int16) # 1024 - assumed max tpb for playouts, two cells for a row (-1 win, +1 win), each flagged by 0 or 1 after playout local_board = cuda.local.array((32, 32), dtype=int8) local_extra_info = cuda.local.array(4096, dtype=int8) local_legal_actions_with_count = cuda.local.array(512 + 1, dtype=int16) # 512 - assumed limit on max actions tai = cuda.blockIdx.x # tree-action pair index ti = trees_actions_expanded_flat[tai, 0] action = trees_actions_expanded_flat[tai, 1] tpb = cuda.blockDim.x t = cuda.threadIdx.x to_be_played_out = trees_nodes_selected[ti] # temporarily to_be_played_out equals selected fake_child_for_playout = trees_actions_expanded[ti, -2] last_action = int16(-1) # none yet if fake_child_for_playout == int16(-2): # check if playouts are to be made on all children of selected last_action = action to_be_played_out = trees[ti, to_be_played_out, 1 + last_action] if trees_terminals[ti, to_be_played_out]: # root for playouts has been discovered terminal before (by game rules) -> taking stored outcome ("multiplied" by tpb) if t == 0: outcome = trees_outcomes[ti, to_be_played_out] if fake_child_for_playout == int16(-2): # case where terminal is one child among all children of selected node trees_playout_outcomes_children[ti, action, 0] = int32(tpb) if outcome == int8(-1) else int32(0) # wins of -1 trees_playout_outcomes_children[ti, action, 1] = int32(tpb) if outcome == int8(1) else int32(0) # wins of +1 else: # case where terminal was selected trees_playout_outcomes[ti, 0] = int32(tpb) if outcome == int8(-1) else int32(0) # wins of -1 trees_playout_outcomes[ti, 1] = int32(tpb) if outcome == int8(1) else int32(0) # wins of +1 else: t = cuda.threadIdx.x state_max_actions = trees.shape[2] - 1 t_global = ti * state_max_actions * tpb + action * tpb + t # purposely (instead of t_global = cuda.grid(1)) to make resutls of acp_prodigal and acp_thrifty same (for equal number of steps) shared_playout_outcomes[t, 0] = np.int16(0) shared_playout_outcomes[t, 1] = np.int16(0) _, _, m, n = trees_boards.shape m_n = m * n bept = (m_n + tpb - 1) // tpb # board elements per thread e = t # board element flat index for _ in range(bept): if e < m_n: i = e // n j = e % n shared_board[i, j] = trees_boards[ti, to_be_played_out, i, j] e += tpb _, _, extra_info_memory = trees_extra_infos.shape eipt = (extra_info_memory + tpb - 1) // tpb e = t for _ in range(eipt): if e < extra_info_memory: shared_extra_info[e] = trees_extra_infos[ti, to_be_played_out, e] e += tpb cuda.syncthreads() for i in range(m): for j in range(n): local_board[i, j] = shared_board[i, j] for i in range(extra_info_memory): local_extra_info[i] = shared_extra_info[i] local_legal_actions_with_count[-1] = 0 turn = trees_turns[ti, to_be_played_out] outcome = compute_outcome(m, n, local_board, local_extra_info, turn, last_action) if last_action != int16(-1) else int8(2) # else case only when trees not grown due to memory limit (then selected played out) while True: # playout loop if not (outcome == int8(-1) or outcome == int8(0) or outcome == int8(1)): # indecisive, game ongoing legal_actions_playout(m, n, local_board, local_extra_info, turn, local_legal_actions_with_count) count = local_legal_actions_with_count[-1] action_ord = int16(xoroshiro128p_uniform_float32(random_generators_playout, t_global) * count) last_action = local_legal_actions_with_count[action_ord] take_action_playout(m, n, local_board, local_extra_info, turn, last_action, action_ord, local_legal_actions_with_count) turn = -turn else: if outcome != int8(0): shared_playout_outcomes[t, (outcome + 1) // 2] = int8(1) break outcome = compute_outcome(m, n, local_board, local_extra_info, turn, last_action) cuda.syncthreads() stride = tpb >> 1 # half of tpb while stride > 0: # sum reduction pattern if t < stride: t_stride = t + stride shared_playout_outcomes[t, 0] += shared_playout_outcomes[t_stride, 0] shared_playout_outcomes[t, 1] += shared_playout_outcomes[t_stride, 1] cuda.syncthreads() stride >>= 1 if t == 0: trees_playout_outcomes_children[ti, action, 0] = shared_playout_outcomes[0, 0] trees_playout_outcomes_children[ti, action, 1] = shared_playout_outcomes[0, 1]
[docs] @staticmethod @cuda.jit(void(int32[:, :, :], int8[:, :], boolean[:, :], int8[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :], xoroshiro128p_type[:], int32[:, :], int32[:, :, :])) def _playout_acp_prodigal(trees, trees_turns, trees_terminals, trees_outcomes, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded, random_generators_playout, trees_playout_outcomes, trees_playout_outcomes_children): """CUDA kernel responsible for computations of stage: playouts (variant ``"acp_prodigal"``).""" shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block) shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info shared_playout_outcomes = cuda.shared.array((512, 2), dtype=int16) # 1024 - assumed max tpb for playouts, two cells for a row (-1 win, +1 win), each flagged by 0 or 1 after playout ti = cuda.blockIdx.x action = cuda.blockIdx.y if trees_actions_expanded[ti, action] < int16(0): # prodigality return local_board = cuda.local.array((32, 32), dtype=int8) local_extra_info = cuda.local.array(4096, dtype=int8) local_legal_actions_with_count = cuda.local.array(512 + 1, dtype=int16) # 512 - assumed limit on max actions tpb = cuda.blockDim.x t = cuda.threadIdx.x to_be_played_out = trees_nodes_selected[ti] # temporarily to_be_played_out equals selected fake_child_for_playout = trees_actions_expanded[ti, -2] last_action = int16(-1) # none yet if fake_child_for_playout == int16(-2): # check if playouts are to be made on all children of selected last_action = action to_be_played_out = trees[ti, to_be_played_out, 1 + last_action] # moving one level down from selected if trees_terminals[ti, to_be_played_out]: # root for playouts has been discovered terminal before (by game rules) -> taking stored outcome ("multiplied" by tpb) if t == 0: outcome = trees_outcomes[ti, to_be_played_out] if fake_child_for_playout == int16(-2): # case where terminal is one child among all children of selected node trees_playout_outcomes_children[ti, action, 0] = int32(tpb) if outcome == int8(-1) else int32(0) # wins of -1 trees_playout_outcomes_children[ti, action, 1] = int32(tpb) if outcome == int8(1) else int32(0) # wins of +1 else: # case where terminal was selected trees_playout_outcomes[ti, 0] = int32(tpb) if outcome == int8(-1) else int32(0) # wins of -1 trees_playout_outcomes[ti, 1] = int32(tpb) if outcome == int8(1) else int32(0) # wins of +1 else: # playouts for non-terminal t = cuda.threadIdx.x state_max_actions = trees.shape[2] - 1 t_global = ti * state_max_actions * tpb + action * tpb + t shared_playout_outcomes[t, 0] = np.int16(0) shared_playout_outcomes[t, 1] = np.int16(0) _, _, m, n = trees_boards.shape m_n = m * n bept = (m_n + tpb - 1) // tpb # board elements per thread e = t # board element flat index for _ in range(bept): if e < m_n: i = e // n j = e % n shared_board[i, j] = trees_boards[ti, to_be_played_out, i, j] e += tpb _, _, extra_info_memory = trees_extra_infos.shape eipt = (extra_info_memory + tpb - 1) // tpb e = t for _ in range(eipt): if e < extra_info_memory: shared_extra_info[e] = trees_extra_infos[ti, to_be_played_out, e] e += tpb cuda.syncthreads() for i in range(m): for j in range(n): local_board[i, j] = shared_board[i, j] for i in range(extra_info_memory): local_extra_info[i] = shared_extra_info[i] local_legal_actions_with_count[-1] = 0 turn = trees_turns[ti, to_be_played_out] outcome = compute_outcome(m, n, local_board, local_extra_info, turn, last_action) if last_action != int16(-1) else int8(2) # else case only when trees not grown due to memory limit (then selected played out) while True: # playout loop if not (outcome == int8(-1) or outcome == int8(0) or outcome == int8(1)): # indecisive, game ongoing legal_actions_playout(m, n, local_board, local_extra_info, turn, local_legal_actions_with_count) count = local_legal_actions_with_count[-1] action_ord = int16(xoroshiro128p_uniform_float32(random_generators_playout, t_global) * count) last_action = local_legal_actions_with_count[action_ord] take_action_playout(m, n, local_board, local_extra_info, turn, last_action, action_ord, local_legal_actions_with_count) turn = -turn else: if outcome != int8(0): shared_playout_outcomes[t, (outcome + 1) // 2] = int8(1) break outcome = compute_outcome(m, n, local_board, local_extra_info, turn, last_action) cuda.syncthreads() stride = tpb >> 1 # half of tpb while stride > 0: # sum reduction pattern if t < stride: t_stride = t + stride shared_playout_outcomes[t, 0] += shared_playout_outcomes[t_stride, 0] shared_playout_outcomes[t, 1] += shared_playout_outcomes[t_stride, 1] cuda.syncthreads() stride >>= 1 if t == 0: trees_playout_outcomes_children[ti, action, 0] = shared_playout_outcomes[0, 0] trees_playout_outcomes_children[ti, action, 1] = shared_playout_outcomes[0, 1]
[docs] @staticmethod @cuda.jit(void(int16, int32[:, :, :], int8[:, :], int32[:, :], int32[:, :], int32[:], int32[:, :], int16[:, :], int32[:, :])) def _backup_ocp(n_playouts, trees, trees_turns, trees_ns, trees_ns_wins, trees_nodes_selected, trees_selected_paths, trees_actions_expanded, trees_playout_outcomes): """CUDA kernel responsible for computations of stage: backups (variant ``"ocp_thrifty"`` or ``"ocp_prodigal"``).""" ti = cuda.blockIdx.x t = cuda.threadIdx.x tpb = cuda.blockDim.x n_negative_wins = trees_playout_outcomes[ti, 0] n_positive_wins = trees_playout_outcomes[ti, 1] path_length = trees_selected_paths[ti, -1] pept = (path_length + tpb - 1) // tpb # path elements per thread e = t for _ in range(pept): if e < path_length: node = trees_selected_paths[ti, e] trees_ns[ti, node] += n_playouts if trees_turns[ti, node] == int8(1): trees_ns_wins[ti, node] += n_negative_wins else: trees_ns_wins[ti, node] += n_positive_wins e += tpb if t == 0: node = trees_nodes_selected[ti] rand_child_for_playout = trees_actions_expanded[ti, -2] if rand_child_for_playout != int16(-1): # check if some child picked on random for playouts last_action = trees_actions_expanded[ti, rand_child_for_playout] node = trees[ti, node, 1 + last_action] trees_ns[ti, node] += n_playouts if trees_turns[ti, node] == int8(1): trees_ns_wins[ti, node] += n_negative_wins else: trees_ns_wins[ti, node] += n_positive_wins
[docs] @staticmethod @cuda.jit(void(int16, int32[:, :, :], int8[:, :], int32[:, :], int32[:, :], int32[:], int16[:, :], int32[:, :], int32[:, :, :])) def _backup_1_acp_thrifty(n_playouts, trees, trees_turns, trees_ns, trees_ns_wins, trees_nodes_selected, trees_actions_expanded, trees_playout_outcomes, trees_playout_outcomes_children): """CUDA kernel responsible for computations of stage: backups (substage 1, variant ``"acp_thrifty"``).""" shared_playout_outcomes_children = cuda.shared.array((512, 2), dtype=int32) # 512 - assumed limit on max actions, two cells for a row (-1 win, +1 win), each flagged by 0 or 1 after playout ti = cuda.blockIdx.x # tree index tpb = cuda.blockDim.x t = cuda.threadIdx.x fake_child_for_playout = trees_actions_expanded[ti, -2] if fake_child_for_playout == int16(-2): # check if actual children of selected were played out selected = trees_nodes_selected[ti] n_expanded_actions = trees_actions_expanded[ti, -1] if t < n_expanded_actions: a = trees_actions_expanded[ti, t] n_negative_wins = trees_playout_outcomes_children[ti, a, 0] n_positive_wins = trees_playout_outcomes_children[ti, a, 1] child_node = trees[ti, selected, 1 + a] trees_ns[ti, child_node] += n_playouts if trees_turns[ti, child_node] == int8(1): trees_ns_wins[ti, child_node] += n_negative_wins else: trees_ns_wins[ti, child_node] += n_positive_wins shared_playout_outcomes_children[t, 0] = n_negative_wins shared_playout_outcomes_children[t, 1] = n_positive_wins else: shared_playout_outcomes_children[t, 0] = np.int32(0) shared_playout_outcomes_children[t, 1] = np.int32(0) cuda.syncthreads() stride = tpb >> 1 # half of tpb while stride > 0: # sum reduction pattern if t < stride: t_stride = t + stride shared_playout_outcomes_children[t, 0] += shared_playout_outcomes_children[t_stride, 0] shared_playout_outcomes_children[t, 1] += shared_playout_outcomes_children[t_stride, 1] cuda.syncthreads() stride >>= 1 if t == 0: trees_playout_outcomes[ti, 0] = shared_playout_outcomes_children[0, 0] trees_playout_outcomes[ti, 1] = shared_playout_outcomes_children[0, 1]
[docs] @staticmethod @cuda.jit(void(int16, int32[:, :, :], int8[:, :], int32[:, :], int32[:, :], int32[:], int16[:, :], int32[:, :], int32[:, :, :])) def _backup_1_acp_prodigal(n_playouts, trees, trees_turns, trees_ns, trees_ns_wins, trees_nodes_selected, trees_actions_expanded, trees_playout_outcomes, trees_playout_outcomes_children): """CUDA kernel responsible for computations of stage: backups (substage 1, variant ``"acp_prodigal"``).""" shared_playout_outcomes_children = cuda.shared.array((512, 2), dtype=int32) # 512 - assumed limit on max actions, two cells for a row (-1 win, +1 win), each flagged by 0 or 1 after playout ti = cuda.blockIdx.x # tree index tpb = cuda.blockDim.x t = cuda.threadIdx.x fake_child_for_playout = trees_actions_expanded[ti, -2] max_actions = trees_actions_expanded.shape[1] - 2 if fake_child_for_playout == int16(-2): # check if actual children of selected were played out selected = trees_nodes_selected[ti] if t < max_actions and trees_actions_expanded[ti, t] != int16(-1): # prodigality a = t n_negative_wins = trees_playout_outcomes_children[ti, a, 0] n_positive_wins = trees_playout_outcomes_children[ti, a, 1] child_node = trees[ti, selected, 1 + a] trees_ns[ti, child_node] += n_playouts if trees_turns[ti, child_node] == int8(1): trees_ns_wins[ti, child_node] += n_negative_wins else: trees_ns_wins[ti, child_node] += n_positive_wins shared_playout_outcomes_children[t, 0] = n_negative_wins shared_playout_outcomes_children[t, 1] = n_positive_wins else: shared_playout_outcomes_children[t, 0] = np.int32(0) shared_playout_outcomes_children[t, 1] = np.int32(0) cuda.syncthreads() stride = tpb >> 1 # half of tpb while stride > 0: # sum reduction pattern if t < stride: t_stride = t + stride shared_playout_outcomes_children[t, 0] += shared_playout_outcomes_children[t_stride, 0] shared_playout_outcomes_children[t, 1] += shared_playout_outcomes_children[t_stride, 1] cuda.syncthreads() stride >>= 1 if t == 0: trees_playout_outcomes[ti, 0] = shared_playout_outcomes_children[0, 0] trees_playout_outcomes[ti, 1] = shared_playout_outcomes_children[0, 1]
[docs] @staticmethod @cuda.jit(void(int16, int8[:, :], int32[:, :], int32[:, :], int32[:, :], int16[:, :], int32[:, :])) def _backup_2_acp(n_playouts, trees_turns, trees_ns, trees_ns_wins, trees_selected_paths, trees_actions_expanded, trees_playout_outcomes): """CUDA kernel responsible for computations of stage: backups (substage 2, variant ``"acp_thrifty"`` or ``"acp_prodigal"``).""" ti = cuda.blockIdx.x t = cuda.threadIdx.x tpb = cuda.blockDim.x n_negative_wins = trees_playout_outcomes[ti, 0] n_positive_wins = trees_playout_outcomes[ti, 1] n_expanded_actions = trees_actions_expanded[ti, -1] if n_expanded_actions == int16(0): # terminal was being "played out" n_expanded_actions = int16(1) n_playouts_total = n_playouts * n_expanded_actions path_length = trees_selected_paths[ti, -1] pept = (path_length + tpb - 1) // tpb # path elements per thread e = t for _ in range(pept): if e < path_length: node = trees_selected_paths[ti, e] trees_ns[ti, node] += n_playouts_total if trees_turns[ti, node] == int8(1): trees_ns_wins[ti, node] += n_negative_wins else: trees_ns_wins[ti, node] += n_positive_wins e += tpb
[docs] @staticmethod @cuda.jit(void(int32[:, :, :], boolean[:, :], int8[:, :], int32[:, :], int32[:, :], int16[:], int8, int64[:], boolean[:], int64[:], int64[:])) def _reduce_over_trees_thrifty(trees, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, root_actions_expanded, root_turn, root_ns, actions_win_flags, actions_ns, actions_ns_wins): """CUDA kernel responsible for sum-reduction over trees (thrifty number of blocks, variant ``ocp_thrifty`` or ``acp_thrifty``).""" shared_root_ns = cuda.shared.array(512, dtype=int64) # 512 - assumed max of n_trees shared_actions_ns = cuda.shared.array(512, dtype=int64) shared_actions_ns_wins = cuda.shared.array(512, dtype=int64) b = cuda.blockIdx.x action = root_actions_expanded[b] # action index n_trees = trees.shape[0] tpb = cuda.blockDim.x t = cuda.threadIdx.x # thread index == tree index if t < n_trees: shared_root_ns[t] = int64(trees_ns[t, 0]) action_node = trees[t, 0, 1 + action] shared_actions_ns[t] = int64(trees_ns[t, action_node]) shared_actions_ns_wins[t] = int64(trees_ns_wins[t, action_node]) else: shared_root_ns[t] = int64(0) shared_actions_ns[t] = int64(0) shared_actions_ns_wins[t] = int64(0) cuda.syncthreads() stride = tpb >> 1 # half of tpb while stride > 0: # sum reduction pattern if t < stride: t_stride = t + stride shared_root_ns[t] += shared_root_ns[t_stride] shared_actions_ns[t] += shared_actions_ns[t_stride] shared_actions_ns_wins[t] += shared_actions_ns_wins[t_stride] cuda.syncthreads() stride >>= 1 if t == 0: root_ns[b] = shared_root_ns[0] actions_ns[b] = shared_actions_ns[0] actions_ns_wins[b] = shared_actions_ns_wins[0] action_node = trees[0, 0, 1 + action] actions_win_flags[b] = action_node != int32(-1) and trees_terminals[0, action_node] and trees_outcomes[0, action_node] == root_turn
[docs] @staticmethod @cuda.jit(void(int32[:, :, :], boolean[:, :], int8[:, :], int32[:, :], int32[:, :], int16[:], int8, int64[:], boolean[:], int64[:], int64[:])) def _reduce_over_trees_prodigal(trees, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, root_actions_expanded, root_turn, root_ns, actions_win_flags, actions_ns, actions_ns_wins): """CUDA kernel responsible for sum-reduction over trees (prodigal number of blocks, variant ``ocp_prodigal`` or ``acp_prodigal``).""" shared_root_ns = cuda.shared.array(512, dtype=int64) # 512 - assumed max of n_trees shared_actions_ns = cuda.shared.array(512, dtype=int64) shared_actions_ns_wins = cuda.shared.array(512, dtype=int64) b = cuda.blockIdx.x action = b # action index t = cuda.threadIdx.x # thread index == tree index shared_root_ns[t] = int64(0) shared_actions_ns[t] = int64(0) shared_actions_ns_wins[t] = int64(0) if root_actions_expanded[action] != int16(-1): n_trees = trees.shape[0] tpb = cuda.blockDim.x if t < n_trees: shared_root_ns[t] = trees_ns[t, 0] action_node = trees[t, 0, 1 + action] shared_actions_ns[t] = trees_ns[t, action_node] shared_actions_ns_wins[t] = trees_ns_wins[t, action_node] cuda.syncthreads() stride = tpb >> 1 # half of tpb while stride > 0: # sum reduction pattern if t < stride: t_stride = t + stride shared_root_ns[t] += shared_root_ns[t_stride] shared_actions_ns[t] += shared_actions_ns[t_stride] shared_actions_ns_wins[t] += shared_actions_ns_wins[t_stride] cuda.syncthreads() stride >>= 1 if t == 0: root_ns[b] = shared_root_ns[0] actions_ns[b] = shared_actions_ns[0] actions_ns_wins[b] = shared_actions_ns_wins[0] action_node = trees[0, 0, 1 + b] actions_win_flags[b] = action_node != int32(-1) and trees_terminals[0, action_node] and trees_outcomes[0, action_node] == root_turn
[docs] @staticmethod @cuda.jit(void(int16, boolean[:], int64[:], int64[:], int16[:], boolean[:], int64[:], int64[:])) def _reduce_over_actions_thrifty(n_root_actions, actions_win_flags, actions_ns, actions_ns_wins, best_action, best_win_flag, best_n, best_n_wins): """CUDA kernel responsible for max/argmax-reduction over actions (thrifty number of blocks, variant ``ocp_thrifty`` or ``acp_thrifty``).""" shared_actions = cuda.shared.array(512, dtype=int16) # 512 - assumed max state actions shared_actions_win_flags = cuda.shared.array(512, dtype=boolean) shared_actions_ns = cuda.shared.array(512, dtype=int64) shared_actions_ns_wins = cuda.shared.array(512, dtype=int64) tpb = cuda.blockDim.x a = cuda.threadIdx.x # action index shared_actions[a] = a if a < n_root_actions: shared_actions_win_flags[a] = actions_win_flags[a] shared_actions_ns[a] = actions_ns[a] shared_actions_ns_wins[a] = actions_ns_wins[a] else: shared_actions_win_flags[a] = False shared_actions_ns[a] = int64(0) shared_actions_ns_wins[a] = int64(0) cuda.syncthreads() stride = tpb >> 1 # half of tpb while stride > 0: # max-argmax reduction pattern if a < stride: a_stride = a + stride if (shared_actions_win_flags[a] < shared_actions_win_flags[a_stride]) or\ ((shared_actions_win_flags[a] == shared_actions_win_flags[a_stride]) and (shared_actions_ns[a] < shared_actions_ns[a_stride])) or\ ((shared_actions_win_flags[a] == shared_actions_win_flags[a_stride]) and (shared_actions_ns[a] == shared_actions_ns[a_stride]) and (shared_actions_ns_wins[a] < shared_actions_ns_wins[a_stride])): shared_actions[a] = shared_actions[a_stride] shared_actions_ns[a] = shared_actions_ns[a_stride] shared_actions_ns_wins[a] = shared_actions_ns_wins[a_stride] shared_actions_win_flags[a] = shared_actions_win_flags[a_stride] cuda.syncthreads() stride >>= 1 if a == 0: best_action[0] = shared_actions[0] best_win_flag[0] = shared_actions_win_flags[0] best_n[0] = shared_actions_ns[0] best_n_wins[0] = shared_actions_ns_wins[0]
[docs] @staticmethod @cuda.jit(void(boolean[:], int64[:], int64[:], int16[:], boolean[:], int64[:], int64[:])) def _reduce_over_actions_prodigal(actions_win_flags, actions_ns, actions_ns_wins, best_action, best_win_flag, best_n, best_n_wins): """CUDA kernel responsible for max/argmax-reduction over actions (prodigal number of blocks, variant ``ocp_prodigal`` or ``acp_prodigal``).""" shared_actions = cuda.shared.array(512, dtype=int16) # 512 - assumed max state actions shared_actions_win_flags = cuda.shared.array(512, dtype=boolean) shared_actions_ns = cuda.shared.array(512, dtype=int64) shared_actions_ns_wins = cuda.shared.array(512, dtype=int64) tpb = cuda.blockDim.x a = cuda.threadIdx.x # action index shared_actions[a] = a state_max_actions = actions_ns.size if a < state_max_actions: shared_actions_win_flags[a] = actions_win_flags[a] shared_actions_ns[a] = actions_ns[a] shared_actions_ns_wins[a] = actions_ns_wins[a] else: shared_actions_win_flags[a] = False shared_actions_ns[a] = int64(0) shared_actions_ns_wins[a] = int64(0) cuda.syncthreads() stride = tpb >> 1 # half of tpb while stride > 0: # max-argmax reduction pattern if a < stride: a_stride = a + stride if (shared_actions_win_flags[a] < shared_actions_win_flags[a_stride]) or\ ((shared_actions_win_flags[a] == shared_actions_win_flags[a_stride]) and (shared_actions_ns[a] < shared_actions_ns[a_stride])) or\ ((shared_actions_win_flags[a] == shared_actions_win_flags[a_stride]) and (shared_actions_ns[a] == shared_actions_ns[a_stride]) and (shared_actions_ns_wins[a] < shared_actions_ns_wins[a_stride])): shared_actions[a] = shared_actions[a_stride] shared_actions_ns[a] = shared_actions_ns[a_stride] shared_actions_ns_wins[a] = shared_actions_ns_wins[a_stride] shared_actions_win_flags[a] = shared_actions_win_flags[a_stride] cuda.syncthreads() stride >>= 1 if a == 0: best_action[0] = shared_actions[0] best_win_flag[0] = shared_actions_win_flags[0] best_n[0] = shared_actions_ns[0] best_n_wins[0] = shared_actions_ns_wins[0]
[docs] def _json_dump(self, fname): """Dumps (saves) device-side arrays, copied to host, representing trees and MCTS elements from the last run to a text file in json format.""" if self.verbose_info: print(f"JSON DUMP... [to file: {fname}]") t1 = time.time() d = {} d["n_trees"] = self.n_trees d["n_playouts"] = self.n_playouts d["variant"] = self.variant d["search_time_limit"] = self.search_time_limit if self.search_time_limit < np.inf else "inf" d["search_steps_limit"] = self.search_steps_limit if self.search_steps_limit < np.inf else "inf" d["ucb_c"] = self.ucb_c d["seed"] = self.seed d["device_memory"] = self.device_memory trees_sizes = np.empty_like(self.dev_trees_sizes) self.dev_trees_sizes.copy_to_host(ary=trees_sizes) tree_size_max = np.max(trees_sizes) trees = np.empty_like(self.dev_trees) self.dev_trees.copy_to_host(ary=trees) trees = trees[:, :tree_size_max, :] trees_depths = np.empty_like(self.dev_trees_depths) self.dev_trees_depths.copy_to_host(ary=trees_depths) trees_depths = trees_depths[:, :tree_size_max] depth_max = -np.inf for i in range(self.n_trees): depth_max = max(depth_max, np.max(trees_depths[i, :trees_sizes[i]])) trees_turns = np.empty_like(self.dev_trees_turns) self.dev_trees_turns.copy_to_host(ary=trees_turns) trees_turns = trees_turns[:, :tree_size_max] trees_ns = np.empty_like(self.dev_trees_ns) self.dev_trees_ns.copy_to_host(ary=trees_ns) trees_ns = trees_ns[:, :tree_size_max] trees_ns_wins = np.empty_like(self.dev_trees_ns_wins) self.dev_trees_ns_wins.copy_to_host(ary=trees_ns_wins) trees_ns_wins = trees_ns_wins[:, :tree_size_max] trees_nodes_selected = np.empty_like(self.dev_trees_nodes_selected) self.dev_trees_nodes_selected.copy_to_host(ary=trees_nodes_selected) trees_selected_paths = np.empty_like(self.dev_trees_selected_paths) self.dev_trees_selected_paths.copy_to_host(ary=trees_selected_paths) tmp_trees_selected_paths = trees_selected_paths[:, :depth_max + 2]; tmp_trees_selected_paths[:, -1] = trees_selected_paths[:, -1] trees_selected_paths = tmp_trees_selected_paths trees_actions_expanded = np.empty_like(self.dev_trees_actions_expanded) self.dev_trees_actions_expanded.copy_to_host(ary=trees_actions_expanded) trees_playout_outcomes = np.empty_like(self.dev_trees_playout_outcomes) self.dev_trees_playout_outcomes.copy_to_host(ary=trees_playout_outcomes) trees_playout_outcomes_children = None if self.dev_trees_playout_outcomes_children is not None: trees_playout_outcomes_children = np.empty_like(self.dev_trees_playout_outcomes_children) self.dev_trees_playout_outcomes_children.copy_to_host(ary=trees_playout_outcomes_children) d["trees"] = trees.tolist() d["trees"] = trees.tolist() d["trees_sizes"] = trees_sizes.tolist() d["trees_depths"] = trees_depths.tolist() d["trees_turns"] = trees_turns.tolist() d["trees_ns"] = trees_ns.tolist() d["trees_ns_wins"] = trees_ns_wins.tolist() d["trees_nodes_selected"] = trees_nodes_selected.tolist() d["trees_selected_paths"] = trees_selected_paths.tolist() d["trees_actions_expanded"] = trees_actions_expanded.tolist() d["trees_playout_outcomes"] = trees_playout_outcomes.tolist() if trees_playout_outcomes_children is not None: d["trees_playout_outcomes_children"] = trees_playout_outcomes_children.tolist() try: f = open(fname, "w+") json.dump(d, f, indent=2) f.close() except IOError: sys.exit(f"[error occurred when trying to dump MCTSNC as json to file: {fname}]") t2 = time.time() if self.verbose_info: print(f"JSON DUMP DONE. [time: {t2 - t1} s]")