"""
This module contains the core algorithmic functionalities of the project, embodied by the class ``MCTSNC``.
With CUDA computational model in mind, we have proposed and implemented in class ``MCTSNC`` four, fast operating and thoroughly parallel, variants of Monte Carlo Tree Search algorithm.
The provided implementation takes advantage of `Numba <https://numba.pydata.org>`_, a just-in-time Python compiler, and its ``numba.cuda`` package (hence the "NC" suffix in the name).
By `thoroughly parallel` we understand an algorithmic design that applies to both: (1) the structural elements of trees - leaf-/root-/tree-level parallelization
(all those three are combined), and (2) the stages of MCTS - each stage in itself (selection, expansion, playouts, backup) employs multiple GPU threads.
We apply suitable `reduction` patterns to carry out summations or max / argmax operations. Cooperation of threads helps to transfer information between global and shared memory.
The implementation uses: no atomic operations, no mutexes (lock-free), and very few device-host memory transfers.
Example usage 1 (Connect 4)
---------------------------
Assume the mechanics of the Connect 4 game have been defined to MCTS-NC in ``mctsnc_game_mechanics.py`` (via device functions ``is_action_legal``, ``take_action``, etc.),
and that ``c4`` - instance of ``C4(State)`` - represents a state of an ongoing Connect 4 game shown below.
.. code-block:: console
|.|.|●|○|.|.|.|
|.|.|●|○|.|.|○|
|.|.|●|●|.|●|●|
|.|●|○|●|.|○|●|
|.|○|●|○|.|●|○|
|○|○|○|●|●|○|○|
0 1 2 3 4 5 6
Then, running the following code
.. code-block:: python
ai = MCTSNC(C4.get_board_shape(), C4.get_extra_info_memory(), C4.get_max_actions())
ai.init_device_side_arrays()
best_action = ai.run(c4.get_board(), c4.get_extra_info(), c4.get_turn())
print(f"BEST ACTION: {best_action}")
results in finding the best action for black - move 4 (winning in two plies), and the following printout:
.. code-block:: console
[MCTSNC._init_device_side_arrays()... for MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
[MCTSNC._init_device_side_arrays() done; time: 0.5193691253662109 s, per_state_memory: 95 B, calculated max_tree_size: 2825549]
MCTSNC RUN... [MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
[actions info:
{
0: {'name': '0', 'n_root': 7474304, 'win_flag': False, 'n': 2182400, 'n_wins': 2100454, 'q': 0.9624514296187683, 'ucb': 0.9678373740384631},
1: {'name': '1', 'n_root': 7474304, 'win_flag': False, 'n': 185344, 'n_wins': 164757, 'q': 0.8889254575276243, 'ucb': 0.9074070665330406},
4: {'name': '4', 'n_root': 7474304, 'win_flag': False, 'n': 4921472, 'n_wins': 4885924, 'q': 0.9927769577882389, 'ucb': 0.9963635461474457},
5: {'name': '5', 'n_root': 7474304, 'win_flag': False, 'n': 105472, 'n_wins': 91863, 'q': 0.8709704945388349, 'ucb': 0.8954701768685893},
6: {'name': '6', 'n_root': 7474304, 'win_flag': False, 'n': 79616, 'n_wins': 68403, 'q': 0.8591614750803859, 'ucb': 0.8873601607647162},
best: {'index': 4, 'name': '4', 'n_root': 7474304, 'win_flag': False, 'n': 4921472, 'n_wins': 4885924, 'q': 0.9927769577882389, 'ucb': 0.9963635461474457}
}]
[performance info:
{
steps: 6373,
steps_per_second: 1274.0076324260813,
playouts: 7474304,
playouts_per_second: 1494166.0666990099,
times_[ms]: {'total': 5002.324819564819, 'loop': 5000.642776489258, 'reduce_over_trees': 0.29015541076660156, 'reduce_over_actions': 0.4520416259765625, 'mean_loop': 0.7846607212441955, 'mean_select': 0.11222893376562147, 'mean_expand': 0.2786097114284054, 'mean_playout': 0.17186361935680036, 'mean_backup': 0.2193056618645448},
trees: {'count': 8, 'mean_depth': 5.176703163017032, 'max_depth': 12, 'mean_size': 1233.0, 'max_size': 2736}
}]
MCTSNC RUN DONE. [time: 5.002324819564819 s; best action: 4, best win_flag: False, best n: 4921472, best n_wins: 4885924, best q: 0.9927769577882389]
BEST ACTION: 4
Example usage 2 (Gomoku)
------------------------
Assume the mechanics of the Gomoku game have been defined to MCTS-NC in ``mctsnc_game_mechanics.py`` (via device functions ``is_action_legal``, ``take_action``, etc.),
and that ``g`` - instance of ``Gomoku(State)`` - represents a state of an ongoing Gomoku game shown below.
.. code-block:: console
ABCDEFGHIJKLMNO
15+++++++++++++++15
14+++++++++++++++14
13+++++++++++++++13
12++++++++●++++++12
11++++++++○++++++11
10++++++++○++++++10
9++++++○+○++++++9
8+++++++●○++++++8
7+++++++●●●○++++7
6++++++++●●○++++6
5+++++++●+++++++5
4+++++++++++++++4
3+++++++++++++++3
2+++++++++++++++2
1+++++++++++++++1
ABCDEFGHIJKLMNO
Then, running the following code
.. code-block:: python
ai = MCTSNC(Gomoku.get_board_shape(), Gomoku.get_extra_info_memory(), Gomoku.get_max_actions(), action_index_to_name_function=Gomoku.action_index_to_name)
ai.init_device_side_arrays()
best_action = ai.run(g.get_board(), g.get_extra_info(), g.get_turn())
print(f"BEST ACTION: {best_action}")
results in finding the defensive action for white - move K8 (indexed as 115) that prevents black from winning in three plies, and the following printout:
.. code-block:: console
[MCTSNC._init_device_side_arrays()... for MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
[MCTSNC._init_device_side_arrays() done; time: 0.5558419227600098 s, per_state_memory: 1144 B, calculated max_tree_size: 234637]
MCTSNC RUN... [MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
[actions info:
{
0: {'name': 'A1', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 148906, 'q': 0.3478852048444976, 'ucb': 0.36098484108863044},
1: {'name': 'B1', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 149000, 'q': 0.34810481459330145, 'ucb': 0.3612044508374343},
2: {'name': 'C1', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 144339, 'q': 0.3372154418361244, 'ucb': 0.35031507808025725},
...
115: {'name': 'K8', 'n_root': 94359552, 'win_flag': False, 'n': 1093632, 'n_wins': 452284, 'q': 0.41356141736891383, 'ucb': 0.4217566587685248},
...
222: {'name': 'M15', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 148009, 'q': 0.34578956713516745, 'ucb': 0.3588892033793003},
223: {'name': 'N15', 'n_root': 94359552, 'win_flag': False, 'n': 401408, 'n_wins': 148802, 'q': 0.37070013552295916, 'ucb': 0.38422722440183954},
224: {'name': 'O15', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 145329, 'q': 0.3395283530203349, 'ucb': 0.35262798926446776},
best: {'index': 115, 'name': 'K8', 'n_root': 94359552, 'win_flag': False, 'n': 1093632, 'n_wins': 452284, 'q': 0.41356141736891383, 'ucb': 0.4217566587685248}
}]
[performance info:
{
steps: 442,
steps_per_second: 88.25552729358726,
playouts: 94359552,
playouts_per_second: 18841067.91164404,
times_[ms]: {'total': 5008.184909820557, 'loop': 5006.503105163574, 'reduce_over_trees': 0.20575523376464844, 'reduce_over_actions': 0.5161762237548828, 'mean_loop': 11.326930102180032, 'mean_select': 0.10066766005295974, 'mean_expand': 0.3082833139065704, 'mean_playout': 10.688265524298897, 'mean_backup': 0.226746317488036},
trees: {'count': 8, 'mean_depth': 2.519115779878241, 'max_depth': 3, 'mean_size': 92149.0, 'max_size': 92149}
}]
MCTSNC RUN DONE. [time: 5.008184909820557 s; best action: 115 (K8), best win_flag: False, best n: 1093632, best n_wins: 452284, best q: 0.41356141736891383]
BEST ACTION: 115
Dependencies
------------
- ``numpy``, ``math``: required for mathematical computations.
- ``numba``: required for just-in-time compilation of CUDA kernels (decorated by ``@cuda.jit``).
- ``mctsnc_game_mechanics``: required to define the mechanics of a wanted game or search problem via a set of five device-side functions - ``is_action_legal``, ``take_action``, ``legal_actions_playout``, ``take_action_playout``, ``compute_outcome`` callable by kernel functions of ``MCTSNC`` (see :doc:`mctsnc_game_mechanics`).
- For usage of ``MCTSNC`` class, NVIDIA CUDA drivers must be present in the operating system.
Link to project repository
--------------------------
`https://github.com/pklesk/mcts_numba_cuda <https://github.com/pklesk/mcts_numba_cuda>`_
Notes
-----
Private functions of ``MCTSNC`` class are named with a single leading underscore (e.g.: ``_set_cuda_constants``,
``_make_performance_info``, ``_playout_acp_prodigal``, etc.). Among them, the kernel functions are additionally
described by ``@cuda.jit`` decorators coming from ``numba`` module. Exact specifications of types come along with the decorators.
For public methods full docstrings are provided (with arguments and returns described). For private functions short docstrings are provided.
"""
import numpy as np
from numpy import inf
from numba import cuda
from numba import void, int8, int16, int32, int64, float32, boolean
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32, xoroshiro128p_type
import time
import math
from numba.core.errors import NumbaPerformanceWarning
import warnings
from mctsnc_game_mechanics import is_action_legal, take_action, legal_actions_playout, take_action_playout, compute_outcome
from utils import dict_to_str
import json
__version__ = "1.0.0"
__author__ = "Przemysław Klęsk"
__email__ = "pklesk@zut.edu.pl"
warnings.simplefilter("ignore", category=NumbaPerformanceWarning)
# the class
[docs]
class MCTSNC:
"""
Monte Carlo Tree Search implemented via ``numba.cuda`` meant for multi-threaded executions on GPU involving multiple concurrent trees and playouts (four algorithmic variants available).
"""
# constants
VARIANTS = ["ocp_thrifty", "ocp_prodigal", "acp_thrifty", "acp_prodigal"] # ocp - one child playouts, acp - all children playouts; thrifty/prodigal - accurate/overhead usage of cuda blocks (pertains to expanded actions)
DEFAULT_SEARCH_TIME_LIMIT = 5.0 # [s], np.inf possible
DEFAULT_SEARCH_STEPS_LIMIT = np.inf # integer, np.inf possible
DEFAULT_N_TREES = 8
DEFAULT_N_PLAYOUTS = 128
DEFAULT_VARIANT = VARIANTS[-1]
DEFAULT_DEVICE_MEMORY = 2.0
DEFAULT_UCB_C = 2.0
DEFAULT_SEED = 0
DEFAULT_VERBOSE_DEBUG = False
DEFAULT_VERBOSE_INFO = True
MAX_STATE_BOARD_SHAPE = (32, 32)
MAX_STATE_EXTRA_INFO_MEMORY = 4096
MAX_STATE_MAX_ACTIONS = 512
MAX_TREE_SIZE = 2**24
MAX_N_TREES = 512
MAX_N_PLAYOUTS = 512
MAX_TREE_DEPTH = 2048 # to memorize paths at select stage
[docs]
def __init__(self, state_board_shape, state_extra_info_memory, state_max_actions,
search_time_limit=DEFAULT_SEARCH_TIME_LIMIT, search_steps_limit=DEFAULT_SEARCH_STEPS_LIMIT,
n_trees=DEFAULT_N_TREES, n_playouts=DEFAULT_N_PLAYOUTS, variant=DEFAULT_VARIANT, device_memory=DEFAULT_DEVICE_MEMORY,
ucb_c=DEFAULT_UCB_C, seed=DEFAULT_SEED,
verbose_debug=DEFAULT_VERBOSE_DEBUG, verbose_info=DEFAULT_VERBOSE_INFO,
action_index_to_name_function=None):
"""
Constructor of ``MCTSNC`` instances.
Args:
state_board_shape (tuple(int, int)):
shape of board for states in a given game, at most ``(32, 32)``.
state_extra_info_memory (int):
number of bytes for extra information on states, at most ``4096``.
state_max_actions (int):
maximum branching factor, at most ``512``.
search_time_limit (float):
time limit in seconds (computational budget), ``np.inf`` if no limit, defaults to ``5.0``.
search_steps_limit (float):
steps limit (computational budget), ``np.inf`` if no limit, defaults to ``np.inf``.
n_trees (int):
number of independent trees, defaults to ``8``.
n_playouts (int):
number of independent playouts from an expanded child, must be a power of 2, defaults to ``128``.
variant (str):
choice of algorithmic variant from {``"ocp_thrifty"``, ``"ocp_prodigal"``, ``"acp_thrifty``, ``"acp_prodigal``}, defaults to ``"acp_prodigal"``.
device_memory (float):
GPU memory in GiBs (gibibytes) to be available for this instance, defaults to ``2.0``.
ucb_c (float):
value of C constant, influencing exploration tendency, appearing in UCT formula (upper confidence bounds for trees), defaults to ``2.0``.
verbose_debug (bool):
debug verbosity flag, if ``True`` then detailed information about each kernel invocation are printed to console (in each iteration), defaults to ``False``.
verbose_info (bool):
verbosity flag, if ``True`` then standard information on actions and performance are printed to console (after a full run), defaults to ``True``.
action_index_to_name_function (callable):
pointer to user-provided function converting action indexes to a human-friendly names (e.g. ``"e2:e4"`` for chess), defaults to ``None``.
"""
self._set_cuda_constants()
if not self.cuda_available:
sys.exit(f"[MCTSNC.__init__(): exiting due to cuda computations not available]")
self.state_board_shape = state_board_shape
if self.state_board_shape[0] > self.MAX_STATE_BOARD_SHAPE[0] or self.state_board_shape[1] > self.MAX_STATE_BOARD_SHAPE[1]:
sys.exit(f"[MCTSNC.__init__(): exiting due to allowed state board shape exceeded]")
self.state_extra_info_memory = max(state_extra_info_memory, 1)
if self.state_extra_info_memory > self.MAX_STATE_EXTRA_INFO_MEMORY:
sys.exit(f"[MCTSNC.__init__(): exiting due to allowed state extra info memory exceeded]")
self.state_max_actions = state_max_actions
if self.state_max_actions > self.MAX_STATE_MAX_ACTIONS:
sys.exit(f"[MCTSNC.__init__(): exiting due to allowed state max actions memory exceeded]")
if self.state_max_actions > self.cuda_tpb_default:
sys.exit(f"[MCTSNC.__init__(): exiting due to state max actions exceeding half of cuda default tpb]")
self.search_time_limit = search_time_limit
self._validate_param("search_time_limit", float, True, 0.0, False, np.inf, self.DEFAULT_SEARCH_TIME_LIMIT)
self.search_steps_limit = float(search_steps_limit)
self._validate_param("search_steps_limit", float, True, 0.0, False, np.inf, self.DEFAULT_SEARCH_STEPS_LIMIT) # purposely float, so that np.inf possible
self.n_trees = n_trees
self._validate_param("n_trees", int, False, 1, False, self.MAX_N_TREES, self.DEFAULT_N_TREES)
if 2**np.round(np.log2(n_playouts)) != n_playouts:
invalid_n_playouts = n_playouts
n_playouts = self.DEFAULT_N_PLAYOUTS
print(f"[n_playouts: {invalid_n_playouts} is not a power of 2; changed to default: {n_playouts}]")
self.n_playouts = n_playouts
self._validate_param("n_playouts", int, False, 1, False, self.MAX_N_PLAYOUTS, self.DEFAULT_N_PLAYOUTS)
if not variant in self.VARIANTS:
invalid_variant = variant
variant = self.DEFAULT_VARIANT
print(f"[invalid variant: '{invalid_variant}' changed to default: '{variant}'; possible variants: {self.VARIANTS}]")
self.variant = variant
self.ucb_c = ucb_c
self._validate_param("ucb_c", float, False, 0.0, False, np.inf, self.DEFAULT_UCB_C)
self.device_memory = device_memory * 1024**3 # gibibytes (GiB) to bytes (B)
self._validate_param("device_memory", float, True, 0.0, False, np.inf, self.DEFAULT_DEVICE_MEMORY)
self.seed = seed
self.verbose_debug = verbose_debug
self._validate_param("verbose_debug", bool, False, False, False, True, self.DEFAULT_VERBOSE_DEBUG)
self.verbose_info = verbose_info
self._validate_param("verbose_info", bool, False, False, False, True, self.DEFAULT_VERBOSE_INFO)
self.action_index_to_name_function = action_index_to_name_function
[docs]
def _set_cuda_constants(self):
"""Investigates (via ``numba`` module) if CUDA-based computations are available and, if so, sets suitable constants."""
self.cuda_available = cuda.is_available()
self.cuda_tpb_default = cuda.get_current_device().MAX_THREADS_PER_BLOCK // 2 if self.cuda_available else None
[docs]
def _validate_param(self, name, ptype, leq, low, geq, high, default):
"""Validates a parameter - is it of correct type and within given range (either end of the range can be open or closed)."""
value = getattr(self, name)
invalid = value <= low if leq else value < low
if not invalid:
invalid = value >= high if geq else value > high
if not invalid:
invalid = not isinstance(value, ptype)
if invalid:
low_str = str(low)
high_str = str(high)
correct_range_str = ("(" if leq else "[") + f"{low_str}, {high_str}" + (")" if geq else "]")
setattr(self, name, default)
print(f"[invalid param {name}: {value} changed to default: {default}; correct range: {correct_range_str}, correct type: {ptype}]")
[docs]
def __str__(self):
"""
Returns a string representation of this ``MCTSNC`` instance.
Returns:
str: string representation of this ``MCTSNC`` instance.
"""
return f"MCTSNC(search_time_limit={self.search_time_limit}, search_steps_limit={self.search_steps_limit}, n_trees={self.n_trees}, n_playouts={self.n_playouts}, variant='{self.variant}', device_memory={np.round(self.device_memory / 1024**3, 2)}, ucb_c={self.ucb_c}, seed: {self.seed})"
[docs]
def __repr__(self):
"""
Returns a detailed string representation of this ``MCTSNC`` instance.
Returns:
str: detailed string representation of this ``MCTSNC`` instance.
"""
repr_str = f"{str(self)}, "
repr_str += f"state_board_shape={self.state_board_shape}, state_extra_info_memory={self.state_extra_info_memory}, state_max_actions={self.state_max_actions})"
return repr_str
[docs]
def init_device_side_arrays(self):
"""
Allocates all the necessary device arrays based on relevant constants and available memory.
"""
if self.verbose_info:
print(f"[MCTSNC._init_device_side_arrays()... for {self}]")
t1_dev_arrays = time.time()
# dtypes
node_index_dtype = np.int32
node_index_bytes = node_index_dtype().itemsize # 4 B
action_index_dtype = np.int16
action_index_bytes = action_index_dtype().itemsize # 2 B
board_element_dtype = np.int8
board_element_bytes = board_element_dtype().itemsize # 1 B
extra_info_element_dtype = np.int8
extra_info_element_bytes = extra_info_element_dtype().itemsize # 1 B
depth_dtype = np.int16
depth_bytes = depth_dtype().itemsize # 2 B
size_dtype = np.int32
size_bytes = size_dtype().itemsize # 4 B
turn_dtype = np.int8
turn_bytes = turn_dtype().itemsize # 1 B
flag_dtype = bool
flag_bytes = 1 # 1 B
outcome_dtype = np.int8
outcome_bytes = outcome_dtype().itemsize # 1 B
playout_outcomes_dtype = np.int32
playout_outcomes_bytes = playout_outcomes_dtype().itemsize # 4 B
ns_dtype = np.int32
ns_bytes = ns_dtype().itemsize # 4 B
ns_extended_dtype = np.int64
# memory related calculations
per_state_additional_memory = depth_bytes + turn_bytes + 2 * flag_bytes + outcome_bytes + 2 * ns_bytes # depth, turn, leaf, terminal, ouctome, ns, ns_wins
per_tree_additional_memory = size_bytes + node_index_bytes + action_index_bytes * (self.state_max_actions + 2) + playout_outcomes_bytes * 2 \
+ node_index_bytes * (self.MAX_TREE_DEPTH + 2) # tree size, tree node selected, tree actions expanded * (self.state_max_actions + 2), playout outcomes * 2, selected path
if "acp" in self.variant: # playout all children
per_tree_additional_memory += playout_outcomes_bytes * self.state_max_actions * 2 # playout children outcomes
per_state_memory = board_element_bytes * np.prod(self.state_board_shape) + extra_info_element_bytes * self.state_extra_info_memory \
+ node_index_bytes * (1 + self.state_max_actions) + per_state_additional_memory # board, extra info, tree array entry (parent, children nodes), additional memory
self.max_tree_size = (int(self.device_memory) - self.n_trees * per_tree_additional_memory) // (per_state_memory * self.n_trees)
self.max_tree_size = min(self.max_tree_size, self.MAX_TREE_SIZE)
# tpb
tpb_board = int(2**np.ceil(np.log2(np.prod(self.state_board_shape))))
tpb_extra_info = int(2**np.ceil(np.log2(self.state_extra_info_memory))) if self.state_extra_info_memory > 0 else 1
tpb_max_actions = int(2**np.ceil(np.log2(self.state_max_actions)))
self.tpb_r = min(max(tpb_board, tpb_extra_info), self.cuda_tpb_default)
self.tpb_s = self.cuda_tpb_default
self.tpb_e1 = min(max(self.tpb_r, tpb_max_actions), self.cuda_tpb_default)
self.tpb_e2 = self.tpb_r
self.tpb_b1 = tpb_max_actions
self.tpb_b2 = self.cuda_tpb_default
self.tpb_rot = int(2**np.ceil(np.log2(self.n_trees))) # rot - reduce over trees
self.tpb_roa = tpb_max_actions # roa - reduce over actions
# device arrays
self.dev_trees = cuda.device_array((self.n_trees, self.max_tree_size, 1 + self.state_max_actions), dtype=node_index_dtype) # each row of a tree represents a node consisting of: parent indexes and indexes of all children (associated with actions), -1 index for none parent or child
self.dev_trees_sizes = cuda.device_array(self.n_trees, dtype=size_dtype)
self.dev_trees_depths = cuda.device_array((self.n_trees, self.max_tree_size), dtype=depth_dtype)
self.dev_trees_turns = cuda.device_array((self.n_trees, self.max_tree_size), dtype=turn_dtype)
self.dev_trees_leaves = cuda.device_array((self.n_trees, self.max_tree_size), dtype=flag_dtype)
self.dev_trees_terminals = cuda.device_array((self.n_trees, self.max_tree_size), dtype=flag_dtype)
self.dev_trees_outcomes = cuda.device_array((self.n_trees, self.max_tree_size), dtype=outcome_dtype)
self.dev_trees_ns = cuda.device_array((self.n_trees, self.max_tree_size), dtype=ns_dtype)
self.dev_trees_ns_wins = cuda.device_array((self.n_trees, self.max_tree_size), dtype=ns_dtype)
self.dev_trees_boards = cuda.device_array((self.n_trees, self.max_tree_size, self.state_board_shape[0], self.state_board_shape[1]), dtype=board_element_dtype)
self.dev_trees_extra_infos = cuda.device_array((self.n_trees, self.max_tree_size, self.state_extra_info_memory), dtype=extra_info_element_dtype)
self.dev_trees_nodes_selected = cuda.device_array(self.n_trees, dtype=node_index_dtype)
self.dev_trees_selected_paths = cuda.device_array((self.n_trees, self.MAX_TREE_DEPTH + 2), dtype=node_index_dtype)
self.dev_trees_actions_expanded = cuda.device_array((self.n_trees, self.state_max_actions + 2), dtype=action_index_dtype) # +2 because 2 last entries inform about: child picked randomly for playouts, number of actions (children) expanded
self.dev_trees_playout_outcomes = cuda.device_array((self.n_trees, 2), dtype=playout_outcomes_dtype) # each row stores counts of: -1 wins and +1 wins, respectively (for given tree)
self.dev_trees_playout_outcomes_children = None
self.dev_random_generators_expand_1 = None
self.dev_random_generators_playout = None
if "ocp" in self.variant:
self.dev_random_generators_expand_1 = create_xoroshiro128p_states(self.n_trees * self.tpb_e1, seed=self.seed)
self.dev_random_generators_playout = create_xoroshiro128p_states(self.n_trees * self.n_playouts, seed=self.seed)
else: # "acp"
self.dev_random_generators_playout = create_xoroshiro128p_states(self.n_trees * self.state_max_actions * self.n_playouts, seed=self.seed)
self.dev_trees_playout_outcomes_children = cuda.device_array((self.n_trees, self.state_max_actions, 2), dtype=playout_outcomes_dtype) # for each (playable) action, each row stores counts of: -1 wins and +1 wins, respectively (for given tree)
self.dev_root_actions_expanded = cuda.device_array(self.state_max_actions + 2, dtype=action_index_dtype)
self.dev_root_ns = cuda.device_array(self.state_max_actions, dtype=ns_extended_dtype) # all entries the same regardless of root action (overhead for convenience)
self.dev_actions_win_flags = cuda.device_array(self.state_max_actions, dtype=flag_dtype)
self.dev_actions_ns = cuda.device_array(self.state_max_actions, dtype=ns_extended_dtype)
self.dev_actions_ns_wins = cuda.device_array(self.state_max_actions, dtype=ns_extended_dtype)
self.dev_best_action = cuda.device_array(1, dtype=action_index_dtype)
self.dev_best_win_flag = cuda.device_array(1, dtype=flag_dtype)
self.dev_best_n = cuda.device_array(1, dtype=ns_extended_dtype)
self.dev_best_n_wins = cuda.device_array(1, dtype=ns_extended_dtype)
t2_dev_arrays = time.time()
if self.verbose_info:
print(f"[MCTSNC._init_device_side_arrays() done; time: {t2_dev_arrays - t1_dev_arrays} s, per_state_memory: {per_state_memory} B, calculated max_tree_size: {self.max_tree_size}]")
[docs]
def run(self, root_board, root_extra_info, root_turn, forced_search_steps_limit=np.inf):
"""
Runs the Monte Carlo Tree Search on GPU involving multiple concurrent trees and playouts.
Computations are carried out according to the formerly chosen algorithmic variant, i.e. one of {``"ocp_thrifty"``, ``"ocp_prodigal"``, ``"acp_thrifty``, ``"acp_prodigal``}, defaults to ``"acp_prodigal"``}.
Args:
root_board (ndarray):
two-dimensional array with board (or other representation) of root state from which the search starts.
root_extra_info (ndarray):
any additional information of root state not implied by the contents of the board itself (e.g. possibilities of castling or en-passant captures in chess, the contract in double dummy bridge, etc.), or technical information useful to generate legal actions faster.
root_turn {-1, 1}:
indicator of the player, minimizing or maximizing, to act first at root state.
forced_search_steps_limit (int):
steps limit used only when reproducing results of a previous experiment; if less than``np.inf`` then has a priority over the standard computational budget given by ``search_time_limit`` and ``search_steps_limit``.
Returns:
self.best_action (int):
best action resulting from search.
"""
print(f"MCTSNC RUN... [{self}]")
run_method = getattr(self, "_run_" + self.variant)
run_method(root_board, root_extra_info, root_turn, forced_search_steps_limit)
best_action_label = str(self.best_action)
if self.action_index_to_name_function is not None:
best_action_label += f" ({self.action_index_to_name_function(self.best_action)})"
print(f"MCTSNC RUN DONE. [time: {self.time_total} s; best action: {best_action_label}, best win_flag: {self.best_win_flag}, best n: {self.best_n}, best n_wins: {self.best_n_wins}, best q: {self.best_q}]")
return self.best_action
[docs]
def _flatten_trees_actions_expanded_thrifty(self, trees_actions_expanded):
"""Uses information from array ``trees_actions_expanded`` of shape ``(self.n_trees, self.state_max_actions + 2)`` and converts it to another array where the number of rows corresponds to the total of expanded legal actions in all trees. Each row contains a pair of indexes for: action and tree. The approach allows to allocate exact number of needed CUDA blocks for further operations."""
actions_expanded_cumsum = np.cumsum(trees_actions_expanded[:, -1])
trees_actions_expanded_flat = -np.ones((actions_expanded_cumsum[-1], 2), dtype=np.int16)
shift = 0
for ti in range(self.n_trees):
s = slice(shift, actions_expanded_cumsum[ti])
trees_actions_expanded_flat[s, 0] = ti
trees_actions_expanded_flat[s, 1] = trees_actions_expanded[ti, :trees_actions_expanded[ti, -1]]
shift = actions_expanded_cumsum[ti]
return trees_actions_expanded_flat
[docs]
def _make_actions_info_thrifty(self):
"""
Prepares and returns a dictionary with information on root actions (using thrifty indexing) implied by the last run, in particular: estimates of action values, their UCBs, counts of times actions were taken, etc.
After the call, available via ``actions_info`` attribute.
"""
root_actions_expanded = np.empty_like(self.dev_root_actions_expanded)
root_ns_thrifty = np.empty_like(self.dev_root_ns)
actions_win_flags_thrifty = np.empty_like(self.dev_actions_win_flags)
actions_ns_thrifty = np.empty_like(self.dev_actions_ns)
actions_ns_wins_thrifty = np.empty_like(self.dev_actions_ns_wins)
self.dev_root_actions_expanded.copy_to_host(ary=root_actions_expanded)
self.dev_root_ns.copy_to_host(ary=root_ns_thrifty)
self.dev_actions_win_flags.copy_to_host(ary=actions_win_flags_thrifty)
self.dev_actions_ns.copy_to_host(ary=actions_ns_thrifty)
self.dev_actions_ns_wins.copy_to_host(ary=actions_ns_wins_thrifty)
actions_info = {}
best_entry = None
n_root_actions = root_actions_expanded[-1]
for i in range(n_root_actions):
entry = {}
a = root_actions_expanded[i] # for thrifty variants
entry["name"] = self.action_index_to_name_function(a) if self.action_index_to_name_function else str(a)
entry["n_root"] = int(root_ns_thrifty[i])
entry["win_flag"] = bool(actions_win_flags_thrifty[i])
entry["n"] = int(actions_ns_thrifty[i])
entry["n_wins"] = int(actions_ns_wins_thrifty[i])
entry["q"] = entry["n_wins"] / entry["n"] if entry["n"] > 0 else np.nan
entry["ucb"] = entry["q"] + self.ucb_c * np.sqrt(np.log(entry["n_root"]) / entry["n"]) if entry["n"] > 0 else np.inf
actions_info[a] = entry
if a == self.best_action:
best_entry = {"index": int(a), **entry}
actions_info["best"] = best_entry
self.actions_info = actions_info
return actions_info
[docs]
def _make_actions_info_prodigal(self):
"""
Prepares and returns a dictionary with information on root actions (using prodigal indexing) implied by the last run, in particular: estimates of action values, their UCBs, counts of times actions were taken, etc.
After the call, available via ``actions_info`` attribute.
"""
root_ns_prodigal = np.empty_like(self.dev_root_ns)
actions_win_flags_prodigal = np.empty_like(self.dev_actions_win_flags)
actions_ns_prodigal = np.empty_like(self.dev_actions_ns)
actions_ns_wins_prodigal = np.empty_like(self.dev_actions_ns_wins)
self.dev_root_ns.copy_to_host(ary=root_ns_prodigal)
self.dev_actions_win_flags.copy_to_host(ary=actions_win_flags_prodigal)
self.dev_actions_ns.copy_to_host(ary=actions_ns_prodigal)
self.dev_actions_ns_wins.copy_to_host(ary=actions_ns_wins_prodigal)
actions_info = {}
best_entry = None
for i in range(self.state_max_actions):
if root_ns_prodigal[i] == 0:
continue
a = i # for prodigal variants
entry = {}
entry["name"] = self.action_index_to_name_function(a) if self.action_index_to_name_function else str(a)
entry["n_root"] = int(root_ns_prodigal[i])
entry["win_flag"] = bool(actions_win_flags_prodigal[i])
entry["n"] = int(actions_ns_prodigal[i])
entry["n_wins"] = int(actions_ns_wins_prodigal[i])
entry["q"] = entry["n_wins"] / entry["n"] if entry["n"] > 0 else np.nan
entry["ucb"] = entry["q"] + self.ucb_c * np.sqrt(np.log(entry["n_root"]) / entry["n"]) if entry["n"] > 0 else np.inf
actions_info[a] = entry
if a == self.best_action:
best_entry = {"index": int(a), **entry}
actions_info["best"] = best_entry
self.actions_info = actions_info
return actions_info
[docs]
def _run_ocp_thrifty(self, root_board, root_extra_info, root_turn, forced_search_steps_limit=np.inf):
"""Runs computations for algorithmic variant: ``"ocp_thrifty"``."""
t1 = time.time()
# reset
t1_reset = time.time()
bpg = self.n_trees
tpb = self.tpb_r
dev_root_board = cuda.to_device(root_board)
if root_extra_info is None:
root_extra_info = np.zeros(1, dtype=np.int8) # fake extra info array
dev_root_extra_info = cuda.to_device(root_extra_info)
if self.verbose_debug:
print(f"[MCTSNC._reset()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._reset[bpg, tpb](dev_root_board, dev_root_extra_info, root_turn,
self.dev_trees, self.dev_trees_sizes, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_boards, self.dev_trees_extra_infos)
cuda.synchronize()
t2_reset = time.time()
if self.verbose_debug:
print(f"[MCTSNC._reset() done; time: {t2_reset - t1_reset} s]")
self.time_select = 0.0
self.time_expand = 0.0
self.time_playout = 0.0
self.time_backup = 0.0
self.steps = 0
trees_actions_expanded = np.empty((self.n_trees, self.state_max_actions + 2), dtype=np.int16) # needed at host side for thrifty variants
t1_loop = time.time()
while True:
t2_loop = time.time()
if forced_search_steps_limit < np.inf:
if self.steps >= forced_search_steps_limit:
break
elif self.steps >= self.search_steps_limit or t2_loop - t1_loop >= self.search_time_limit:
break
if self.verbose_debug:
print(f"[step: {self.steps + 1} starting, time used so far: {t2_loop - t1_loop} s]")
# selections
t1_select = time.time()
bpg = self.n_trees
tpb = self.tpb_s
if self.verbose_debug:
print(f"[MCTSNC._select()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._select[bpg, tpb](self.ucb_c,
self.dev_trees, self.dev_trees_leaves, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_nodes_selected, self.dev_trees_selected_paths)
cuda.synchronize()
t2_select = time.time()
if self.verbose_debug:
print(f"[MCTSNC._select() done; time: {t2_select - t1_select} s]")
self.time_select += t2_select - t1_select
# expansions
t1_expand = time.time()
t1_expand_1 = time.time()
bpg = self.n_trees
tpb = self.tpb_e1
if self.verbose_debug:
print(f"[MCTSNC._expand_1_ocp_thrifty()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._expand_1_ocp_thrifty[bpg, tpb](self.max_tree_size,
self.dev_trees, self.dev_trees_sizes, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals,
self.dev_trees_boards, self.dev_trees_extra_infos,
self.dev_trees_nodes_selected, self.dev_random_generators_expand_1, self.dev_trees_actions_expanded)
self.dev_trees_actions_expanded.copy_to_host(ary=trees_actions_expanded)
cuda.synchronize()
if self.steps == 0:
MCTSNC._memorize_root_actions_expanded[1, self.state_max_actions + 2](self.dev_trees_actions_expanded, self.dev_root_actions_expanded)
cuda.synchronize()
t2_expand_1 = time.time()
if self.verbose_debug:
print(f"[MCTSNC._expand_1_ocp_thrifty() done; time: {t2_expand_1 - t1_expand_1} s]")
t1_expand_2 = time.time()
trees_actions_expanded_flat = self._flatten_trees_actions_expanded_thrifty(trees_actions_expanded)
bpg = trees_actions_expanded_flat.shape[0] # thrifty number of blocks
tpb = self.tpb_e2
if self.verbose_debug:
print(f"[MCTSNC._expand_2_thrifty()...; bpg: {bpg}, tpb: {tpb}]")
dev_trees_actions_expanded_flat = cuda.to_device(trees_actions_expanded_flat)
MCTSNC._expand_2_thrifty[bpg, tpb](self.dev_trees, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_boards, self.dev_trees_extra_infos,
self.dev_trees_nodes_selected, dev_trees_actions_expanded_flat)
cuda.synchronize()
t2_expand_2 = time.time()
if self.verbose_debug:
print(f"[MCTSNC._expand_2_thrifty() done; time: {t2_expand_2 - t1_expand_2} s]")
t2_expand = time.time()
self.time_expand += t2_expand - t1_expand
# playouts
t1_playout = time.time()
bpg = self.n_trees
tpb = self.n_playouts
if self.verbose_debug:
print(f"[MCTSNC._playout_ocp()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._playout_ocp[bpg, tpb](self.dev_trees, self.dev_trees_turns, self.dev_trees_terminals, self.dev_trees_outcomes,
self.dev_trees_boards, self.dev_trees_extra_infos,
self.dev_trees_nodes_selected, self.dev_trees_actions_expanded,
self.dev_random_generators_playout, self.dev_trees_playout_outcomes)
cuda.synchronize()
t2_playout = time.time()
if self.verbose_debug:
print(f"[MCTSNC._playout_ocp() done; time: {t2_playout - t1_playout} s]")
self.time_playout += t2_playout - t1_playout
# backups
t1_backup = time.time()
bpg = self.n_trees
tpb = self.tpb_b2
if self.verbose_debug:
print(f"[MCTSNC._backup_ocp()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._backup_ocp[bpg, tpb](self.n_playouts,
self.dev_trees, self.dev_trees_turns, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_nodes_selected, self.dev_trees_selected_paths, self.dev_trees_actions_expanded, self.dev_trees_playout_outcomes)
cuda.synchronize()
t2_backup = time.time()
if self.verbose_debug:
print(f"[MCTSNC._backup() done; time: {t2_backup - t1_backup} s]")
self.time_backup += t2_backup - t1_backup
self.steps += 1
self.time_loop = time.time() - t1_loop
# sum reduction over trees for each root action
t1_reduce_over_trees = time.time()
root_actions_expanded = np.empty_like(self.dev_root_actions_expanded)
self.dev_root_actions_expanded.copy_to_host(ary=root_actions_expanded)
n_root_actions = int(root_actions_expanded[-1])
bpg = n_root_actions
tpb = self.tpb_rot
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_trees_thrifty()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._reduce_over_trees_thrifty[bpg, tpb](self.dev_trees, self.dev_trees_terminals, self.dev_trees_outcomes,
self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_root_actions_expanded, root_turn,
self.dev_root_ns, self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins)
cuda.synchronize()
t2_reduce_over_trees = time.time()
self.time_reduce_over_trees = t2_reduce_over_trees - t1_reduce_over_trees
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_trees_thrifty() done; time: {self.time_reduce_over_trees} s]")
# max-argmax reduction over root actions
t1_reduce_over_actions = time.time()
bpg = 1
tpb = self.tpb_roa
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_actions_thrifty()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._reduce_over_actions_thrifty[bpg, tpb](n_root_actions,
self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins,
self.dev_best_action, self.dev_best_win_flag, self.dev_best_n, self.dev_best_n_wins)
self.best_action = self.dev_best_action.copy_to_host()[0]
self.best_win_flag = self.dev_best_win_flag.copy_to_host()[0]
self.best_n = self.dev_best_n.copy_to_host()[0]
self.best_n_wins = self.dev_best_n_wins.copy_to_host()[0]
self.best_q = self.best_n_wins / self.best_n if self.best_n > 0 else np.nan
cuda.synchronize()
self.best_action = root_actions_expanded[self.best_action]
t2_reduce_over_actions = time.time()
self.time_reduce_over_actions = t2_reduce_over_actions - t1_reduce_over_actions
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_actions_thrifty() done; time: {self.time_reduce_over_actions} s]")
t2 = time.time()
self.time_total = t2 - t1
if self.verbose_info:
print(f"[actions info:\n{dict_to_str(self._make_actions_info_thrifty())}]")
print(f"[performance info:\n{dict_to_str(self._make_performance_info())}]")
[docs]
def _run_ocp_prodigal(self, root_board, root_extra_info, root_turn, forced_search_steps_limit=np.inf):
"""Runs computations for algorithmic variant: ``"ocp_prodigal"``."""
t1 = time.time()
# reset
t1_reset = time.time()
bpg = self.n_trees
tpb = self.tpb_r
dev_root_board = cuda.to_device(root_board)
if root_extra_info is None:
root_extra_info = np.zeros(1, dtype=np.int8) # fake extra info array
dev_root_extra_info = cuda.to_device(root_extra_info)
if self.verbose_debug:
print(f"[MCTSNC._reset()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._reset[bpg, tpb](dev_root_board, dev_root_extra_info, root_turn,
self.dev_trees, self.dev_trees_sizes, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_boards, self.dev_trees_extra_infos)
cuda.synchronize()
t2_reset = time.time()
if self.verbose_debug:
print(f"[MCTSNC._reset() done; time: {t2_reset - t1_reset} s]")
self.time_select = 0.0
self.time_expand = 0.0
self.time_playout = 0.0
self.time_backup = 0.0
self.steps = 0
t1_loop = time.time()
while True:
t2_loop = time.time()
if forced_search_steps_limit < np.inf:
if self.steps >= forced_search_steps_limit:
break
elif self.steps >= self.search_steps_limit or t2_loop - t1_loop >= self.search_time_limit:
break
if self.verbose_debug:
print(f"[step: {self.steps + 1} starting, time used so far: {t2_loop - t1_loop} s]")
# selections
t1_select = time.time()
bpg = self.n_trees
tpb = self.tpb_s
if self.verbose_debug:
print(f"[MCTSNC._select()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._select[bpg, tpb](self.ucb_c,
self.dev_trees, self.dev_trees_leaves, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_nodes_selected, self.dev_trees_selected_paths)
cuda.synchronize()
t2_select = time.time()
if self.verbose_debug:
print(f"[MCTSNC._select() done; time: {t2_select - t1_select} s]")
self.time_select += t2_select - t1_select
# expansions
t1_expand = time.time()
t1_expand_1 = time.time()
bpg = self.n_trees
tpb = self.tpb_e1
if self.verbose_debug:
print(f"[MCTSNC._expand_1_ocp_prodigal()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._expand_1_ocp_prodigal[bpg, tpb](self.max_tree_size,
self.dev_trees, self.dev_trees_sizes, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals,
self.dev_trees_boards, self.dev_trees_extra_infos,
self.dev_trees_nodes_selected, self.dev_random_generators_expand_1, self.dev_trees_actions_expanded)
cuda.synchronize()
if self.steps == 0:
MCTSNC._memorize_root_actions_expanded[1, self.state_max_actions + 2](self.dev_trees_actions_expanded, self.dev_root_actions_expanded)
cuda.synchronize()
t2_expand_1 = time.time()
if self.verbose_debug:
print(f"[MCTSNC._expand_1_ocp_progial() done; time: {t2_expand_1 - t1_expand_1} s]")
t1_expand_2 = time.time()
bpg = (self.n_trees, self.state_max_actions) # prodigal number of blocks
tpb = self.tpb_e2
if self.verbose_debug:
print(f"[MCTSNC._expand_2_prodigal()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._expand_2_prodigal[bpg, tpb](self.dev_trees, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_boards, self.dev_trees_extra_infos,
self.dev_trees_nodes_selected, self.dev_trees_actions_expanded)
cuda.synchronize()
t2_expand_2 = time.time()
if self.verbose_debug:
print(f"[MCTSNC._expand_2_prodigal() done; time: {t2_expand_2 - t1_expand_2} s]")
t2_expand = time.time()
self.time_expand += t2_expand - t1_expand
# playouts
t1_playout = time.time()
bpg = self.n_trees
tpb = self.n_playouts
if self.verbose_debug:
print(f"[MCTSNC._playout_ocp()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._playout_ocp[bpg, tpb](self.dev_trees, self.dev_trees_turns, self.dev_trees_terminals, self.dev_trees_outcomes,
self.dev_trees_boards, self.dev_trees_extra_infos,
self.dev_trees_nodes_selected, self.dev_trees_actions_expanded,
self.dev_random_generators_playout, self.dev_trees_playout_outcomes)
cuda.synchronize()
t2_playout = time.time()
if self.verbose_debug:
print(f"[MCTSNC._playout_ocp() done; time: {t2_playout - t1_playout} s]")
self.time_playout += t2_playout - t1_playout
# backups
t1_backup = time.time()
bpg = self.n_trees
tpb = self.tpb_b2
if self.verbose_debug:
print(f"[MCTSNC._backup_ocp()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._backup_ocp[bpg, tpb](self.n_playouts,
self.dev_trees, self.dev_trees_turns, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_nodes_selected, self.dev_trees_selected_paths, self.dev_trees_actions_expanded, self.dev_trees_playout_outcomes)
cuda.synchronize()
t2_backup = time.time()
if self.verbose_debug:
print(f"[MCTSNC._backup() done; time: {t2_backup - t1_backup} s]")
self.time_backup += t2_backup - t1_backup
self.steps += 1
self.time_loop = time.time() - t1_loop
# sum reduction over trees for each root action
t1_reduce_over_trees = time.time()
bpg = self.state_max_actions
tpb = self.tpb_rot
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_trees_prodigal()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._reduce_over_trees_prodigal[bpg, tpb](self.dev_trees, self.dev_trees_terminals, self.dev_trees_outcomes,
self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_root_actions_expanded, root_turn,
self.dev_root_ns, self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins)
cuda.synchronize()
t2_reduce_over_trees = time.time()
self.time_reduce_over_trees = t2_reduce_over_trees - t1_reduce_over_trees
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_trees_prodigal() done; time: {self.time_reduce_over_trees} s]")
# max-argmax reduction over root actions
t1_reduce_over_actions = time.time()
bpg = 1
tpb = self.tpb_roa
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_actions_prodigal()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._reduce_over_actions_prodigal[bpg, tpb](self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins,
self.dev_best_action, self.dev_best_win_flag, self.dev_best_n, self.dev_best_n_wins)
self.best_action = self.dev_best_action.copy_to_host()[0]
self.best_win_flag = self.dev_best_win_flag.copy_to_host()[0]
self.best_n = self.dev_best_n.copy_to_host()[0]
self.best_n_wins = self.dev_best_n_wins.copy_to_host()[0]
self.best_q = self.best_n_wins / self.best_n if self.best_n > 0 else np.nan
cuda.synchronize()
t2_reduce_over_actions = time.time()
self.time_reduce_over_actions = t2_reduce_over_actions - t1_reduce_over_actions
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_actions_prodigal() done; time: {self.time_reduce_over_actions} s]")
t2 = time.time()
self.time_total = t2 - t1
if self.verbose_info:
print(f"[actions info:\n{dict_to_str(self._make_actions_info_prodigal())}]")
print(f"[performance info:\n{dict_to_str(self._make_performance_info())}]")
[docs]
def _run_acp_thrifty(self, root_board, root_extra_info, root_turn, forced_search_steps_limit=np.inf):
"""Runs computations for algorithmic variant: ``"acp_thrifty"``."""
t1 = time.time()
# reset
t1_reset = time.time()
bpg = self.n_trees
tpb = self.tpb_r
dev_root_board = cuda.to_device(root_board)
if root_extra_info is None:
root_extra_info = np.zeros(1, dtype=np.int8) # fake extra info array
dev_root_extra_info = cuda.to_device(root_extra_info)
if self.verbose_debug:
print(f"[MCTSNC._reset()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._reset[bpg, tpb](dev_root_board, dev_root_extra_info, root_turn,
self.dev_trees, self.dev_trees_sizes, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_boards, self.dev_trees_extra_infos)
cuda.synchronize()
t2_reset = time.time()
if self.verbose_debug:
print(f"[MCTSNC._reset() done; time: {t2_reset - t1_reset} s]")
self.time_select = 0.0
self.time_expand = 0.0
self.time_playout = 0.0
self.time_backup = 0.0
self.steps = 0
trees_actions_expanded = np.empty((self.n_trees, self.state_max_actions + 2), dtype=np.int16)
t1_loop = time.time()
while True:
t2_loop = time.time()
if forced_search_steps_limit < np.inf:
if self.steps >= forced_search_steps_limit:
break
elif self.steps >= self.search_steps_limit or t2_loop - t1_loop >= self.search_time_limit:
break
if self.verbose_debug:
print(f"[step: {self.steps + 1} starting, time used so far: {t2_loop - t1_loop} s]")
# selections
t1_select = time.time()
bpg = self.n_trees
tpb = self.tpb_s
if self.verbose_debug:
print(f"[MCTSNC._select()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._select[bpg, tpb](self.ucb_c,
self.dev_trees, self.dev_trees_leaves, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_nodes_selected, self.dev_trees_selected_paths)
cuda.synchronize()
t2_select = time.time()
if self.verbose_debug:
print(f"[MCTSNC._select() done; time: {t2_select - t1_select} s]")
self.time_select += t2_select - t1_select
# expansions
t1_expand = time.time()
t1_expand_1 = time.time()
bpg = self.n_trees
tpb = self.tpb_e1
if self.verbose_debug:
print(f"[MCTSNC._expand_1_acp_thrifty()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._expand_1_acp_thrifty[bpg, tpb](self.max_tree_size,
self.dev_trees, self.dev_trees_sizes, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals,
self.dev_trees_boards, self.dev_trees_extra_infos,
self.dev_trees_nodes_selected, self.dev_trees_actions_expanded)
self.dev_trees_actions_expanded.copy_to_host(ary=trees_actions_expanded)
cuda.synchronize()
if self.steps == 0:
MCTSNC._memorize_root_actions_expanded[1, self.state_max_actions + 2](self.dev_trees_actions_expanded, self.dev_root_actions_expanded)
cuda.synchronize()
t2_expand_1 = time.time()
if self.verbose_debug:
print(f"[MCTSNC._expand_1_acp_thrifty() done; time: {t2_expand_1 - t1_expand_1} s]")
t1_expand_2 = time.time()
trees_actions_expanded_flat = self._flatten_trees_actions_expanded_thrifty(trees_actions_expanded)
bpg = trees_actions_expanded_flat.shape[0] # thrifty number of blocks
tpb = self.tpb_e2
if self.verbose_debug:
print(f"[MCTSNC._expand_2_thrifty()...; bpg: {bpg}, tpb: {tpb}]")
dev_trees_actions_expanded_flat = cuda.to_device(trees_actions_expanded_flat)
MCTSNC._expand_2_thrifty[bpg, tpb](self.dev_trees, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_boards, self.dev_trees_extra_infos,
self.dev_trees_nodes_selected, dev_trees_actions_expanded_flat)
cuda.synchronize()
t2_expand_2 = time.time()
if self.verbose_debug:
print(f"[MCTSNC._expand_2_thrifty() done; time: {t2_expand_2 - t1_expand_2} s]")
t2_expand = time.time()
self.time_expand += t2_expand - t1_expand
# playouts
t1_playout = time.time()
bpg = trees_actions_expanded_flat.shape[0] # thrifty number of blocks
tpb = self.n_playouts
if self.verbose_debug:
print(f"[MCTSNC._playout_acp_thrifty()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._playout_acp_thrifty[bpg, tpb](self.dev_trees, self.dev_trees_turns, self.dev_trees_terminals, self.dev_trees_outcomes,
self.dev_trees_boards, self.dev_trees_extra_infos,
self.dev_trees_nodes_selected, self.dev_trees_actions_expanded, dev_trees_actions_expanded_flat,
self.dev_random_generators_playout, self.dev_trees_playout_outcomes, self.dev_trees_playout_outcomes_children)
cuda.synchronize()
t2_playout = time.time()
if self.verbose_debug:
print(f"[MCTSNC._playout_acp_thrifty() done; time: {t2_playout - t1_playout} s]")
self.time_playout += t2_playout - t1_playout
# backups
t1_backup = time.time()
t1_backup_1 = time.time()
bpg = self.n_trees
tpb = self.tpb_b1
if self.verbose_debug:
print(f"[MCTSNC._backup_acp_thrifty()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._backup_1_acp_thrifty[bpg, tpb](self.n_playouts,
self.dev_trees, self.dev_trees_turns, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_nodes_selected, self.dev_trees_actions_expanded, self.dev_trees_playout_outcomes, self.dev_trees_playout_outcomes_children)
cuda.synchronize()
t2_backup_1 = time.time()
if self.verbose_debug:
print(f"[MCTSNC._backup_1_acp_thrifty() done; time: {t2_backup_1 - t1_backup_1} s]")
t1_backup_2 = time.time()
bpg = self.n_trees
tpb = self.tpb_b2
if self.verbose_debug:
print(f"[MCTSNC._backup_2_acp()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._backup_2_acp[bpg, tpb](self.n_playouts,
self.dev_trees_turns, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_selected_paths, self.dev_trees_actions_expanded,
self.dev_trees_playout_outcomes)
cuda.synchronize()
t2_backup_2 = time.time()
if self.verbose_debug:
print(f"[MCTSNC._backup_2_acp() done; time: {t2_backup_2 - t1_backup_2} s]")
t2_backup = time.time()
self.time_backup += t2_backup - t1_backup
self.steps += 1
self.time_loop = time.time() - t1_loop
# sum reduction over trees for each root action
t1_reduce_over_trees = time.time()
root_actions_expanded = np.empty_like(self.dev_root_actions_expanded)
self.dev_root_actions_expanded.copy_to_host(ary=root_actions_expanded)
n_root_actions = int(root_actions_expanded[-1])
bpg = n_root_actions
tpb = self.tpb_rot
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_trees_thrifty()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._reduce_over_trees_thrifty[bpg, tpb](self.dev_trees, self.dev_trees_terminals, self.dev_trees_outcomes,
self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_root_actions_expanded, root_turn,
self.dev_root_ns, self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins)
cuda.synchronize()
t2_reduce_over_trees = time.time()
self.time_reduce_over_trees = t2_reduce_over_trees - t1_reduce_over_trees
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_trees_thrifty() done; time: {self.time_reduce_over_trees} s]")
# max-argmax reduction over root actions
t1_reduce_over_actions = time.time()
bpg = 1
tpb = self.tpb_roa
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_actions_thrifty()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._reduce_over_actions_thrifty[bpg, tpb](n_root_actions,
self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins,
self.dev_best_action, self.dev_best_win_flag, self.dev_best_n, self.dev_best_n_wins)
self.best_action = self.dev_best_action.copy_to_host()[0]
self.best_win_flag = self.dev_best_win_flag.copy_to_host()[0]
self.best_n = self.dev_best_n.copy_to_host()[0]
self.best_n_wins = self.dev_best_n_wins.copy_to_host()[0]
self.best_q = self.best_n_wins / self.best_n if self.best_n > 0 else np.nan
cuda.synchronize()
self.best_action = root_actions_expanded[self.best_action]
t2_reduce_over_actions = time.time()
self.time_reduce_over_actions = t2_reduce_over_actions - t1_reduce_over_actions
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_actions_thrifty() done; time: {self.time_reduce_over_actions} s]")
t2 = time.time()
self.time_total = t2 - t1
if self.verbose_info:
print(f"[actions info:\n{dict_to_str(self._make_actions_info_thrifty())}]")
print(f"[performance info:\n{dict_to_str(self._make_performance_info())}]")
[docs]
def _run_acp_prodigal(self, root_board, root_extra_info, root_turn, forced_search_steps_limit=np.inf):
"""Runs computations for algorithmic variant: ``"acp_prodigal"``."""
t1 = time.time()
# reset
t1_reset = time.time()
bpg = self.n_trees
tpb = self.tpb_r
dev_root_board = cuda.to_device(root_board)
if root_extra_info is None:
root_extra_info = np.zeros(1, dtype=np.int8) # fake extra info array
dev_root_extra_info = cuda.to_device(root_extra_info)
if self.verbose_debug:
print(f"[MCTSNC._reset()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._reset[bpg, tpb](dev_root_board, dev_root_extra_info, root_turn,
self.dev_trees, self.dev_trees_sizes, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_boards, self.dev_trees_extra_infos)
cuda.synchronize()
t2_reset = time.time()
if self.verbose_debug:
print(f"[MCTSNC._reset() done; time: {t2_reset - t1_reset} s]")
self.time_select = 0.0
self.time_expand = 0.0
self.time_playout = 0.0
self.time_backup = 0.0
self.steps = 0
t1_loop = time.time()
while True:
t2_loop = time.time()
if forced_search_steps_limit < np.inf:
if self.steps >= forced_search_steps_limit:
break
elif self.steps >= self.search_steps_limit or t2_loop - t1_loop >= self.search_time_limit:
break
if self.verbose_debug:
print(f"[step: {self.steps + 1} starting, time used so far: {t2_loop - t1_loop} s]")
# selections
t1_select = time.time()
bpg = self.n_trees
tpb = self.tpb_s
if self.verbose_debug:
print(f"[MCTSNC._select()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._select[bpg, tpb](self.ucb_c,
self.dev_trees, self.dev_trees_leaves, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_nodes_selected, self.dev_trees_selected_paths)
cuda.synchronize()
t2_select = time.time()
if self.verbose_debug:
print(f"[MCTSNC._select() done; time: {t2_select - t1_select} s]")
self.time_select += t2_select - t1_select
# expansions
t1_expand = time.time()
t1_expand_1 = time.time()
bpg = self.n_trees
tpb = self.tpb_e1
if self.verbose_debug:
print(f"[MCTSNC._expand_1_acp_prodigal()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._expand_1_acp_prodigal[bpg, tpb](self.max_tree_size,
self.dev_trees, self.dev_trees_sizes, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals,
self.dev_trees_boards, self.dev_trees_extra_infos,
self.dev_trees_nodes_selected, self.dev_trees_actions_expanded)
cuda.synchronize()
if self.steps == 0:
MCTSNC._memorize_root_actions_expanded[1, self.state_max_actions + 2](self.dev_trees_actions_expanded, self.dev_root_actions_expanded)
cuda.synchronize()
t2_expand_1 = time.time()
if self.verbose_debug:
print(f"[MCTSNC._expand_1_acp_prodigal() done; time: {t2_expand_1 - t1_expand_1} s]")
t1_expand_2 = time.time()
bpg = (self.n_trees, self.state_max_actions) # prodigal number of blocks
tpb = self.tpb_e2
if self.verbose_debug:
print(f"[MCTSNC._expand_2_prodigal()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._expand_2_prodigal[bpg, tpb](self.dev_trees, self.dev_trees_depths, self.dev_trees_turns, self.dev_trees_leaves, self.dev_trees_terminals, self.dev_trees_outcomes, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_boards, self.dev_trees_extra_infos,
self.dev_trees_nodes_selected, self.dev_trees_actions_expanded)
cuda.synchronize()
t2_expand_2 = time.time()
if self.verbose_debug:
print(f"[MCTSNC._expand_2_prodigal() done; time: {t2_expand_2 - t1_expand_2} s]")
t2_expand = time.time()
self.time_expand += t2_expand - t1_expand
# playouts
t1_playout = time.time()
bpg = (self.n_trees, self.state_max_actions) # prodigal number of blocks
tpb = self.n_playouts
if self.verbose_debug:
print(f"[MCTSNC._playout_acp_prodigal()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._playout_acp_prodigal[bpg, tpb](self.dev_trees, self.dev_trees_turns, self.dev_trees_terminals, self.dev_trees_outcomes,
self.dev_trees_boards, self.dev_trees_extra_infos,
self.dev_trees_nodes_selected, self.dev_trees_actions_expanded,
self.dev_random_generators_playout, self.dev_trees_playout_outcomes, self.dev_trees_playout_outcomes_children)
cuda.synchronize()
t2_playout = time.time()
if self.verbose_debug:
print(f"[MCTSNC._playout_acp_prodigal() done; time: {t2_playout - t1_playout} s]")
self.time_playout += t2_playout - t1_playout
# backups
t1_backup = time.time()
t1_backup_1 = time.time()
bpg = self.n_trees
tpb = self.tpb_b1
if self.verbose_debug:
print(f"[MCTSNC._backup_1_acp_prodigal()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._backup_1_acp_prodigal[bpg, tpb](self.n_playouts,
self.dev_trees, self.dev_trees_turns, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_nodes_selected, self.dev_trees_actions_expanded,
self.dev_trees_playout_outcomes, self.dev_trees_playout_outcomes_children)
cuda.synchronize()
t2_backup_1 = time.time()
if self.verbose_debug:
print(f"[MCTSNC._backup_1_acp_prodigal() done; time: {t2_backup_1 - t1_backup_1} s]")
t1_backup_2 = time.time()
bpg = self.n_trees
tpb = self.tpb_b2
if self.verbose_debug:
print(f"[MCTSNC._backup_2_acp()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._backup_2_acp[bpg, tpb](self.n_playouts,
self.dev_trees_turns, self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_trees_selected_paths, self.dev_trees_actions_expanded,
self.dev_trees_playout_outcomes)
cuda.synchronize()
t2_backup_2 = time.time()
if self.verbose_debug:
print(f"[MCTSNC._backup_2_acp() done; time: {t2_backup_2 - t1_backup_2} s]")
t2_backup = time.time()
self.time_backup += t2_backup - t1_backup
self.steps += 1
self.time_loop = time.time() - t1_loop
# sum reduction over trees
t1_reduce_over_trees = time.time()
bpg = self.state_max_actions
tpb = self.tpb_rot
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_trees_prodigal()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._reduce_over_trees_prodigal[bpg, tpb](self.dev_trees, self.dev_trees_terminals, self.dev_trees_outcomes,
self.dev_trees_ns, self.dev_trees_ns_wins,
self.dev_root_actions_expanded, root_turn,
self.dev_root_ns, self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins)
cuda.synchronize()
t2_reduce_over_trees = time.time()
self.time_reduce_over_trees = t2_reduce_over_trees - t1_reduce_over_trees
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_trees_prodigal() done; time: {self.time_reduce_over_trees} s]")
# max-argmax reduction over root actions
t1_reduce_over_actions = time.time()
bpg = 1
tpb = self.tpb_roa
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_actions_prodigal()...; bpg: {bpg}, tpb: {tpb}]")
MCTSNC._reduce_over_actions_prodigal[bpg, tpb](self.dev_actions_win_flags, self.dev_actions_ns, self.dev_actions_ns_wins,
self.dev_best_action, self.dev_best_win_flag, self.dev_best_n, self.dev_best_n_wins)
self.best_action = self.dev_best_action.copy_to_host()[0]
self.best_win_flag = self.dev_best_win_flag.copy_to_host()[0]
self.best_n = self.dev_best_n.copy_to_host()[0]
self.best_n_wins = self.dev_best_n_wins.copy_to_host()[0]
self.best_q = self.best_n_wins / self.best_n if self.best_n > 0 else np.nan
cuda.synchronize()
t2_reduce_over_actions = time.time()
self.time_reduce_over_actions = t2_reduce_over_actions - t1_reduce_over_actions
if self.verbose_debug:
print(f"[MCTSNC._reduce_over_actions_prodigal() done; time: {self.time_reduce_over_actions} s]")
t2 = time.time()
self.time_total = t2 - t1
if self.verbose_info:
print(f"[actions info:\n{dict_to_str(self._make_actions_info_prodigal())}]")
print(f"[performance info:\n{dict_to_str(self._make_performance_info())}]")
[docs]
@staticmethod
@cuda.jit(void(int8[:, :], int8[:], int8, int32[:, :, :], int32[:], int16[:, :], int8[:, :], boolean[:, :], boolean[:, :], int32[:, :], int32[:, :], int8[:, :, :, :], int8[:, :, :]))
def _reset(root_board, root_extra_info, root_turn, trees, trees_sizes, trees_depths, trees_turns, trees_leaves, trees_terminals, trees_ns, trees_ns_wins, trees_boards, trees_extra_infos):
"""CUDA kernel responsible for reseting root nodes of trees to new root state."""
ti = cuda.blockIdx.x # tree index
tpb = cuda.blockDim.x
t = cuda.threadIdx.x
if t == 0:
trees[ti, 0, 0] = int32(-1)
trees_sizes[ti] = int32(1)
trees_depths[ti, 0] = int32(0)
trees_turns[ti, 0] = int8(root_turn)
trees_leaves[ti, 0] = True
trees_terminals[ti, 0] = False
trees_ns[ti, 0] = int32(0)
trees_ns_wins[ti, 0] = int32(0)
m, n = root_board.shape
m_n = m * n
bept = (m_n + tpb - 1) // tpb # board elements per thread
e = t # board element flat index
for _ in range(bept):
if e < m_n:
i = e // n
j = e % n
trees_boards[ti, 0, i, j] = root_board[i, j]
e += tpb
extra_info_memory = root_extra_info.size
eipt = (extra_info_memory + tpb - 1) // tpb
e = t
for _ in range(eipt):
if e < extra_info_memory:
trees_extra_infos[ti, 0, e] = root_extra_info[e]
[docs]
@staticmethod
@cuda.jit(void(float32, int32[:, :, :], boolean[:, :], int32[:, :], int32[:, :], int32[:], int32[:, :]))
def _select(ucb_c, trees, trees_leaves, trees_ns, trees_ns_wins, trees_nodes_selected, trees_selected_paths):
"""CUDA kernel responsible for computations of stage: selections."""
shared_ucbs = cuda.shared.array(512, dtype=float32) # 512 - assumed limit on max actions
shared_best_child = cuda.shared.array(512, dtype=int32) # 512 - assumed limit on max actions (array instead of one index due to max-argmax reduction pattern)
shared_selected_path = cuda.shared.array(2048 + 2, dtype=int32) # 2048 - assumed equal to MAX_TREE_DEPTH
ti = cuda.blockIdx.x # tree index
tpb = cuda.blockDim.x
t = cuda.threadIdx.x
state_max_actions = int16(trees.shape[2] - 1)
node = int32(0)
depth = int16(0)
if t == 0:
shared_selected_path[0] = int32(0) # path always starting from root
while not trees_leaves[ti, node]:
if t < state_max_actions:
child = trees[ti, node, 1 + t]
shared_best_child[t] = child
if child == int32(-1):
shared_ucbs[t] = -float32(inf)
else:
child_n = trees_ns[ti, child]
if child_n == int32(0):
shared_ucbs[t] = float32(inf)
else:
shared_ucbs[t] = trees_ns_wins[ti, child] / float32(child_n) + ucb_c * math.sqrt(math.log(trees_ns[ti, node]) / child_n)
else:
shared_ucbs[t] = -float32(inf)
cuda.syncthreads()
stride = tpb >> 1 # half of tpb
while stride > 0: # max-argmax reduction pattern
if t < stride:
t_stride = t + stride
if shared_ucbs[t] < shared_ucbs[t_stride]:
shared_ucbs[t] = shared_ucbs[t_stride]
shared_best_child[t] = shared_best_child[t_stride]
cuda.syncthreads()
stride >>= 1
node = shared_best_child[0]
depth += int16(1)
if t == 0:
shared_selected_path[depth] = node
path_length = depth + 1
pept = (path_length + tpb - 1) // tpb # path elements per thread
e = t
for _ in range(pept):
if e < path_length:
trees_selected_paths[ti, e] = shared_selected_path[e]
e += tpb
if t == 0:
trees_nodes_selected[ti] = node
trees_selected_paths[ti, -1] = path_length
[docs]
@staticmethod
@cuda.jit(void(int32, int32[:, :, :], int32[:], int8[:, :], boolean[:, :], boolean[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], xoroshiro128p_type[:], int16[:, :]))
def _expand_1_ocp_thrifty(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos,
trees_nodes_selected, random_generators_expand_1, trees_actions_expanded):
"""CUDA kernel responsible for computations of stage: expansions (substage 1, variant ``"ocp_thrifty"``)."""
shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block)
shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info
shared_legal_actions = cuda.shared.array(512, dtype=boolean) # 512 - assumed limit on max actions
shared_legal_actions_child_shifts = cuda.shared.array(512, dtype=int16) # 512 - assumed limit on max actions
ti = cuda.blockIdx.x # tree index
tpb = cuda.blockDim.x
t = cuda.threadIdx.x
t_global = cuda.grid(1)
state_max_actions = int16(trees.shape[2] - 1)
_, _, m, n = trees_boards.shape
m_n = m * n
bept = (m_n + tpb - 1) // tpb # board elements per thread
e = t # board element flat index
selected = trees_nodes_selected[ti] # node selected
for _ in range(bept):
if e < m_n:
i = e // n
j = e % n
shared_board[i, j] = trees_boards[ti, selected, i, j]
e += tpb
_, _, extra_info_memory = trees_extra_infos.shape
eipt = (extra_info_memory + tpb - 1) // tpb
e = t
for _ in range(eipt):
if e < extra_info_memory:
shared_extra_info[e] = trees_extra_infos[ti, selected, e]
e += tpb
cuda.syncthreads()
selected_is_terminal = trees_terminals[ti, selected]
if selected_is_terminal:
shared_legal_actions[t] = False
elif t < state_max_actions:
is_action_legal(m, n, shared_board, shared_extra_info, trees_turns[ti, selected], t, shared_legal_actions)
cuda.syncthreads()
size_so_far = trees_sizes[ti]
child_shift = int16(-1)
rand_child_for_playout = int16(-3) # remains like this when tree cannot grow due to memory exhausted
if t < state_max_actions:
shared_legal_actions_child_shifts[t] = int16(-1)
if t == 0:
if not selected_is_terminal:
for i in range(state_max_actions):
if shared_legal_actions[i] and size_so_far + child_shift + 1 < max_tree_size:
child_shift += 1
shared_legal_actions_child_shifts[i] = child_shift
if child_shift >= int16(0):
trees_actions_expanded[ti, -1] = child_shift + 1 # information how many children expanded (as last entry)
trees_leaves[ti, selected] = False
rand_child_for_playout = int16(xoroshiro128p_uniform_float32(random_generators_expand_1, t_global) * (child_shift + 1))
else:
trees_actions_expanded[ti, -1] = int16(1) # tree not grown due to memory exhausted, but selected shall be played out (hence 1 needed)
trees_actions_expanded[ti, -2] = rand_child_for_playout
else:
trees_actions_expanded[ti, -1] = int16(1) # terminal in fact not expanded, but shall be played out (hence 1 needed)
trees_actions_expanded[ti, -2] = int16(-1) # fake child for playouts indicating that selected is terminal (and playouts computed from outcome)
cuda.syncthreads()
if t < state_max_actions:
child_index = int32(-1)
if shared_legal_actions[t]:
child_shift = shared_legal_actions_child_shifts[t]
if child_shift >= int16(0):
child_index = size_so_far + child_shift
trees_actions_expanded[ti, child_shift] = t # for thrifty variants
trees[ti, selected, 1 + t] = child_index # parent gets to know where child is
if t == 0:
trees_sizes[ti] += shared_legal_actions_child_shifts[state_max_actions - 1] + 1 # updating tree size
[docs]
@staticmethod
@cuda.jit(void(int32, int32[:, :, :], int32[:], int8[:, :], boolean[:, :], boolean[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], xoroshiro128p_type[:], int16[:, :]))
def _expand_1_ocp_prodigal(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos,
trees_nodes_selected, random_generators_expand_1, trees_actions_expanded):
"""CUDA kernel responsible for computations of stage: expansions (substage 1, variant ``"ocp_prodigal"``)."""
shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block)
shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info
shared_legal_actions = cuda.shared.array(512, dtype=boolean) # 512 - assumed limit on max actions
shared_legal_actions_child_shifts = cuda.shared.array(512, dtype=int16) # 512 - assumed limit on max actions
shared_map_child_shifts_to_action = cuda.shared.array(512, dtype=int16) # 512 - assumed limit on max actions
ti = cuda.blockIdx.x # tree index
tpb = cuda.blockDim.x
t = cuda.threadIdx.x
t_global = cuda.grid(1)
state_max_actions = int16(trees.shape[2] - 1)
_, _, m, n = trees_boards.shape
m_n = m * n
bept = (m_n + tpb - 1) // tpb # board elements per thread
e = t # board element flat index
selected = trees_nodes_selected[ti] # node selected
for _ in range(bept):
if e < m_n:
i = e // n
j = e % n
shared_board[i, j] = trees_boards[ti, selected, i, j]
e += tpb
_, _, extra_info_memory = trees_extra_infos.shape
eipt = (extra_info_memory + tpb - 1) // tpb
e = t
for _ in range(eipt):
if e < extra_info_memory:
shared_extra_info[e] = trees_extra_infos[ti, selected, e]
e += tpb
cuda.syncthreads()
selected_is_terminal = trees_terminals[ti, selected]
if selected_is_terminal:
shared_legal_actions[t] = False
elif t < state_max_actions:
is_action_legal(m, n, shared_board, shared_extra_info, trees_turns[ti, selected], t, shared_legal_actions)
cuda.syncthreads()
size_so_far = trees_sizes[ti]
child_shift = int16(-1)
rand_child_for_playout = int16(-3) # remains like this when tree cannot grow due to memory exhausted
if t < state_max_actions:
shared_legal_actions_child_shifts[t] = int16(-1)
if t == 0:
if not selected_is_terminal:
for i in range(state_max_actions):
if shared_legal_actions[i] and size_so_far + child_shift + 1 < max_tree_size:
child_shift += 1
shared_map_child_shifts_to_action[child_shift] = i
shared_legal_actions_child_shifts[i] = child_shift
if child_shift >= int16(0):
trees_actions_expanded[ti, -1] = child_shift + 1 # information how many children expanded (as last entry)
trees_leaves[ti, selected] = False
rand_child_for_playout = int16(xoroshiro128p_uniform_float32(random_generators_expand_1, t_global) * (child_shift + 1))
rand_child_for_playout = shared_map_child_shifts_to_action[rand_child_for_playout]
else:
trees_actions_expanded[ti, -1] = int16(1) # tree not grown due to memory exhausted, but selected shall be played out (hence 1 needed)
trees_actions_expanded[ti, -2] = rand_child_for_playout
else:
trees_actions_expanded[ti, -1] = int16(1)
trees_actions_expanded[ti, -2] = int16(-1) # fake child for playouts indicating that selected is terminal (and playouts computed from outcome)
cuda.syncthreads()
if t < state_max_actions:
child_index = int32(-1)
if shared_legal_actions[t]:
child_shift = shared_legal_actions_child_shifts[t]
if child_shift >= int16(0):
child_index = size_so_far + child_shift
trees_actions_expanded[ti, t] = t # for prodigal variants
else:
trees_actions_expanded[ti, t] = int16(-1) # for prodigal variants
trees[ti, selected, 1 + t] = child_index # parent gets to know where child is
if t == 0:
trees_sizes[ti] += shared_legal_actions_child_shifts[state_max_actions - 1] + 1 # updating tree size
[docs]
@staticmethod
@cuda.jit(void(int32, int32[:, :, :], int32[:], int8[:, :], boolean[:, :], boolean[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :]))
def _expand_1_acp_thrifty(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos,
trees_nodes_selected, trees_actions_expanded):
"""CUDA kernel responsible for computations of stage: expansions (substage 1, variant ``"acp_thrifty"``)."""
shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block)
shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info
shared_legal_actions = cuda.shared.array(512, dtype=boolean) # 512 - assumed limit on max actions
shared_legal_actions_child_shifts = cuda.shared.array(512, dtype=int16) # 512 - assumed limit on max actions
ti = cuda.blockIdx.x # tree index
tpb = cuda.blockDim.x
t = cuda.threadIdx.x
state_max_actions = int16(trees.shape[2] - 1)
_, _, m, n = trees_boards.shape
m_n = m * n
bept = (m_n + tpb - 1) // tpb # board elements per thread
e = t # board element flat index
selected = trees_nodes_selected[ti] # node selected
for _ in range(bept):
if e < m_n:
i = e // n
j = e % n
shared_board[i, j] = trees_boards[ti, selected, i, j]
e += tpb
_, _, extra_info_memory = trees_extra_infos.shape
eipt = (extra_info_memory + tpb - 1) // tpb
e = t
for _ in range(eipt):
if e < extra_info_memory:
shared_extra_info[e] = trees_extra_infos[ti, selected, e]
e += tpb
cuda.syncthreads()
selected_is_terminal = trees_terminals[ti, selected]
if selected_is_terminal:
shared_legal_actions[t] = False
elif t < state_max_actions:
is_action_legal(m, n, shared_board, shared_extra_info, trees_turns[ti, selected], t, shared_legal_actions)
cuda.syncthreads()
size_so_far = trees_sizes[ti]
child_shift = int16(-1)
fake_child_for_playout = int16(-3) # remains like this when tree cannot grow due to memory exhausted
if t < state_max_actions:
shared_legal_actions_child_shifts[t] = int16(-1)
if t == 0:
if not selected_is_terminal:
for i in range(state_max_actions):
if shared_legal_actions[i] and size_so_far + child_shift + 1 < max_tree_size:
child_shift += 1
shared_legal_actions_child_shifts[i] = child_shift
if child_shift >= int16(0):
trees_actions_expanded[ti, -1] = child_shift + 1 # information how many children expanded (as last entry)
trees_leaves[ti, selected] = False
fake_child_for_playout = int16(-2) # indicates all children for playouts (acp)
else:
trees_actions_expanded[ti, -1] = int16(1) # tree not grown due to memory exhausted, but selected shall be played out (hence 1 needed)
trees_actions_expanded[ti, -2] = fake_child_for_playout
else:
trees_actions_expanded[ti, -1] = int16(1)
trees_actions_expanded[ti, -2] = int16(-1) # fake child for playouts indicating that selected is terminal (and playouts computed from outcome)
cuda.syncthreads()
if t < state_max_actions:
child_index = int32(-1)
if shared_legal_actions[t]:
child_shift = shared_legal_actions_child_shifts[t]
if child_shift >= int16(0):
child_index = size_so_far + child_shift
trees_actions_expanded[ti, child_shift] = t # for thrifty variants
trees[ti, selected, 1 + t] = child_index # parent gets to know where child is
if t == 0:
trees_sizes[ti] += shared_legal_actions_child_shifts[state_max_actions - 1] + 1 # updating tree size
if selected_is_terminal or fake_child_for_playout == int16(-3):
trees_actions_expanded[ti, 0] = int16(0) # fake legal action for playout (so that exactly one block becomes executed in full body)
[docs]
@staticmethod
@cuda.jit(void(int32, int32[:, :, :], int32[:], int8[:, :], boolean[:, :], boolean[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :]))
def _expand_1_acp_prodigal(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos,
trees_nodes_selected, trees_actions_expanded):
"""CUDA kernel responsible for computations of stage: expansions (substage 1, variant ``"acp_prodigal"``)."""
shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block)
shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info
shared_legal_actions = cuda.shared.array(512, dtype=boolean) # 512 - assumed limit on max actions
shared_legal_actions_child_shifts = cuda.shared.array(512, dtype=int16) # 512 - assumed limit on max actions
ti = cuda.blockIdx.x # tree index
tpb = cuda.blockDim.x
t = cuda.threadIdx.x
state_max_actions = int16(trees.shape[2] - 1)
_, _, m, n = trees_boards.shape
m_n = m * n
bept = (m_n + tpb - 1) // tpb # board elements per thread
e = t # board element flat index
selected = trees_nodes_selected[ti] # node selected
for _ in range(bept):
if e < m_n:
i = e // n
j = e % n
shared_board[i, j] = trees_boards[ti, selected, i, j]
e += tpb
_, _, extra_info_memory = trees_extra_infos.shape
eipt = (extra_info_memory + tpb - 1) // tpb
e = t
for _ in range(eipt):
if e < extra_info_memory:
shared_extra_info[e] = trees_extra_infos[ti, selected, e]
e += tpb
cuda.syncthreads()
selected_is_terminal = trees_terminals[ti, selected]
if selected_is_terminal:
shared_legal_actions[t] = False
elif t < state_max_actions:
is_action_legal(m, n, shared_board, shared_extra_info, trees_turns[ti, selected], t, shared_legal_actions)
cuda.syncthreads()
size_so_far = trees_sizes[ti]
child_shift = int16(-1)
fake_child_for_playout = int16(-3) # remains like this when tree cannot grow due to memory exhausted
if t < state_max_actions:
shared_legal_actions_child_shifts[t] = int16(-1)
if t == 0:
if not selected_is_terminal:
for i in range(state_max_actions):
if shared_legal_actions[i] and size_so_far + child_shift + 1 < max_tree_size:
child_shift += 1
shared_legal_actions_child_shifts[i] = child_shift
if child_shift >= int16(0):
trees_actions_expanded[ti, -1] = child_shift + 1 # information how many children expanded (as last entry)
trees_leaves[ti, selected] = False
fake_child_for_playout = int16(-2) # indicates all children for playouts (acp)
else:
trees_actions_expanded[ti, -1] = int16(1) # tree not grown due to memory exhausted, but selected shall be played out (hence 1 needed)
trees_actions_expanded[ti, -2] = fake_child_for_playout
else:
trees_actions_expanded[ti, -1] = int16(1)
trees_actions_expanded[ti, -2] = int16(-1) # fake child for playouts indicating that selected is terminal (and playouts computed from outcome)
cuda.syncthreads()
if t < state_max_actions:
child_index = int32(-1)
if shared_legal_actions[t]:
child_shift = shared_legal_actions_child_shifts[t]
if child_shift >= int16(0):
child_index = size_so_far + child_shift
trees_actions_expanded[ti, t] = t # for prodigal variants
else:
trees_actions_expanded[ti, t] = int16(-1) # tree not grown case
else:
trees_actions_expanded[ti, t] = int16(-1) # for prodigal variants
trees[ti, selected, 1 + t] = child_index # parent gets to know where child is
if t == 0:
trees_sizes[ti] += shared_legal_actions_child_shifts[state_max_actions - 1] + 1 # updating tree size
if selected_is_terminal or fake_child_for_playout == int16(-3):
trees_actions_expanded[ti, 0] = int16(0) # fake legal action for playout (so that exactly one block becomes executed in full body)
[docs]
@staticmethod
@cuda.jit(void(int16[:, :], int16[:]))
def _memorize_root_actions_expanded(dev_trees_actions_expanded, dev_root_actions_expanded):
"""CUDA kernel responsible for memorizing actions expanded at root node(s)."""
t = cuda.threadIdx.x
dev_root_actions_expanded[t] = dev_trees_actions_expanded[0, t]
[docs]
@staticmethod
@cuda.jit(void(int32[:, :, :], int16[:, :], int8[:, :], boolean[:, :], boolean[:, :], int8[:, :], int32[:, :], int32[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :]))
def _expand_2_thrifty(trees, trees_depths, trees_turns, trees_leaves, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded_flat):
"""CUDA kernel responsible for computations of stage: expansions (substage 2, thrifty number of blocks - variant ``"ocp_thrifty"`` or ``"acp_thrifty"``)."""
shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block)
shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info
tai = cuda.blockIdx.x # tree-action pair index
ti = trees_actions_expanded_flat[tai, 0]
action = trees_actions_expanded_flat[tai, 1]
if action < int16(0):
return # selected is terminal or tree not grown due to memory exhausted
tpb = cuda.blockDim.x
t = cuda.threadIdx.x
_, _, m, n = trees_boards.shape
m_n = m * n
bept = (m_n + tpb - 1) // tpb # board elements per thread
e = t # board element flat index
selected = trees_nodes_selected[ti]
if trees_terminals[ti, selected]:
return
for _ in range(bept):
if e < m_n:
i = e // n
j = e % n
shared_board[i, j] = trees_boards[ti, selected, i, j]
e += tpb
_, _, extra_info_memory = trees_extra_infos.shape
eipt = (extra_info_memory + tpb - 1) // tpb
e = t
for _ in range(eipt):
if e < extra_info_memory:
shared_extra_info[e] = trees_extra_infos[ti, selected, e]
e += tpb
cuda.syncthreads()
turn = 0
if t == 0:
turn = trees_turns[ti, selected]
take_action(m, n, shared_board, shared_extra_info, turn, action)
cuda.syncthreads()
child = trees[ti, selected, 1 + action]
e = t
for _ in range(bept):
if e < m_n:
i = e // n
j = e % n
trees_boards[ti, child, i, j] = shared_board[i, j]
e += tpb
e = t
for _ in range(eipt):
if e < extra_info_memory:
trees_extra_infos[ti, child, e] = shared_extra_info[e]
e += tpb
if t == 0:
trees[ti, child, 0] = selected
trees_turns[ti, child] = -turn
trees_leaves[ti, child] = True
terminal_flag = False
outcome = compute_outcome(m, n, shared_board, shared_extra_info, -turn, action)
if outcome == int8(-1) or outcome == int8(0) or outcome == int8(1):
terminal_flag = True
trees_terminals[ti, child] = terminal_flag
trees_outcomes[ti, child] = outcome
trees_ns[ti, child] = int32(0)
trees_ns_wins[ti, child] = int32(0)
trees_depths[ti, child] = trees_depths[ti, selected] + 1
[docs]
@staticmethod
@cuda.jit(void(int32[:, :, :], int16[:, :], int8[:, :], boolean[:, :], boolean[:, :], int8[:, :], int32[:, :], int32[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :]))
def _expand_2_prodigal(trees, trees_depths, trees_turns, trees_leaves, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded):
"""CUDA kernel responsible for computations of stage: expansions (substage 2, prodigal number of blocks - variant ``"ocp_prodigal"`` or ``"acp_prodigal"``)."""
shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block)
shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info
ti = cuda.blockIdx.x
action = cuda.blockIdx.y
if trees_actions_expanded[ti, action] < int16(0):
return # prodigality
if trees_actions_expanded[ti, -2] == int16(-1) or trees_actions_expanded[ti, -2] == int16(-3):
return # selected is terminal or tree cannot grow due to memory exhausted
tpb = cuda.blockDim.x
t = cuda.threadIdx.x
_, _, m, n = trees_boards.shape
m_n = m * n
bept = (m_n + tpb - 1) // tpb # board elements per thread
e = t # board element flat index
selected = trees_nodes_selected[ti]
if trees_terminals[ti, selected]:
return
for _ in range(bept):
if e < m_n:
i = e // n
j = e % n
shared_board[i, j] = trees_boards[ti, selected, i, j]
e += tpb
_, _, extra_info_memory = trees_extra_infos.shape
eipt = (extra_info_memory + tpb - 1) // tpb
e = t
for _ in range(eipt):
if e < extra_info_memory:
shared_extra_info[e] = trees_extra_infos[ti, selected, e]
e += tpb
cuda.syncthreads()
turn = int8(0)
if t == 0:
turn = trees_turns[ti, selected]
take_action(m, n, shared_board, shared_extra_info, turn, action)
cuda.syncthreads()
child = trees[ti, selected, 1 + action]
e = t
for _ in range(bept):
if e < m_n:
i = e // n
j = e % n
trees_boards[ti, child, i, j] = shared_board[i, j]
e += tpb
e = t
for _ in range(eipt):
if e < extra_info_memory:
trees_extra_infos[ti, child, e] = shared_extra_info[e]
e += tpb
if t == 0:
trees[ti, child, 0] = selected
trees_turns[ti, child] = -turn
trees_leaves[ti, child] = True
terminal_flag = False
outcome = compute_outcome(m, n, shared_board, shared_extra_info, -turn, action)
if outcome == int8(-1) or outcome == int8(0) or outcome == int8(1):
terminal_flag = True
trees_terminals[ti, child] = terminal_flag
trees_outcomes[ti, child] = outcome
trees_ns[ti, child] = int32(0)
trees_ns_wins[ti, child] = int32(0)
trees_depths[ti, child] = trees_depths[ti, selected] + 1
[docs]
@staticmethod
@cuda.jit(void(int32[:, :, :], int8[:, :], boolean[:, :], int8[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :], xoroshiro128p_type[:], int32[:, :]))
def _playout_ocp(trees, trees_turns, trees_terminals, trees_outcomes, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded, random_generators_playout, trees_playout_outcomes):
"""CUDA kernel responsible for computations of stage: playouts (variant ``"ocp_thrifty"`` or ``"ocp_prodigal"``)."""
shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block)
shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info
shared_playout_outcomes = cuda.shared.array((512, 2), dtype=int16) # 512 - assumed max tpb for playouts, two cells for a row (-1 win, +1 win), each flagged by 0 or 1 after playout
local_board = cuda.local.array((32, 32), dtype=int8)
local_extra_info = cuda.local.array(4096, dtype=int8)
local_legal_actions_with_count = cuda.local.array(512 + 1, dtype=int16) # 512 - assumed limit on max actions
ti = cuda.blockIdx.x # tree index
tpb = cuda.blockDim.x
t = cuda.threadIdx.x
to_be_played_out = trees_nodes_selected[ti] # temporarily to_be_played_out equals selected
rand_child_for_playout = trees_actions_expanded[ti, -2]
last_action = int16(-1) # none yet
if rand_child_for_playout >= int16(0): # check if some child picked on random for playouts
last_action = trees_actions_expanded[ti, rand_child_for_playout]
to_be_played_out = trees[ti, to_be_played_out, 1 + last_action]
if trees_terminals[ti, to_be_played_out]: # root for playouts has been discovered terminal before (by game rules) -> taking stored outcome ("multiplied" by tpb)
if t == 0:
outcome = trees_outcomes[ti, to_be_played_out]
trees_playout_outcomes[ti, 0] = int32(tpb) if outcome == int8(-1) else int32(0) # wins of -1
trees_playout_outcomes[ti, 1] = int32(tpb) if outcome == int8(1) else int32(0) # wins of +1
else:
t = cuda.threadIdx.x
t_global = cuda.grid(1)
shared_playout_outcomes[t, 0] = np.int16(0)
shared_playout_outcomes[t, 1] = np.int16(0)
_, _, m, n = trees_boards.shape
m_n = m * n
bept = (m_n + tpb - 1) // tpb # board elements per thread
e = t # board element flat index
for _ in range(bept):
if e < m_n:
i = e // n
j = e % n
shared_board[i, j] = trees_boards[ti, to_be_played_out, i, j]
e += tpb
_, _, extra_info_memory = trees_extra_infos.shape
eipt = (extra_info_memory + tpb - 1) // tpb
e = t
for _ in range(eipt):
if e < extra_info_memory:
shared_extra_info[e] = trees_extra_infos[ti, to_be_played_out, e]
e += tpb
cuda.syncthreads()
for i in range(m):
for j in range(n):
local_board[i, j] = shared_board[i, j]
for i in range(extra_info_memory):
local_extra_info[i] = shared_extra_info[i]
local_legal_actions_with_count[-1] = 0
turn = trees_turns[ti, to_be_played_out]
outcome = compute_outcome(m, n, local_board, local_extra_info, turn, last_action) if last_action != int16(-1) else int8(2) # else case only when trees not grown due to memory limit (then selected played out)
while True: # playout loop
if not (outcome == int8(-1) or outcome == int8(0) or outcome == int8(1)): # indecisive, game ongoing
legal_actions_playout(m, n, local_board, local_extra_info, turn, local_legal_actions_with_count)
count = local_legal_actions_with_count[-1]
action_ord = int16(xoroshiro128p_uniform_float32(random_generators_playout, t_global) * count)
last_action = local_legal_actions_with_count[action_ord]
take_action_playout(m, n, local_board, local_extra_info, turn, last_action, action_ord, local_legal_actions_with_count)
turn = -turn
else:
if outcome != int8(0):
shared_playout_outcomes[t, (outcome + 1) // 2] = int8(1)
break
outcome = compute_outcome(m, n, local_board, local_extra_info, turn, last_action)
cuda.syncthreads()
stride = tpb >> 1 # half of tpb
while stride > 0: # sum reduction pattern
if t < stride:
t_stride = t + stride
shared_playout_outcomes[t, 0] += shared_playout_outcomes[t_stride, 0]
shared_playout_outcomes[t, 1] += shared_playout_outcomes[t_stride, 1]
cuda.syncthreads()
stride >>= 1
if t == 0:
trees_playout_outcomes[ti, 0] = shared_playout_outcomes[0, 0]
trees_playout_outcomes[ti, 1] = shared_playout_outcomes[0, 1]
[docs]
@staticmethod
@cuda.jit(void(int32[:, :, :], int8[:, :], boolean[:, :], int8[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :], int16[:, :], xoroshiro128p_type[:], int32[:, :], int32[:, :, :]))
def _playout_acp_thrifty(trees, trees_turns, trees_terminals, trees_outcomes, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded, trees_actions_expanded_flat, random_generators_playout, trees_playout_outcomes,
trees_playout_outcomes_children):
"""CUDA kernel responsible for computations of stage: playouts (variant ``"acp_thrifty"``)."""
shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block)
shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info
shared_playout_outcomes = cuda.shared.array((512, 2), dtype=int16) # 1024 - assumed max tpb for playouts, two cells for a row (-1 win, +1 win), each flagged by 0 or 1 after playout
local_board = cuda.local.array((32, 32), dtype=int8)
local_extra_info = cuda.local.array(4096, dtype=int8)
local_legal_actions_with_count = cuda.local.array(512 + 1, dtype=int16) # 512 - assumed limit on max actions
tai = cuda.blockIdx.x # tree-action pair index
ti = trees_actions_expanded_flat[tai, 0]
action = trees_actions_expanded_flat[tai, 1]
tpb = cuda.blockDim.x
t = cuda.threadIdx.x
to_be_played_out = trees_nodes_selected[ti] # temporarily to_be_played_out equals selected
fake_child_for_playout = trees_actions_expanded[ti, -2]
last_action = int16(-1) # none yet
if fake_child_for_playout == int16(-2): # check if playouts are to be made on all children of selected
last_action = action
to_be_played_out = trees[ti, to_be_played_out, 1 + last_action]
if trees_terminals[ti, to_be_played_out]: # root for playouts has been discovered terminal before (by game rules) -> taking stored outcome ("multiplied" by tpb)
if t == 0:
outcome = trees_outcomes[ti, to_be_played_out]
if fake_child_for_playout == int16(-2): # case where terminal is one child among all children of selected node
trees_playout_outcomes_children[ti, action, 0] = int32(tpb) if outcome == int8(-1) else int32(0) # wins of -1
trees_playout_outcomes_children[ti, action, 1] = int32(tpb) if outcome == int8(1) else int32(0) # wins of +1
else: # case where terminal was selected
trees_playout_outcomes[ti, 0] = int32(tpb) if outcome == int8(-1) else int32(0) # wins of -1
trees_playout_outcomes[ti, 1] = int32(tpb) if outcome == int8(1) else int32(0) # wins of +1
else:
t = cuda.threadIdx.x
state_max_actions = trees.shape[2] - 1
t_global = ti * state_max_actions * tpb + action * tpb + t # purposely (instead of t_global = cuda.grid(1)) to make resutls of acp_prodigal and acp_thrifty same (for equal number of steps)
shared_playout_outcomes[t, 0] = np.int16(0)
shared_playout_outcomes[t, 1] = np.int16(0)
_, _, m, n = trees_boards.shape
m_n = m * n
bept = (m_n + tpb - 1) // tpb # board elements per thread
e = t # board element flat index
for _ in range(bept):
if e < m_n:
i = e // n
j = e % n
shared_board[i, j] = trees_boards[ti, to_be_played_out, i, j]
e += tpb
_, _, extra_info_memory = trees_extra_infos.shape
eipt = (extra_info_memory + tpb - 1) // tpb
e = t
for _ in range(eipt):
if e < extra_info_memory:
shared_extra_info[e] = trees_extra_infos[ti, to_be_played_out, e]
e += tpb
cuda.syncthreads()
for i in range(m):
for j in range(n):
local_board[i, j] = shared_board[i, j]
for i in range(extra_info_memory):
local_extra_info[i] = shared_extra_info[i]
local_legal_actions_with_count[-1] = 0
turn = trees_turns[ti, to_be_played_out]
outcome = compute_outcome(m, n, local_board, local_extra_info, turn, last_action) if last_action != int16(-1) else int8(2) # else case only when trees not grown due to memory limit (then selected played out)
while True: # playout loop
if not (outcome == int8(-1) or outcome == int8(0) or outcome == int8(1)): # indecisive, game ongoing
legal_actions_playout(m, n, local_board, local_extra_info, turn, local_legal_actions_with_count)
count = local_legal_actions_with_count[-1]
action_ord = int16(xoroshiro128p_uniform_float32(random_generators_playout, t_global) * count)
last_action = local_legal_actions_with_count[action_ord]
take_action_playout(m, n, local_board, local_extra_info, turn, last_action, action_ord, local_legal_actions_with_count)
turn = -turn
else:
if outcome != int8(0):
shared_playout_outcomes[t, (outcome + 1) // 2] = int8(1)
break
outcome = compute_outcome(m, n, local_board, local_extra_info, turn, last_action)
cuda.syncthreads()
stride = tpb >> 1 # half of tpb
while stride > 0: # sum reduction pattern
if t < stride:
t_stride = t + stride
shared_playout_outcomes[t, 0] += shared_playout_outcomes[t_stride, 0]
shared_playout_outcomes[t, 1] += shared_playout_outcomes[t_stride, 1]
cuda.syncthreads()
stride >>= 1
if t == 0:
trees_playout_outcomes_children[ti, action, 0] = shared_playout_outcomes[0, 0]
trees_playout_outcomes_children[ti, action, 1] = shared_playout_outcomes[0, 1]
[docs]
@staticmethod
@cuda.jit(void(int32[:, :, :], int8[:, :], boolean[:, :], int8[:, :], int8[:, :, :, :], int8[:, :, :], int32[:], int16[:, :], xoroshiro128p_type[:], int32[:, :], int32[:, :, :]))
def _playout_acp_prodigal(trees, trees_turns, trees_terminals, trees_outcomes, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded, random_generators_playout, trees_playout_outcomes,
trees_playout_outcomes_children):
"""CUDA kernel responsible for computations of stage: playouts (variant ``"acp_prodigal"``)."""
shared_board = cuda.shared.array((32, 32), dtype=int8) # assumed max board size (for selected node in tree associated with block)
shared_extra_info = cuda.shared.array(4096, dtype=int8) # 4096 - assumed limit on max extra info
shared_playout_outcomes = cuda.shared.array((512, 2), dtype=int16) # 1024 - assumed max tpb for playouts, two cells for a row (-1 win, +1 win), each flagged by 0 or 1 after playout
ti = cuda.blockIdx.x
action = cuda.blockIdx.y
if trees_actions_expanded[ti, action] < int16(0): # prodigality
return
local_board = cuda.local.array((32, 32), dtype=int8)
local_extra_info = cuda.local.array(4096, dtype=int8)
local_legal_actions_with_count = cuda.local.array(512 + 1, dtype=int16) # 512 - assumed limit on max actions
tpb = cuda.blockDim.x
t = cuda.threadIdx.x
to_be_played_out = trees_nodes_selected[ti] # temporarily to_be_played_out equals selected
fake_child_for_playout = trees_actions_expanded[ti, -2]
last_action = int16(-1) # none yet
if fake_child_for_playout == int16(-2): # check if playouts are to be made on all children of selected
last_action = action
to_be_played_out = trees[ti, to_be_played_out, 1 + last_action] # moving one level down from selected
if trees_terminals[ti, to_be_played_out]: # root for playouts has been discovered terminal before (by game rules) -> taking stored outcome ("multiplied" by tpb)
if t == 0:
outcome = trees_outcomes[ti, to_be_played_out]
if fake_child_for_playout == int16(-2): # case where terminal is one child among all children of selected node
trees_playout_outcomes_children[ti, action, 0] = int32(tpb) if outcome == int8(-1) else int32(0) # wins of -1
trees_playout_outcomes_children[ti, action, 1] = int32(tpb) if outcome == int8(1) else int32(0) # wins of +1
else: # case where terminal was selected
trees_playout_outcomes[ti, 0] = int32(tpb) if outcome == int8(-1) else int32(0) # wins of -1
trees_playout_outcomes[ti, 1] = int32(tpb) if outcome == int8(1) else int32(0) # wins of +1
else: # playouts for non-terminal
t = cuda.threadIdx.x
state_max_actions = trees.shape[2] - 1
t_global = ti * state_max_actions * tpb + action * tpb + t
shared_playout_outcomes[t, 0] = np.int16(0)
shared_playout_outcomes[t, 1] = np.int16(0)
_, _, m, n = trees_boards.shape
m_n = m * n
bept = (m_n + tpb - 1) // tpb # board elements per thread
e = t # board element flat index
for _ in range(bept):
if e < m_n:
i = e // n
j = e % n
shared_board[i, j] = trees_boards[ti, to_be_played_out, i, j]
e += tpb
_, _, extra_info_memory = trees_extra_infos.shape
eipt = (extra_info_memory + tpb - 1) // tpb
e = t
for _ in range(eipt):
if e < extra_info_memory:
shared_extra_info[e] = trees_extra_infos[ti, to_be_played_out, e]
e += tpb
cuda.syncthreads()
for i in range(m):
for j in range(n):
local_board[i, j] = shared_board[i, j]
for i in range(extra_info_memory):
local_extra_info[i] = shared_extra_info[i]
local_legal_actions_with_count[-1] = 0
turn = trees_turns[ti, to_be_played_out]
outcome = compute_outcome(m, n, local_board, local_extra_info, turn, last_action) if last_action != int16(-1) else int8(2) # else case only when trees not grown due to memory limit (then selected played out)
while True: # playout loop
if not (outcome == int8(-1) or outcome == int8(0) or outcome == int8(1)): # indecisive, game ongoing
legal_actions_playout(m, n, local_board, local_extra_info, turn, local_legal_actions_with_count)
count = local_legal_actions_with_count[-1]
action_ord = int16(xoroshiro128p_uniform_float32(random_generators_playout, t_global) * count)
last_action = local_legal_actions_with_count[action_ord]
take_action_playout(m, n, local_board, local_extra_info, turn, last_action, action_ord, local_legal_actions_with_count)
turn = -turn
else:
if outcome != int8(0):
shared_playout_outcomes[t, (outcome + 1) // 2] = int8(1)
break
outcome = compute_outcome(m, n, local_board, local_extra_info, turn, last_action)
cuda.syncthreads()
stride = tpb >> 1 # half of tpb
while stride > 0: # sum reduction pattern
if t < stride:
t_stride = t + stride
shared_playout_outcomes[t, 0] += shared_playout_outcomes[t_stride, 0]
shared_playout_outcomes[t, 1] += shared_playout_outcomes[t_stride, 1]
cuda.syncthreads()
stride >>= 1
if t == 0:
trees_playout_outcomes_children[ti, action, 0] = shared_playout_outcomes[0, 0]
trees_playout_outcomes_children[ti, action, 1] = shared_playout_outcomes[0, 1]
[docs]
@staticmethod
@cuda.jit(void(int16, int32[:, :, :], int8[:, :], int32[:, :], int32[:, :], int32[:], int32[:, :], int16[:, :], int32[:, :]))
def _backup_ocp(n_playouts, trees, trees_turns, trees_ns, trees_ns_wins, trees_nodes_selected, trees_selected_paths, trees_actions_expanded, trees_playout_outcomes):
"""CUDA kernel responsible for computations of stage: backups (variant ``"ocp_thrifty"`` or ``"ocp_prodigal"``)."""
ti = cuda.blockIdx.x
t = cuda.threadIdx.x
tpb = cuda.blockDim.x
n_negative_wins = trees_playout_outcomes[ti, 0]
n_positive_wins = trees_playout_outcomes[ti, 1]
path_length = trees_selected_paths[ti, -1]
pept = (path_length + tpb - 1) // tpb # path elements per thread
e = t
for _ in range(pept):
if e < path_length:
node = trees_selected_paths[ti, e]
trees_ns[ti, node] += n_playouts
if trees_turns[ti, node] == int8(1):
trees_ns_wins[ti, node] += n_negative_wins
else:
trees_ns_wins[ti, node] += n_positive_wins
e += tpb
if t == 0:
node = trees_nodes_selected[ti]
rand_child_for_playout = trees_actions_expanded[ti, -2]
if rand_child_for_playout != int16(-1): # check if some child picked on random for playouts
last_action = trees_actions_expanded[ti, rand_child_for_playout]
node = trees[ti, node, 1 + last_action]
trees_ns[ti, node] += n_playouts
if trees_turns[ti, node] == int8(1):
trees_ns_wins[ti, node] += n_negative_wins
else:
trees_ns_wins[ti, node] += n_positive_wins
[docs]
@staticmethod
@cuda.jit(void(int16, int32[:, :, :], int8[:, :], int32[:, :], int32[:, :], int32[:], int16[:, :], int32[:, :], int32[:, :, :]))
def _backup_1_acp_thrifty(n_playouts, trees, trees_turns, trees_ns, trees_ns_wins, trees_nodes_selected, trees_actions_expanded, trees_playout_outcomes, trees_playout_outcomes_children):
"""CUDA kernel responsible for computations of stage: backups (substage 1, variant ``"acp_thrifty"``)."""
shared_playout_outcomes_children = cuda.shared.array((512, 2), dtype=int32) # 512 - assumed limit on max actions, two cells for a row (-1 win, +1 win), each flagged by 0 or 1 after playout
ti = cuda.blockIdx.x # tree index
tpb = cuda.blockDim.x
t = cuda.threadIdx.x
fake_child_for_playout = trees_actions_expanded[ti, -2]
if fake_child_for_playout == int16(-2): # check if actual children of selected were played out
selected = trees_nodes_selected[ti]
n_expanded_actions = trees_actions_expanded[ti, -1]
if t < n_expanded_actions:
a = trees_actions_expanded[ti, t]
n_negative_wins = trees_playout_outcomes_children[ti, a, 0]
n_positive_wins = trees_playout_outcomes_children[ti, a, 1]
child_node = trees[ti, selected, 1 + a]
trees_ns[ti, child_node] += n_playouts
if trees_turns[ti, child_node] == int8(1):
trees_ns_wins[ti, child_node] += n_negative_wins
else:
trees_ns_wins[ti, child_node] += n_positive_wins
shared_playout_outcomes_children[t, 0] = n_negative_wins
shared_playout_outcomes_children[t, 1] = n_positive_wins
else:
shared_playout_outcomes_children[t, 0] = np.int32(0)
shared_playout_outcomes_children[t, 1] = np.int32(0)
cuda.syncthreads()
stride = tpb >> 1 # half of tpb
while stride > 0: # sum reduction pattern
if t < stride:
t_stride = t + stride
shared_playout_outcomes_children[t, 0] += shared_playout_outcomes_children[t_stride, 0]
shared_playout_outcomes_children[t, 1] += shared_playout_outcomes_children[t_stride, 1]
cuda.syncthreads()
stride >>= 1
if t == 0:
trees_playout_outcomes[ti, 0] = shared_playout_outcomes_children[0, 0]
trees_playout_outcomes[ti, 1] = shared_playout_outcomes_children[0, 1]
[docs]
@staticmethod
@cuda.jit(void(int16, int32[:, :, :], int8[:, :], int32[:, :], int32[:, :], int32[:], int16[:, :], int32[:, :], int32[:, :, :]))
def _backup_1_acp_prodigal(n_playouts, trees, trees_turns, trees_ns, trees_ns_wins, trees_nodes_selected, trees_actions_expanded, trees_playout_outcomes, trees_playout_outcomes_children):
"""CUDA kernel responsible for computations of stage: backups (substage 1, variant ``"acp_prodigal"``)."""
shared_playout_outcomes_children = cuda.shared.array((512, 2), dtype=int32) # 512 - assumed limit on max actions, two cells for a row (-1 win, +1 win), each flagged by 0 or 1 after playout
ti = cuda.blockIdx.x # tree index
tpb = cuda.blockDim.x
t = cuda.threadIdx.x
fake_child_for_playout = trees_actions_expanded[ti, -2]
max_actions = trees_actions_expanded.shape[1] - 2
if fake_child_for_playout == int16(-2): # check if actual children of selected were played out
selected = trees_nodes_selected[ti]
if t < max_actions and trees_actions_expanded[ti, t] != int16(-1): # prodigality
a = t
n_negative_wins = trees_playout_outcomes_children[ti, a, 0]
n_positive_wins = trees_playout_outcomes_children[ti, a, 1]
child_node = trees[ti, selected, 1 + a]
trees_ns[ti, child_node] += n_playouts
if trees_turns[ti, child_node] == int8(1):
trees_ns_wins[ti, child_node] += n_negative_wins
else:
trees_ns_wins[ti, child_node] += n_positive_wins
shared_playout_outcomes_children[t, 0] = n_negative_wins
shared_playout_outcomes_children[t, 1] = n_positive_wins
else:
shared_playout_outcomes_children[t, 0] = np.int32(0)
shared_playout_outcomes_children[t, 1] = np.int32(0)
cuda.syncthreads()
stride = tpb >> 1 # half of tpb
while stride > 0: # sum reduction pattern
if t < stride:
t_stride = t + stride
shared_playout_outcomes_children[t, 0] += shared_playout_outcomes_children[t_stride, 0]
shared_playout_outcomes_children[t, 1] += shared_playout_outcomes_children[t_stride, 1]
cuda.syncthreads()
stride >>= 1
if t == 0:
trees_playout_outcomes[ti, 0] = shared_playout_outcomes_children[0, 0]
trees_playout_outcomes[ti, 1] = shared_playout_outcomes_children[0, 1]
[docs]
@staticmethod
@cuda.jit(void(int16, int8[:, :], int32[:, :], int32[:, :], int32[:, :], int16[:, :], int32[:, :]))
def _backup_2_acp(n_playouts, trees_turns, trees_ns, trees_ns_wins, trees_selected_paths, trees_actions_expanded, trees_playout_outcomes):
"""CUDA kernel responsible for computations of stage: backups (substage 2, variant ``"acp_thrifty"`` or ``"acp_prodigal"``)."""
ti = cuda.blockIdx.x
t = cuda.threadIdx.x
tpb = cuda.blockDim.x
n_negative_wins = trees_playout_outcomes[ti, 0]
n_positive_wins = trees_playout_outcomes[ti, 1]
n_expanded_actions = trees_actions_expanded[ti, -1]
if n_expanded_actions == int16(0): # terminal was being "played out"
n_expanded_actions = int16(1)
n_playouts_total = n_playouts * n_expanded_actions
path_length = trees_selected_paths[ti, -1]
pept = (path_length + tpb - 1) // tpb # path elements per thread
e = t
for _ in range(pept):
if e < path_length:
node = trees_selected_paths[ti, e]
trees_ns[ti, node] += n_playouts_total
if trees_turns[ti, node] == int8(1):
trees_ns_wins[ti, node] += n_negative_wins
else:
trees_ns_wins[ti, node] += n_positive_wins
e += tpb
[docs]
@staticmethod
@cuda.jit(void(int32[:, :, :], boolean[:, :], int8[:, :], int32[:, :], int32[:, :], int16[:], int8, int64[:], boolean[:], int64[:], int64[:]))
def _reduce_over_trees_thrifty(trees, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, root_actions_expanded, root_turn, root_ns, actions_win_flags, actions_ns, actions_ns_wins):
"""CUDA kernel responsible for sum-reduction over trees (thrifty number of blocks, variant ``ocp_thrifty`` or ``acp_thrifty``)."""
shared_root_ns = cuda.shared.array(512, dtype=int64) # 512 - assumed max of n_trees
shared_actions_ns = cuda.shared.array(512, dtype=int64)
shared_actions_ns_wins = cuda.shared.array(512, dtype=int64)
b = cuda.blockIdx.x
action = root_actions_expanded[b] # action index
n_trees = trees.shape[0]
tpb = cuda.blockDim.x
t = cuda.threadIdx.x # thread index == tree index
if t < n_trees:
shared_root_ns[t] = int64(trees_ns[t, 0])
action_node = trees[t, 0, 1 + action]
shared_actions_ns[t] = int64(trees_ns[t, action_node])
shared_actions_ns_wins[t] = int64(trees_ns_wins[t, action_node])
else:
shared_root_ns[t] = int64(0)
shared_actions_ns[t] = int64(0)
shared_actions_ns_wins[t] = int64(0)
cuda.syncthreads()
stride = tpb >> 1 # half of tpb
while stride > 0: # sum reduction pattern
if t < stride:
t_stride = t + stride
shared_root_ns[t] += shared_root_ns[t_stride]
shared_actions_ns[t] += shared_actions_ns[t_stride]
shared_actions_ns_wins[t] += shared_actions_ns_wins[t_stride]
cuda.syncthreads()
stride >>= 1
if t == 0:
root_ns[b] = shared_root_ns[0]
actions_ns[b] = shared_actions_ns[0]
actions_ns_wins[b] = shared_actions_ns_wins[0]
action_node = trees[0, 0, 1 + action]
actions_win_flags[b] = action_node != int32(-1) and trees_terminals[0, action_node] and trees_outcomes[0, action_node] == root_turn
[docs]
@staticmethod
@cuda.jit(void(int32[:, :, :], boolean[:, :], int8[:, :], int32[:, :], int32[:, :], int16[:], int8, int64[:], boolean[:], int64[:], int64[:]))
def _reduce_over_trees_prodigal(trees, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, root_actions_expanded, root_turn, root_ns, actions_win_flags, actions_ns, actions_ns_wins):
"""CUDA kernel responsible for sum-reduction over trees (prodigal number of blocks, variant ``ocp_prodigal`` or ``acp_prodigal``)."""
shared_root_ns = cuda.shared.array(512, dtype=int64) # 512 - assumed max of n_trees
shared_actions_ns = cuda.shared.array(512, dtype=int64)
shared_actions_ns_wins = cuda.shared.array(512, dtype=int64)
b = cuda.blockIdx.x
action = b # action index
t = cuda.threadIdx.x # thread index == tree index
shared_root_ns[t] = int64(0)
shared_actions_ns[t] = int64(0)
shared_actions_ns_wins[t] = int64(0)
if root_actions_expanded[action] != int16(-1):
n_trees = trees.shape[0]
tpb = cuda.blockDim.x
if t < n_trees:
shared_root_ns[t] = trees_ns[t, 0]
action_node = trees[t, 0, 1 + action]
shared_actions_ns[t] = trees_ns[t, action_node]
shared_actions_ns_wins[t] = trees_ns_wins[t, action_node]
cuda.syncthreads()
stride = tpb >> 1 # half of tpb
while stride > 0: # sum reduction pattern
if t < stride:
t_stride = t + stride
shared_root_ns[t] += shared_root_ns[t_stride]
shared_actions_ns[t] += shared_actions_ns[t_stride]
shared_actions_ns_wins[t] += shared_actions_ns_wins[t_stride]
cuda.syncthreads()
stride >>= 1
if t == 0:
root_ns[b] = shared_root_ns[0]
actions_ns[b] = shared_actions_ns[0]
actions_ns_wins[b] = shared_actions_ns_wins[0]
action_node = trees[0, 0, 1 + b]
actions_win_flags[b] = action_node != int32(-1) and trees_terminals[0, action_node] and trees_outcomes[0, action_node] == root_turn
[docs]
@staticmethod
@cuda.jit(void(int16, boolean[:], int64[:], int64[:], int16[:], boolean[:], int64[:], int64[:]))
def _reduce_over_actions_thrifty(n_root_actions, actions_win_flags, actions_ns, actions_ns_wins, best_action, best_win_flag, best_n, best_n_wins):
"""CUDA kernel responsible for max/argmax-reduction over actions (thrifty number of blocks, variant ``ocp_thrifty`` or ``acp_thrifty``)."""
shared_actions = cuda.shared.array(512, dtype=int16) # 512 - assumed max state actions
shared_actions_win_flags = cuda.shared.array(512, dtype=boolean)
shared_actions_ns = cuda.shared.array(512, dtype=int64)
shared_actions_ns_wins = cuda.shared.array(512, dtype=int64)
tpb = cuda.blockDim.x
a = cuda.threadIdx.x # action index
shared_actions[a] = a
if a < n_root_actions:
shared_actions_win_flags[a] = actions_win_flags[a]
shared_actions_ns[a] = actions_ns[a]
shared_actions_ns_wins[a] = actions_ns_wins[a]
else:
shared_actions_win_flags[a] = False
shared_actions_ns[a] = int64(0)
shared_actions_ns_wins[a] = int64(0)
cuda.syncthreads()
stride = tpb >> 1 # half of tpb
while stride > 0: # max-argmax reduction pattern
if a < stride:
a_stride = a + stride
if (shared_actions_win_flags[a] < shared_actions_win_flags[a_stride]) or\
((shared_actions_win_flags[a] == shared_actions_win_flags[a_stride]) and (shared_actions_ns[a] < shared_actions_ns[a_stride])) or\
((shared_actions_win_flags[a] == shared_actions_win_flags[a_stride]) and (shared_actions_ns[a] == shared_actions_ns[a_stride]) and (shared_actions_ns_wins[a] < shared_actions_ns_wins[a_stride])):
shared_actions[a] = shared_actions[a_stride]
shared_actions_ns[a] = shared_actions_ns[a_stride]
shared_actions_ns_wins[a] = shared_actions_ns_wins[a_stride]
shared_actions_win_flags[a] = shared_actions_win_flags[a_stride]
cuda.syncthreads()
stride >>= 1
if a == 0:
best_action[0] = shared_actions[0]
best_win_flag[0] = shared_actions_win_flags[0]
best_n[0] = shared_actions_ns[0]
best_n_wins[0] = shared_actions_ns_wins[0]
[docs]
@staticmethod
@cuda.jit(void(boolean[:], int64[:], int64[:], int16[:], boolean[:], int64[:], int64[:]))
def _reduce_over_actions_prodigal(actions_win_flags, actions_ns, actions_ns_wins, best_action, best_win_flag, best_n, best_n_wins):
"""CUDA kernel responsible for max/argmax-reduction over actions (prodigal number of blocks, variant ``ocp_prodigal`` or ``acp_prodigal``)."""
shared_actions = cuda.shared.array(512, dtype=int16) # 512 - assumed max state actions
shared_actions_win_flags = cuda.shared.array(512, dtype=boolean)
shared_actions_ns = cuda.shared.array(512, dtype=int64)
shared_actions_ns_wins = cuda.shared.array(512, dtype=int64)
tpb = cuda.blockDim.x
a = cuda.threadIdx.x # action index
shared_actions[a] = a
state_max_actions = actions_ns.size
if a < state_max_actions:
shared_actions_win_flags[a] = actions_win_flags[a]
shared_actions_ns[a] = actions_ns[a]
shared_actions_ns_wins[a] = actions_ns_wins[a]
else:
shared_actions_win_flags[a] = False
shared_actions_ns[a] = int64(0)
shared_actions_ns_wins[a] = int64(0)
cuda.syncthreads()
stride = tpb >> 1 # half of tpb
while stride > 0: # max-argmax reduction pattern
if a < stride:
a_stride = a + stride
if (shared_actions_win_flags[a] < shared_actions_win_flags[a_stride]) or\
((shared_actions_win_flags[a] == shared_actions_win_flags[a_stride]) and (shared_actions_ns[a] < shared_actions_ns[a_stride])) or\
((shared_actions_win_flags[a] == shared_actions_win_flags[a_stride]) and (shared_actions_ns[a] == shared_actions_ns[a_stride]) and (shared_actions_ns_wins[a] < shared_actions_ns_wins[a_stride])):
shared_actions[a] = shared_actions[a_stride]
shared_actions_ns[a] = shared_actions_ns[a_stride]
shared_actions_ns_wins[a] = shared_actions_ns_wins[a_stride]
shared_actions_win_flags[a] = shared_actions_win_flags[a_stride]
cuda.syncthreads()
stride >>= 1
if a == 0:
best_action[0] = shared_actions[0]
best_win_flag[0] = shared_actions_win_flags[0]
best_n[0] = shared_actions_ns[0]
best_n_wins[0] = shared_actions_ns_wins[0]
[docs]
def _json_dump(self, fname):
"""Dumps (saves) device-side arrays, copied to host, representing trees and MCTS elements from the last run to a text file in json format."""
if self.verbose_info:
print(f"JSON DUMP... [to file: {fname}]")
t1 = time.time()
d = {}
d["n_trees"] = self.n_trees
d["n_playouts"] = self.n_playouts
d["variant"] = self.variant
d["search_time_limit"] = self.search_time_limit if self.search_time_limit < np.inf else "inf"
d["search_steps_limit"] = self.search_steps_limit if self.search_steps_limit < np.inf else "inf"
d["ucb_c"] = self.ucb_c
d["seed"] = self.seed
d["device_memory"] = self.device_memory
trees_sizes = np.empty_like(self.dev_trees_sizes)
self.dev_trees_sizes.copy_to_host(ary=trees_sizes)
tree_size_max = np.max(trees_sizes)
trees = np.empty_like(self.dev_trees)
self.dev_trees.copy_to_host(ary=trees)
trees = trees[:, :tree_size_max, :]
trees_depths = np.empty_like(self.dev_trees_depths)
self.dev_trees_depths.copy_to_host(ary=trees_depths)
trees_depths = trees_depths[:, :tree_size_max]
depth_max = -np.inf
for i in range(self.n_trees):
depth_max = max(depth_max, np.max(trees_depths[i, :trees_sizes[i]]))
trees_turns = np.empty_like(self.dev_trees_turns)
self.dev_trees_turns.copy_to_host(ary=trees_turns)
trees_turns = trees_turns[:, :tree_size_max]
trees_ns = np.empty_like(self.dev_trees_ns)
self.dev_trees_ns.copy_to_host(ary=trees_ns)
trees_ns = trees_ns[:, :tree_size_max]
trees_ns_wins = np.empty_like(self.dev_trees_ns_wins)
self.dev_trees_ns_wins.copy_to_host(ary=trees_ns_wins)
trees_ns_wins = trees_ns_wins[:, :tree_size_max]
trees_nodes_selected = np.empty_like(self.dev_trees_nodes_selected)
self.dev_trees_nodes_selected.copy_to_host(ary=trees_nodes_selected)
trees_selected_paths = np.empty_like(self.dev_trees_selected_paths)
self.dev_trees_selected_paths.copy_to_host(ary=trees_selected_paths)
tmp_trees_selected_paths = trees_selected_paths[:, :depth_max + 2];
tmp_trees_selected_paths[:, -1] = trees_selected_paths[:, -1]
trees_selected_paths = tmp_trees_selected_paths
trees_actions_expanded = np.empty_like(self.dev_trees_actions_expanded)
self.dev_trees_actions_expanded.copy_to_host(ary=trees_actions_expanded)
trees_playout_outcomes = np.empty_like(self.dev_trees_playout_outcomes)
self.dev_trees_playout_outcomes.copy_to_host(ary=trees_playout_outcomes)
trees_playout_outcomes_children = None
if self.dev_trees_playout_outcomes_children is not None:
trees_playout_outcomes_children = np.empty_like(self.dev_trees_playout_outcomes_children)
self.dev_trees_playout_outcomes_children.copy_to_host(ary=trees_playout_outcomes_children)
d["trees"] = trees.tolist()
d["trees"] = trees.tolist()
d["trees_sizes"] = trees_sizes.tolist()
d["trees_depths"] = trees_depths.tolist()
d["trees_turns"] = trees_turns.tolist()
d["trees_ns"] = trees_ns.tolist()
d["trees_ns_wins"] = trees_ns_wins.tolist()
d["trees_nodes_selected"] = trees_nodes_selected.tolist()
d["trees_selected_paths"] = trees_selected_paths.tolist()
d["trees_actions_expanded"] = trees_actions_expanded.tolist()
d["trees_playout_outcomes"] = trees_playout_outcomes.tolist()
if trees_playout_outcomes_children is not None:
d["trees_playout_outcomes_children"] = trees_playout_outcomes_children.tolist()
try:
f = open(fname, "w+")
json.dump(d, f, indent=2)
f.close()
except IOError:
sys.exit(f"[error occurred when trying to dump MCTSNC as json to file: {fname}]")
t2 = time.time()
if self.verbose_info:
print(f"JSON DUMP DONE. [time: {t2 - t1} s]")