mctsnc module

This module contains the core algorithmic functionalities of the project, embodied by the class MCTSNC.

With CUDA computational model in mind, we have proposed and implemented in class MCTSNC four, fast operating and thoroughly parallel, variants of Monte Carlo Tree Search algorithm. The provided implementation takes advantage of Numba, a just-in-time Python compiler, and its numba.cuda package (hence the “NC” suffix in the name). By thoroughly parallel we understand an algorithmic design that applies to both: (1) the structural elements of trees - leaf-/root-/tree-level parallelization (all those three are combined), and (2) the stages of MCTS - each stage in itself (selection, expansion, playouts, backup) employs multiple GPU threads. We apply suitable reduction patterns to carry out summations or max / argmax operations. Cooperation of threads helps to transfer information between global and shared memory. The implementation uses: no atomic operations, no mutexes (lock-free), and very few device-host memory transfers.

Example usage 1 (Connect 4)

Assume the mechanics of the Connect 4 game have been defined to MCTS-NC in mctsnc_game_mechanics.py (via device functions is_action_legal, take_action, etc.), and that c4 - instance of C4(State) - represents a state of an ongoing Connect 4 game shown below.

|.|.|●|○|.|.|.|
|.|.|●|○|.|.|○|
|.|.|●|●|.|●|●|
|.|●|○|●|.|○|●|
|.|○|●|○|.|●|○|
|○|○|○|●|●|○|○|
 0 1 2 3 4 5 6      

Then, running the following code

ai = MCTSNC(C4.get_board_shape(), C4.get_extra_info_memory(), C4.get_max_actions())
ai.init_device_side_arrays()
best_action = ai.run(c4.get_board(), c4.get_extra_info(), c4.get_turn())
print(f"BEST ACTION: {best_action}")

results in finding the best action for black - move 4 (winning in two plies), and the following printout:

[MCTSNC._init_device_side_arrays()... for MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
[MCTSNC._init_device_side_arrays() done; time: 0.5193691253662109 s, per_state_memory: 95 B,  calculated max_tree_size: 2825549]
MCTSNC RUN... [MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
[actions info:
{
  0: {'name': '0', 'n_root': 7474304, 'win_flag': False, 'n': 2182400, 'n_wins': 2100454, 'q': 0.9624514296187683, 'ucb': 0.9678373740384631},
  1: {'name': '1', 'n_root': 7474304, 'win_flag': False, 'n': 185344, 'n_wins': 164757, 'q': 0.8889254575276243, 'ucb': 0.9074070665330406},
  4: {'name': '4', 'n_root': 7474304, 'win_flag': False, 'n': 4921472, 'n_wins': 4885924, 'q': 0.9927769577882389, 'ucb': 0.9963635461474457},
  5: {'name': '5', 'n_root': 7474304, 'win_flag': False, 'n': 105472, 'n_wins': 91863, 'q': 0.8709704945388349, 'ucb': 0.8954701768685893},
  6: {'name': '6', 'n_root': 7474304, 'win_flag': False, 'n': 79616, 'n_wins': 68403, 'q': 0.8591614750803859, 'ucb': 0.8873601607647162},
  best: {'index': 4, 'name': '4', 'n_root': 7474304, 'win_flag': False, 'n': 4921472, 'n_wins': 4885924, 'q': 0.9927769577882389, 'ucb': 0.9963635461474457}
}]
[performance info:
{
  steps: 6373,
  steps_per_second: 1274.0076324260813,
  playouts: 7474304,
  playouts_per_second: 1494166.0666990099,
  times_[ms]: {'total': 5002.324819564819, 'loop': 5000.642776489258, 'reduce_over_trees': 0.29015541076660156, 'reduce_over_actions': 0.4520416259765625, 'mean_loop': 0.7846607212441955, 'mean_select': 0.11222893376562147, 'mean_expand': 0.2786097114284054, 'mean_playout': 0.17186361935680036, 'mean_backup': 0.2193056618645448},
  trees: {'count': 8, 'mean_depth': 5.176703163017032, 'max_depth': 12, 'mean_size': 1233.0, 'max_size': 2736}
}]
MCTSNC RUN DONE. [time: 5.002324819564819 s; best action: 4, best win_flag: False, best n: 4921472, best n_wins: 4885924, best q: 0.9927769577882389]
BEST ACTION: 4

Example usage 2 (Gomoku)

Assume the mechanics of the Gomoku game have been defined to MCTS-NC in mctsnc_game_mechanics.py (via device functions is_action_legal, take_action, etc.), and that g - instance of Gomoku(State) - represents a state of an ongoing Gomoku game shown below.

  ABCDEFGHIJKLMNO
15+++++++++++++++15
14+++++++++++++++14
13+++++++++++++++13
12++++++++●++++++12
11++++++++○++++++11
10++++++++○++++++10
 9++++++○+○++++++9
 8+++++++●○++++++8
 7+++++++●●●○++++7
 6++++++++●●○++++6
 5+++++++●+++++++5
 4+++++++++++++++4
 3+++++++++++++++3
 2+++++++++++++++2
 1+++++++++++++++1
  ABCDEFGHIJKLMNO

Then, running the following code

ai = MCTSNC(Gomoku.get_board_shape(), Gomoku.get_extra_info_memory(), Gomoku.get_max_actions(), action_index_to_name_function=Gomoku.action_index_to_name)
ai.init_device_side_arrays()
best_action = ai.run(g.get_board(), g.get_extra_info(), g.get_turn())
print(f"BEST ACTION: {best_action}")

results in finding the defensive action for white - move K8 (indexed as 115) that prevents black from winning in three plies, and the following printout:

[MCTSNC._init_device_side_arrays()... for MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
[MCTSNC._init_device_side_arrays() done; time: 0.5558419227600098 s, per_state_memory: 1144 B,  calculated max_tree_size: 234637]
MCTSNC RUN... [MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
[actions info:
{
  0: {'name': 'A1', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 148906, 'q': 0.3478852048444976, 'ucb': 0.36098484108863044},
  1: {'name': 'B1', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 149000, 'q': 0.34810481459330145, 'ucb': 0.3612044508374343},
  2: {'name': 'C1', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 144339, 'q': 0.3372154418361244, 'ucb': 0.35031507808025725},
  ...
  115: {'name': 'K8', 'n_root': 94359552, 'win_flag': False, 'n': 1093632, 'n_wins': 452284, 'q': 0.41356141736891383, 'ucb': 0.4217566587685248},
  ...
  222: {'name': 'M15', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 148009, 'q': 0.34578956713516745, 'ucb': 0.3588892033793003},
  223: {'name': 'N15', 'n_root': 94359552, 'win_flag': False, 'n': 401408, 'n_wins': 148802, 'q': 0.37070013552295916, 'ucb': 0.38422722440183954},
  224: {'name': 'O15', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 145329, 'q': 0.3395283530203349, 'ucb': 0.35262798926446776},
  best: {'index': 115, 'name': 'K8', 'n_root': 94359552, 'win_flag': False, 'n': 1093632, 'n_wins': 452284, 'q': 0.41356141736891383, 'ucb': 0.4217566587685248}
}]
[performance info:
{
  steps: 442,
  steps_per_second: 88.25552729358726,
  playouts: 94359552,
  playouts_per_second: 18841067.91164404,
  times_[ms]: {'total': 5008.184909820557, 'loop': 5006.503105163574, 'reduce_over_trees': 0.20575523376464844, 'reduce_over_actions': 0.5161762237548828, 'mean_loop': 11.326930102180032, 'mean_select': 0.10066766005295974, 'mean_expand': 0.3082833139065704, 'mean_playout': 10.688265524298897, 'mean_backup': 0.226746317488036},
  trees: {'count': 8, 'mean_depth': 2.519115779878241, 'max_depth': 3, 'mean_size': 92149.0, 'max_size': 92149}
}]
MCTSNC RUN DONE. [time: 5.008184909820557 s; best action: 115 (K8), best win_flag: False, best n: 1093632, best n_wins: 452284, best q: 0.41356141736891383]
BEST ACTION: 115

Dependencies

numpy, math: required for mathematical computations.
numba: required for just-in-time compilation of CUDA kernels (decorated by @cuda.jit).
mctsnc_game_mechanics: required to define the mechanics of a wanted game or search problem via a set of five device-side functions - is_action_legal, take_action, legal_actions_playout, take_action_playout, compute_outcome callable by kernel functions of MCTSNC (see mctsnc_game_mechanics module).
For usage of MCTSNC class, NVIDIA CUDA drivers must be present in the operating system.

Link to project repository

https://github.com/pklesk/mcts_numba_cuda

Notes

Private functions of MCTSNC class are named with a single leading underscore (e.g.: _set_cuda_constants, _make_performance_info, _playout_acp_prodigal, etc.). Among them, the kernel functions are additionally described by @cuda.jit decorators coming from numba module. Exact specifications of types come along with the decorators. For public methods full docstrings are provided (with arguments and returns described). For private functions short docstrings are provided.

class mctsnc.MCTSNC(state_board_shape, state_extra_info_memory, state_max_actions, search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed=0, verbose_debug=False, verbose_info=True, action_index_to_name_function=None)[source]

Bases: object

Monte Carlo Tree Search implemented via numba.cuda meant for multi-threaded executions on GPU involving multiple concurrent trees and playouts (four algorithmic variants available).

VARIANTS = ['ocp_thrifty', 'ocp_prodigal', 'acp_thrifty', 'acp_prodigal']

DEFAULT_SEARCH_TIME_LIMIT = 5.0

DEFAULT_SEARCH_STEPS_LIMIT = inf

DEFAULT_N_TREES = 8

DEFAULT_N_PLAYOUTS = 128

DEFAULT_VARIANT = 'acp_prodigal'

DEFAULT_DEVICE_MEMORY = 2.0

DEFAULT_UCB_C = 2.0

DEFAULT_SEED = 0

DEFAULT_VERBOSE_DEBUG = False

DEFAULT_VERBOSE_INFO = True

MAX_STATE_BOARD_SHAPE = (32, 32)

MAX_STATE_EXTRA_INFO_MEMORY = 4096

MAX_STATE_MAX_ACTIONS = 512

MAX_TREE_SIZE = 16777216

MAX_N_TREES = 512

MAX_N_PLAYOUTS = 512

MAX_TREE_DEPTH = 2048

__init__(state_board_shape, state_extra_info_memory, state_max_actions, search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed=0, verbose_debug=False, verbose_info=True, action_index_to_name_function=None)[source]

Constructor of MCTSNC instances.

Args:

state_board_shape (tuple(int, int)):: shape of board for states in a given game, at most (32, 32).
state_extra_info_memory (int):: number of bytes for extra information on states, at most 4096.
state_max_actions (int):: maximum branching factor, at most 512.
search_time_limit (float):: time limit in seconds (computational budget), np.inf if no limit, defaults to 5.0.
search_steps_limit (float):: steps limit (computational budget), np.inf if no limit, defaults to np.inf.
n_trees (int):: number of independent trees, defaults to 8.
n_playouts (int):: number of independent playouts from an expanded child, must be a power of 2, defaults to 128.
variant (str):: choice of algorithmic variant from {"ocp_thrifty", "ocp_prodigal", "acp_thrifty, "acp_prodigal}, defaults to "acp_prodigal".
device_memory (float):: GPU memory in GiBs (gibibytes) to be available for this instance, defaults to 2.0.
ucb_c (float):: value of C constant, influencing exploration tendency, appearing in UCT formula (upper confidence bounds for trees), defaults to 2.0.
verbose_debug (bool):: debug verbosity flag, if True then detailed information about each kernel invocation are printed to console (in each iteration), defaults to False.
verbose_info (bool):: verbosity flag, if True then standard information on actions and performance are printed to console (after a full run), defaults to True.
action_index_to_name_function (callable):: pointer to user-provided function converting action indexes to a human-friendly names (e.g. "e2:e4" for chess), defaults to None.

_set_cuda_constants()[source]: Investigates (via numba module) if CUDA-based computations are available and, if so, sets suitable constants.

_validate_param(name, ptype, leq, low, geq, high, default)[source]: Validates a parameter - is it of correct type and within given range (either end of the range can be open or closed).

__str__()[source]

Returns a string representation of this MCTSNC instance.

Returns:: str: string representation of this MCTSNC instance.

__repr__()[source]

Returns a detailed string representation of this MCTSNC instance.

Returns:: str: detailed string representation of this MCTSNC instance.

init_device_side_arrays()[source]: Allocates all the necessary device arrays based on relevant constants and available memory.

run(root_board, root_extra_info, root_turn, forced_search_steps_limit=inf)[source]

Runs the Monte Carlo Tree Search on GPU involving multiple concurrent trees and playouts. Computations are carried out according to the formerly chosen algorithmic variant, i.e. one of {"ocp_thrifty", "ocp_prodigal", "acp_thrifty, "acp_prodigal}, defaults to "acp_prodigal"}.

Args:

root_board (ndarray):: two-dimensional array with board (or other representation) of root state from which the search starts.
root_extra_info (ndarray):: any additional information of root state not implied by the contents of the board itself (e.g. possibilities of castling or en-passant captures in chess, the contract in double dummy bridge, etc.), or technical information useful to generate legal actions faster.
root_turn {-1, 1}:: indicator of the player, minimizing or maximizing, to act first at root state.
forced_search_steps_limit (int):: steps limit used only when reproducing results of a previous experiment; if less than``np.inf`` then has a priority over the standard computational budget given by search_time_limit and search_steps_limit.

Returns:

self.best_action (int):: best action resulting from search.

_flatten_trees_actions_expanded_thrifty(trees_actions_expanded)[source]: Uses information from array trees_actions_expanded of shape (self.n_trees, self.state_max_actions + 2) and converts it to another array where the number of rows corresponds to the total of expanded legal actions in all trees. Each row contains a pair of indexes for: action and tree. The approach allows to allocate exact number of needed CUDA blocks for further operations.

_make_performance_info()[source]: Prepares and returns a dictionary with information on performance during the last run. After the call, available via performance_info attribute.

_make_actions_info_thrifty()[source]: Prepares and returns a dictionary with information on root actions (using thrifty indexing) implied by the last run, in particular: estimates of action values, their UCBs, counts of times actions were taken, etc. After the call, available via actions_info attribute.

_make_actions_info_prodigal()[source]: Prepares and returns a dictionary with information on root actions (using prodigal indexing) implied by the last run, in particular: estimates of action values, their UCBs, counts of times actions were taken, etc. After the call, available via actions_info attribute.

_run_ocp_thrifty(root_board, root_extra_info, root_turn, forced_search_steps_limit=inf)[source]: Runs computations for algorithmic variant: "ocp_thrifty".

_run_ocp_prodigal(root_board, root_extra_info, root_turn, forced_search_steps_limit=inf)[source]: Runs computations for algorithmic variant: "ocp_prodigal".

_run_acp_thrifty(root_board, root_extra_info, root_turn, forced_search_steps_limit=inf)[source]: Runs computations for algorithmic variant: "acp_thrifty".

_run_acp_prodigal(root_board, root_extra_info, root_turn, forced_search_steps_limit=inf)[source]: Runs computations for algorithmic variant: "acp_prodigal".

static _reset(root_board, root_extra_info, root_turn, trees, trees_sizes, trees_depths, trees_turns, trees_leaves, trees_terminals, trees_ns, trees_ns_wins, trees_boards, trees_extra_infos)[source]: CUDA kernel responsible for reseting root nodes of trees to new root state.

static _select(ucb_c, trees, trees_leaves, trees_ns, trees_ns_wins, trees_nodes_selected, trees_selected_paths)[source]: CUDA kernel responsible for computations of stage: selections.

static _expand_1_ocp_thrifty(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos, trees_nodes_selected, random_generators_expand_1, trees_actions_expanded)[source]: CUDA kernel responsible for computations of stage: expansions (substage 1, variant "ocp_thrifty").

static _expand_1_ocp_prodigal(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos, trees_nodes_selected, random_generators_expand_1, trees_actions_expanded)[source]: CUDA kernel responsible for computations of stage: expansions (substage 1, variant "ocp_prodigal").

static _expand_1_acp_thrifty(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded)[source]: CUDA kernel responsible for computations of stage: expansions (substage 1, variant "acp_thrifty").

static _expand_1_acp_prodigal(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded)[source]: CUDA kernel responsible for computations of stage: expansions (substage 1, variant "acp_prodigal").

static _memorize_root_actions_expanded(dev_trees_actions_expanded, dev_root_actions_expanded)[source]: CUDA kernel responsible for memorizing actions expanded at root node(s).

static _expand_2_thrifty(trees, trees_depths, trees_turns, trees_leaves, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded_flat)[source]: CUDA kernel responsible for computations of stage: expansions (substage 2, thrifty number of blocks - variant "ocp_thrifty" or "acp_thrifty").

static _expand_2_prodigal(trees, trees_depths, trees_turns, trees_leaves, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded)[source]: CUDA kernel responsible for computations of stage: expansions (substage 2, prodigal number of blocks - variant "ocp_prodigal" or "acp_prodigal").

static _playout_ocp(trees, trees_turns, trees_terminals, trees_outcomes, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded, random_generators_playout, trees_playout_outcomes)[source]: CUDA kernel responsible for computations of stage: playouts (variant "ocp_thrifty" or "ocp_prodigal").

static _playout_acp_thrifty(trees, trees_turns, trees_terminals, trees_outcomes, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded, trees_actions_expanded_flat, random_generators_playout, trees_playout_outcomes, trees_playout_outcomes_children)[source]: CUDA kernel responsible for computations of stage: playouts (variant "acp_thrifty").

static _playout_acp_prodigal(trees, trees_turns, trees_terminals, trees_outcomes, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded, random_generators_playout, trees_playout_outcomes, trees_playout_outcomes_children)[source]: CUDA kernel responsible for computations of stage: playouts (variant "acp_prodigal").

static _backup_ocp(n_playouts, trees, trees_turns, trees_ns, trees_ns_wins, trees_nodes_selected, trees_selected_paths, trees_actions_expanded, trees_playout_outcomes)[source]: CUDA kernel responsible for computations of stage: backups (variant "ocp_thrifty" or "ocp_prodigal").

static _backup_1_acp_thrifty(n_playouts, trees, trees_turns, trees_ns, trees_ns_wins, trees_nodes_selected, trees_actions_expanded, trees_playout_outcomes, trees_playout_outcomes_children)[source]: CUDA kernel responsible for computations of stage: backups (substage 1, variant "acp_thrifty").

static _backup_1_acp_prodigal(n_playouts, trees, trees_turns, trees_ns, trees_ns_wins, trees_nodes_selected, trees_actions_expanded, trees_playout_outcomes, trees_playout_outcomes_children)[source]: CUDA kernel responsible for computations of stage: backups (substage 1, variant "acp_prodigal").

static _backup_2_acp(n_playouts, trees_turns, trees_ns, trees_ns_wins, trees_selected_paths, trees_actions_expanded, trees_playout_outcomes)[source]: CUDA kernel responsible for computations of stage: backups (substage 2, variant "acp_thrifty" or "acp_prodigal").

static _reduce_over_trees_thrifty(trees, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, root_actions_expanded, root_turn, root_ns, actions_win_flags, actions_ns, actions_ns_wins)[source]: CUDA kernel responsible for sum-reduction over trees (thrifty number of blocks, variant ocp_thrifty or acp_thrifty).

static _reduce_over_trees_prodigal(trees, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, root_actions_expanded, root_turn, root_ns, actions_win_flags, actions_ns, actions_ns_wins)[source]: CUDA kernel responsible for sum-reduction over trees (prodigal number of blocks, variant ocp_prodigal or acp_prodigal).

static _reduce_over_actions_thrifty(n_root_actions, actions_win_flags, actions_ns, actions_ns_wins, best_action, best_win_flag, best_n, best_n_wins)[source]: CUDA kernel responsible for max/argmax-reduction over actions (thrifty number of blocks, variant ocp_thrifty or acp_thrifty).

static _reduce_over_actions_prodigal(actions_win_flags, actions_ns, actions_ns_wins, best_action, best_win_flag, best_n, best_n_wins)[source]: CUDA kernel responsible for max/argmax-reduction over actions (prodigal number of blocks, variant ocp_prodigal or acp_prodigal).

_json_dump(fname)[source]: Dumps (saves) device-side arrays, copied to host, representing trees and MCTS elements from the last run to a text file in json format.

__module__ = 'mctsnc'