mctsnc module
This module contains the core algorithmic functionalities of the project, embodied by the class MCTSNC
.
With CUDA computational model in mind, we have proposed and implemented in class MCTSNC
four, fast operating and thoroughly parallel, variants of Monte Carlo Tree Search algorithm.
The provided implementation takes advantage of Numba, a just-in-time Python compiler, and its numba.cuda
package (hence the “NC” suffix in the name).
By thoroughly parallel we understand an algorithmic design that applies to both: (1) the structural elements of trees - leaf-/root-/tree-level parallelization
(all those three are combined), and (2) the stages of MCTS - each stage in itself (selection, expansion, playouts, backup) employs multiple GPU threads.
We apply suitable reduction patterns to carry out summations or max / argmax operations. Cooperation of threads helps to transfer information between global and shared memory.
The implementation uses: no atomic operations, no mutexes (lock-free), and very few device-host memory transfers.
Example usage 1 (Connect 4)
Assume the mechanics of the Connect 4 game have been defined to MCTS-NC in mctsnc_game_mechanics.py
(via device functions is_action_legal
, take_action
, etc.),
and that c4
- instance of C4(State)
- represents a state of an ongoing Connect 4 game shown below.
|.|.|●|○|.|.|.|
|.|.|●|○|.|.|○|
|.|.|●|●|.|●|●|
|.|●|○|●|.|○|●|
|.|○|●|○|.|●|○|
|○|○|○|●|●|○|○|
0 1 2 3 4 5 6
Then, running the following code
ai = MCTSNC(C4.get_board_shape(), C4.get_extra_info_memory(), C4.get_max_actions())
ai.init_device_side_arrays()
best_action = ai.run(c4.get_board(), c4.get_extra_info(), c4.get_turn())
print(f"BEST ACTION: {best_action}")
results in finding the best action for black - move 4 (winning in two plies), and the following printout:
[MCTSNC._init_device_side_arrays()... for MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
[MCTSNC._init_device_side_arrays() done; time: 0.5193691253662109 s, per_state_memory: 95 B, calculated max_tree_size: 2825549]
MCTSNC RUN... [MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
[actions info:
{
0: {'name': '0', 'n_root': 7474304, 'win_flag': False, 'n': 2182400, 'n_wins': 2100454, 'q': 0.9624514296187683, 'ucb': 0.9678373740384631},
1: {'name': '1', 'n_root': 7474304, 'win_flag': False, 'n': 185344, 'n_wins': 164757, 'q': 0.8889254575276243, 'ucb': 0.9074070665330406},
4: {'name': '4', 'n_root': 7474304, 'win_flag': False, 'n': 4921472, 'n_wins': 4885924, 'q': 0.9927769577882389, 'ucb': 0.9963635461474457},
5: {'name': '5', 'n_root': 7474304, 'win_flag': False, 'n': 105472, 'n_wins': 91863, 'q': 0.8709704945388349, 'ucb': 0.8954701768685893},
6: {'name': '6', 'n_root': 7474304, 'win_flag': False, 'n': 79616, 'n_wins': 68403, 'q': 0.8591614750803859, 'ucb': 0.8873601607647162},
best: {'index': 4, 'name': '4', 'n_root': 7474304, 'win_flag': False, 'n': 4921472, 'n_wins': 4885924, 'q': 0.9927769577882389, 'ucb': 0.9963635461474457}
}]
[performance info:
{
steps: 6373,
steps_per_second: 1274.0076324260813,
playouts: 7474304,
playouts_per_second: 1494166.0666990099,
times_[ms]: {'total': 5002.324819564819, 'loop': 5000.642776489258, 'reduce_over_trees': 0.29015541076660156, 'reduce_over_actions': 0.4520416259765625, 'mean_loop': 0.7846607212441955, 'mean_select': 0.11222893376562147, 'mean_expand': 0.2786097114284054, 'mean_playout': 0.17186361935680036, 'mean_backup': 0.2193056618645448},
trees: {'count': 8, 'mean_depth': 5.176703163017032, 'max_depth': 12, 'mean_size': 1233.0, 'max_size': 2736}
}]
MCTSNC RUN DONE. [time: 5.002324819564819 s; best action: 4, best win_flag: False, best n: 4921472, best n_wins: 4885924, best q: 0.9927769577882389]
BEST ACTION: 4
Example usage 2 (Gomoku)
Assume the mechanics of the Gomoku game have been defined to MCTS-NC in mctsnc_game_mechanics.py
(via device functions is_action_legal
, take_action
, etc.),
and that g
- instance of Gomoku(State)
- represents a state of an ongoing Gomoku game shown below.
ABCDEFGHIJKLMNO
15+++++++++++++++15
14+++++++++++++++14
13+++++++++++++++13
12++++++++●++++++12
11++++++++○++++++11
10++++++++○++++++10
9++++++○+○++++++9
8+++++++●○++++++8
7+++++++●●●○++++7
6++++++++●●○++++6
5+++++++●+++++++5
4+++++++++++++++4
3+++++++++++++++3
2+++++++++++++++2
1+++++++++++++++1
ABCDEFGHIJKLMNO
Then, running the following code
ai = MCTSNC(Gomoku.get_board_shape(), Gomoku.get_extra_info_memory(), Gomoku.get_max_actions(), action_index_to_name_function=Gomoku.action_index_to_name)
ai.init_device_side_arrays()
best_action = ai.run(g.get_board(), g.get_extra_info(), g.get_turn())
print(f"BEST ACTION: {best_action}")
results in finding the defensive action for white - move K8 (indexed as 115) that prevents black from winning in three plies, and the following printout:
[MCTSNC._init_device_side_arrays()... for MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
[MCTSNC._init_device_side_arrays() done; time: 0.5558419227600098 s, per_state_memory: 1144 B, calculated max_tree_size: 234637]
MCTSNC RUN... [MCTSNC(search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed: 0)]
[actions info:
{
0: {'name': 'A1', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 148906, 'q': 0.3478852048444976, 'ucb': 0.36098484108863044},
1: {'name': 'B1', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 149000, 'q': 0.34810481459330145, 'ucb': 0.3612044508374343},
2: {'name': 'C1', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 144339, 'q': 0.3372154418361244, 'ucb': 0.35031507808025725},
...
115: {'name': 'K8', 'n_root': 94359552, 'win_flag': False, 'n': 1093632, 'n_wins': 452284, 'q': 0.41356141736891383, 'ucb': 0.4217566587685248},
...
222: {'name': 'M15', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 148009, 'q': 0.34578956713516745, 'ucb': 0.3588892033793003},
223: {'name': 'N15', 'n_root': 94359552, 'win_flag': False, 'n': 401408, 'n_wins': 148802, 'q': 0.37070013552295916, 'ucb': 0.38422722440183954},
224: {'name': 'O15', 'n_root': 94359552, 'win_flag': False, 'n': 428032, 'n_wins': 145329, 'q': 0.3395283530203349, 'ucb': 0.35262798926446776},
best: {'index': 115, 'name': 'K8', 'n_root': 94359552, 'win_flag': False, 'n': 1093632, 'n_wins': 452284, 'q': 0.41356141736891383, 'ucb': 0.4217566587685248}
}]
[performance info:
{
steps: 442,
steps_per_second: 88.25552729358726,
playouts: 94359552,
playouts_per_second: 18841067.91164404,
times_[ms]: {'total': 5008.184909820557, 'loop': 5006.503105163574, 'reduce_over_trees': 0.20575523376464844, 'reduce_over_actions': 0.5161762237548828, 'mean_loop': 11.326930102180032, 'mean_select': 0.10066766005295974, 'mean_expand': 0.3082833139065704, 'mean_playout': 10.688265524298897, 'mean_backup': 0.226746317488036},
trees: {'count': 8, 'mean_depth': 2.519115779878241, 'max_depth': 3, 'mean_size': 92149.0, 'max_size': 92149}
}]
MCTSNC RUN DONE. [time: 5.008184909820557 s; best action: 115 (K8), best win_flag: False, best n: 1093632, best n_wins: 452284, best q: 0.41356141736891383]
BEST ACTION: 115
Dependencies
numpy
,math
: required for mathematical computations.numba
: required for just-in-time compilation of CUDA kernels (decorated by@cuda.jit
).mctsnc_game_mechanics
: required to define the mechanics of a wanted game or search problem via a set of five device-side functions -is_action_legal
,take_action
,legal_actions_playout
,take_action_playout
,compute_outcome
callable by kernel functions ofMCTSNC
(see mctsnc_game_mechanics module).For usage of
MCTSNC
class, NVIDIA CUDA drivers must be present in the operating system.
Link to project repository
Notes
Private functions of MCTSNC
class are named with a single leading underscore (e.g.: _set_cuda_constants
,
_make_performance_info
, _playout_acp_prodigal
, etc.). Among them, the kernel functions are additionally
described by @cuda.jit
decorators coming from numba
module. Exact specifications of types come along with the decorators.
For public methods full docstrings are provided (with arguments and returns described). For private functions short docstrings are provided.
- class mctsnc.MCTSNC(state_board_shape, state_extra_info_memory, state_max_actions, search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed=0, verbose_debug=False, verbose_info=True, action_index_to_name_function=None)[source]
Bases:
object
Monte Carlo Tree Search implemented via
numba.cuda
meant for multi-threaded executions on GPU involving multiple concurrent trees and playouts (four algorithmic variants available).- VARIANTS = ['ocp_thrifty', 'ocp_prodigal', 'acp_thrifty', 'acp_prodigal']
- DEFAULT_SEARCH_TIME_LIMIT = 5.0
- DEFAULT_SEARCH_STEPS_LIMIT = inf
- DEFAULT_N_TREES = 8
- DEFAULT_N_PLAYOUTS = 128
- DEFAULT_VARIANT = 'acp_prodigal'
- DEFAULT_DEVICE_MEMORY = 2.0
- DEFAULT_UCB_C = 2.0
- DEFAULT_SEED = 0
- DEFAULT_VERBOSE_DEBUG = False
- DEFAULT_VERBOSE_INFO = True
- MAX_STATE_BOARD_SHAPE = (32, 32)
- MAX_STATE_EXTRA_INFO_MEMORY = 4096
- MAX_STATE_MAX_ACTIONS = 512
- MAX_TREE_SIZE = 16777216
- MAX_N_TREES = 512
- MAX_N_PLAYOUTS = 512
- MAX_TREE_DEPTH = 2048
- __init__(state_board_shape, state_extra_info_memory, state_max_actions, search_time_limit=5.0, search_steps_limit=inf, n_trees=8, n_playouts=128, variant='acp_prodigal', device_memory=2.0, ucb_c=2.0, seed=0, verbose_debug=False, verbose_info=True, action_index_to_name_function=None)[source]
Constructor of
MCTSNC
instances.- Args:
- state_board_shape (tuple(int, int)):
shape of board for states in a given game, at most
(32, 32)
.- state_extra_info_memory (int):
number of bytes for extra information on states, at most
4096
.- state_max_actions (int):
maximum branching factor, at most
512
.- search_time_limit (float):
time limit in seconds (computational budget),
np.inf
if no limit, defaults to5.0
.- search_steps_limit (float):
steps limit (computational budget),
np.inf
if no limit, defaults tonp.inf
.- n_trees (int):
number of independent trees, defaults to
8
.- n_playouts (int):
number of independent playouts from an expanded child, must be a power of 2, defaults to
128
.- variant (str):
choice of algorithmic variant from {
"ocp_thrifty"
,"ocp_prodigal"
,"acp_thrifty
,"acp_prodigal
}, defaults to"acp_prodigal"
.- device_memory (float):
GPU memory in GiBs (gibibytes) to be available for this instance, defaults to
2.0
.- ucb_c (float):
value of C constant, influencing exploration tendency, appearing in UCT formula (upper confidence bounds for trees), defaults to
2.0
.- verbose_debug (bool):
debug verbosity flag, if
True
then detailed information about each kernel invocation are printed to console (in each iteration), defaults toFalse
.- verbose_info (bool):
verbosity flag, if
True
then standard information on actions and performance are printed to console (after a full run), defaults toTrue
.- action_index_to_name_function (callable):
pointer to user-provided function converting action indexes to a human-friendly names (e.g.
"e2:e4"
for chess), defaults toNone
.
- _set_cuda_constants()[source]
Investigates (via
numba
module) if CUDA-based computations are available and, if so, sets suitable constants.
- _validate_param(name, ptype, leq, low, geq, high, default)[source]
Validates a parameter - is it of correct type and within given range (either end of the range can be open or closed).
- __str__()[source]
Returns a string representation of this
MCTSNC
instance.- Returns:
str: string representation of this
MCTSNC
instance.
- __repr__()[source]
Returns a detailed string representation of this
MCTSNC
instance.- Returns:
str: detailed string representation of this
MCTSNC
instance.
- init_device_side_arrays()[source]
Allocates all the necessary device arrays based on relevant constants and available memory.
- run(root_board, root_extra_info, root_turn, forced_search_steps_limit=inf)[source]
Runs the Monte Carlo Tree Search on GPU involving multiple concurrent trees and playouts. Computations are carried out according to the formerly chosen algorithmic variant, i.e. one of {
"ocp_thrifty"
,"ocp_prodigal"
,"acp_thrifty
,"acp_prodigal
}, defaults to"acp_prodigal"
}.- Args:
- root_board (ndarray):
two-dimensional array with board (or other representation) of root state from which the search starts.
- root_extra_info (ndarray):
any additional information of root state not implied by the contents of the board itself (e.g. possibilities of castling or en-passant captures in chess, the contract in double dummy bridge, etc.), or technical information useful to generate legal actions faster.
- root_turn {-1, 1}:
indicator of the player, minimizing or maximizing, to act first at root state.
- forced_search_steps_limit (int):
steps limit used only when reproducing results of a previous experiment; if less than``np.inf`` then has a priority over the standard computational budget given by
search_time_limit
andsearch_steps_limit
.
- Returns:
- self.best_action (int):
best action resulting from search.
- _flatten_trees_actions_expanded_thrifty(trees_actions_expanded)[source]
Uses information from array
trees_actions_expanded
of shape(self.n_trees, self.state_max_actions + 2)
and converts it to another array where the number of rows corresponds to the total of expanded legal actions in all trees. Each row contains a pair of indexes for: action and tree. The approach allows to allocate exact number of needed CUDA blocks for further operations.
- _make_performance_info()[source]
Prepares and returns a dictionary with information on performance during the last run. After the call, available via
performance_info
attribute.
- _make_actions_info_thrifty()[source]
Prepares and returns a dictionary with information on root actions (using thrifty indexing) implied by the last run, in particular: estimates of action values, their UCBs, counts of times actions were taken, etc. After the call, available via
actions_info
attribute.
- _make_actions_info_prodigal()[source]
Prepares and returns a dictionary with information on root actions (using prodigal indexing) implied by the last run, in particular: estimates of action values, their UCBs, counts of times actions were taken, etc. After the call, available via
actions_info
attribute.
- _run_ocp_thrifty(root_board, root_extra_info, root_turn, forced_search_steps_limit=inf)[source]
Runs computations for algorithmic variant:
"ocp_thrifty"
.
- _run_ocp_prodigal(root_board, root_extra_info, root_turn, forced_search_steps_limit=inf)[source]
Runs computations for algorithmic variant:
"ocp_prodigal"
.
- _run_acp_thrifty(root_board, root_extra_info, root_turn, forced_search_steps_limit=inf)[source]
Runs computations for algorithmic variant:
"acp_thrifty"
.
- _run_acp_prodigal(root_board, root_extra_info, root_turn, forced_search_steps_limit=inf)[source]
Runs computations for algorithmic variant:
"acp_prodigal"
.
- static _reset(root_board, root_extra_info, root_turn, trees, trees_sizes, trees_depths, trees_turns, trees_leaves, trees_terminals, trees_ns, trees_ns_wins, trees_boards, trees_extra_infos)[source]
CUDA kernel responsible for reseting root nodes of trees to new root state.
- static _select(ucb_c, trees, trees_leaves, trees_ns, trees_ns_wins, trees_nodes_selected, trees_selected_paths)[source]
CUDA kernel responsible for computations of stage: selections.
- static _expand_1_ocp_thrifty(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos, trees_nodes_selected, random_generators_expand_1, trees_actions_expanded)[source]
CUDA kernel responsible for computations of stage: expansions (substage 1, variant
"ocp_thrifty"
).
- static _expand_1_ocp_prodigal(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos, trees_nodes_selected, random_generators_expand_1, trees_actions_expanded)[source]
CUDA kernel responsible for computations of stage: expansions (substage 1, variant
"ocp_prodigal"
).
- static _expand_1_acp_thrifty(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded)[source]
CUDA kernel responsible for computations of stage: expansions (substage 1, variant
"acp_thrifty"
).
- static _expand_1_acp_prodigal(max_tree_size, trees, trees_sizes, trees_turns, trees_leaves, trees_terminals, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded)[source]
CUDA kernel responsible for computations of stage: expansions (substage 1, variant
"acp_prodigal"
).
- static _memorize_root_actions_expanded(dev_trees_actions_expanded, dev_root_actions_expanded)[source]
CUDA kernel responsible for memorizing actions expanded at root node(s).
- static _expand_2_thrifty(trees, trees_depths, trees_turns, trees_leaves, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded_flat)[source]
CUDA kernel responsible for computations of stage: expansions (substage 2, thrifty number of blocks - variant
"ocp_thrifty"
or"acp_thrifty"
).
- static _expand_2_prodigal(trees, trees_depths, trees_turns, trees_leaves, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded)[source]
CUDA kernel responsible for computations of stage: expansions (substage 2, prodigal number of blocks - variant
"ocp_prodigal"
or"acp_prodigal"
).
- static _playout_ocp(trees, trees_turns, trees_terminals, trees_outcomes, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded, random_generators_playout, trees_playout_outcomes)[source]
CUDA kernel responsible for computations of stage: playouts (variant
"ocp_thrifty"
or"ocp_prodigal"
).
- static _playout_acp_thrifty(trees, trees_turns, trees_terminals, trees_outcomes, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded, trees_actions_expanded_flat, random_generators_playout, trees_playout_outcomes, trees_playout_outcomes_children)[source]
CUDA kernel responsible for computations of stage: playouts (variant
"acp_thrifty"
).
- static _playout_acp_prodigal(trees, trees_turns, trees_terminals, trees_outcomes, trees_boards, trees_extra_infos, trees_nodes_selected, trees_actions_expanded, random_generators_playout, trees_playout_outcomes, trees_playout_outcomes_children)[source]
CUDA kernel responsible for computations of stage: playouts (variant
"acp_prodigal"
).
- static _backup_ocp(n_playouts, trees, trees_turns, trees_ns, trees_ns_wins, trees_nodes_selected, trees_selected_paths, trees_actions_expanded, trees_playout_outcomes)[source]
CUDA kernel responsible for computations of stage: backups (variant
"ocp_thrifty"
or"ocp_prodigal"
).
- static _backup_1_acp_thrifty(n_playouts, trees, trees_turns, trees_ns, trees_ns_wins, trees_nodes_selected, trees_actions_expanded, trees_playout_outcomes, trees_playout_outcomes_children)[source]
CUDA kernel responsible for computations of stage: backups (substage 1, variant
"acp_thrifty"
).
- static _backup_1_acp_prodigal(n_playouts, trees, trees_turns, trees_ns, trees_ns_wins, trees_nodes_selected, trees_actions_expanded, trees_playout_outcomes, trees_playout_outcomes_children)[source]
CUDA kernel responsible for computations of stage: backups (substage 1, variant
"acp_prodigal"
).
- static _backup_2_acp(n_playouts, trees_turns, trees_ns, trees_ns_wins, trees_selected_paths, trees_actions_expanded, trees_playout_outcomes)[source]
CUDA kernel responsible for computations of stage: backups (substage 2, variant
"acp_thrifty"
or"acp_prodigal"
).
- static _reduce_over_trees_thrifty(trees, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, root_actions_expanded, root_turn, root_ns, actions_win_flags, actions_ns, actions_ns_wins)[source]
CUDA kernel responsible for sum-reduction over trees (thrifty number of blocks, variant
ocp_thrifty
oracp_thrifty
).
- static _reduce_over_trees_prodigal(trees, trees_terminals, trees_outcomes, trees_ns, trees_ns_wins, root_actions_expanded, root_turn, root_ns, actions_win_flags, actions_ns, actions_ns_wins)[source]
CUDA kernel responsible for sum-reduction over trees (prodigal number of blocks, variant
ocp_prodigal
oracp_prodigal
).
- static _reduce_over_actions_thrifty(n_root_actions, actions_win_flags, actions_ns, actions_ns_wins, best_action, best_win_flag, best_n, best_n_wins)[source]
CUDA kernel responsible for max/argmax-reduction over actions (thrifty number of blocks, variant
ocp_thrifty
oracp_thrifty
).
- static _reduce_over_actions_prodigal(actions_win_flags, actions_ns, actions_ns_wins, best_action, best_win_flag, best_n, best_n_wins)[source]
CUDA kernel responsible for max/argmax-reduction over actions (prodigal number of blocks, variant
ocp_prodigal
oracp_prodigal
).
- _json_dump(fname)[source]
Dumps (saves) device-side arrays, copied to host, representing trees and MCTS elements from the last run to a text file in json format.
- __module__ = 'mctsnc'