mcts module
Auxiliary module with a referential, standard implementation of the MCTS algorithm (for CPU, single-threaded). The module contains:
`State`
: class representing an arbitrary state of some game or sequential decision problem (meant to be inherited from when searches using the `MCTS` class are planned); current examples of subclasses are: `C4` in the c4 module (representation of the Connect 4 game), and `Gomoku` in the gomoku module (representation of the Gomoku game).

`MCTS`
: class representing the referential MCTS algorithm.
Link to project repository
Notes
Private functions of the `State` and `MCTS` classes are named with a single leading underscore.
For public methods, full docstrings are provided (with arguments and returns described); for private functions, short docstrings are provided.
- class mcts.State(parent=None)[source]
Bases: `object`
Arbitrary abstract state of some game or sequential decision problem. Meant to be inherited - extended to subclasses and then applicable in searches conducted by the `MCTS` class. For actual inheritance examples see: the `C4` class in the c4 module (representation of the Connect 4 game), and the `Gomoku` class in the gomoku module (representation of the Gomoku game).
When searches using the `MCTS` class are planned, the programmer, while inheriting from `State`, must provide implementations of the following non-static methods: `take_action_job`, `compute_outcome_job`, `take_random_action_playout`, `__str__`; and of one static method: `class_repr`. When searches using the `MCTSNC` class are planned, the programmer, while inheriting from `State`, must additionally provide the following non-static methods: `get_board`, `get_extra_info`; and the following static ones: `get_board_shape`, `get_extra_info_memory`, `get_max_actions`.
- __init__(parent=None)[source]
Constructor of `State` instances. Should be called (in the first line) in subclasses’ constructors as: `super().__init__(parent)`.
- Args:
  - parent (State): reference to the parent state object.
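Putting these requirements together, a minimal subclass might look like the following sketch of a toy one-pile Nim game. The `State` stub below merely stands in for `mcts.State` so that the snippet is self-contained; its attribute names (`parent`, `children`, `turn`) are assumptions, not the module’s actual internals.

```python
import random

class State:
    # Stub standing in for mcts.State so the sketch is self-contained;
    # attribute names (parent, children, turn) are assumptions.
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}
        self.turn = -parent.turn if parent else 1  # 1: maximizing, -1: minimizing

class NimState(State):
    """Toy one-pile Nim: players alternately remove 1-3 sticks; whoever takes the last stick wins."""

    def __init__(self, parent=None, sticks=10):
        super().__init__(parent)
        self.sticks = parent.sticks if parent else sticks

    def take_action_job(self, action_index):
        amount = action_index + 1  # actions 0..2 remove 1..3 sticks
        if amount > self.sticks:
            return False  # illegal action: state left unchanged
        self.sticks -= amount
        return True

    def compute_outcome_job(self):
        if self.sticks == 0:
            return -self.turn  # the player who just moved took the last stick and wins
        return None  # game ongoing

    def take_random_action_playout(self):
        legal = [a for a in range(3) if a + 1 <= self.sticks]
        # the real base class provides take_action; the stub applies the job directly
        child = NimState(parent=self)
        child.take_action_job(random.choice(legal))
        return child

    def __str__(self):
        return f"sticks left: {self.sticks}"

    @staticmethod
    def class_repr():
        return "Nim(one pile)"
```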
- __str__()[source]
[To be implemented in subclasses.]
Should return a string representation of this state, e.g., game board with its current contents.
- Returns:
str: string representation of this state.
- static class_repr()[source]
[To be implemented in subclasses.]
Should return a string representation of this class of states (e.g., game name, its variation, board size if configurable, etc.).
- Returns:
str: string representation of this class of states.
- _subtree_size()[source]
Returns size of the subtree rooted by this state (number of tree nodes including this one).
- _subtree_depths(d=0, depths=[])[source]
Returns a list of depths for nodes in the subtree rooted by this state.
- get_turn()[source]
Returns {-1, 1} indicating whose turn it is: -1 for the minimizing player, 1 for the maximizing player.
- Returns:
- self.turn ({-1, 1}):
indicating whose turn it is: -1 for the minimizing player, 1 for the maximizing player.
- take_action(action_index)[source]
Takes the action (specified by its index) and returns the child state implied by the action. If such a child already existed (prior to the call) among the children, returns it immediately. Otherwise, creates a new child state object and tries to call on it the function `take_action_job`, assumed to be implemented in the subclass. If `False` is returned in the latter case (interpreted as an illegal action), then forwards `None` as the result.
- Args:
  - action_index (int): index of the action to be taken.
- Returns:
  - child (State): reference to the child state implied by the action, or `None` if the action is illegal.
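Conceptually, the caching behaviour of `take_action` can be sketched as follows (a schematic reimplementation under assumed attribute names, not the module’s actual source):

```python
class Node:
    """Schematic stand-in for a State subclass, just to illustrate take_action's caching."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}  # action_index -> child state (assumed attribute)

    def take_action_job(self, action_index):
        return action_index >= 0  # toy legality rule for the sketch

    def take_action(self, action_index):
        if action_index in self.children:
            return self.children[action_index]  # child already exists: return it
        child = type(self)(parent=self)
        if not child.take_action_job(action_index):
            return None  # illegal action: nothing is cached
        self.children[action_index] = child
        return child
```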
- take_action_job(action_index)[source]
[To be implemented in subclasses.]
Should perform the changes on this state implied by the given action, and return `True` if the action is legal. Otherwise, should make no changes and return `False`.
- Args:
  - action_index (int): index of the action to be taken.
- Returns:
  - action_legal (bool): boolean flag indicating whether the specified action was legal and performed.
- compute_outcome()[source]
If called before on this state, returns the outcome already computed and memorized. Otherwise, tries to call on this state the function `compute_outcome_job` (assumed to be implemented in the subclass) and returns its result. Possible outcomes for terminal states are {-1, 0, 1}, indicating respectively: a win for the minimizing player, a draw, a win for the maximizing player. For an ongoing game the outcome should be `None`.
- Returns:
  - outcome ({-1, 0, 1} or `None`): outcome of the game represented by this state.
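The memoization described above boils down to a cache-on-first-terminal-result pattern, sketched below with hypothetical attribute names (the module’s own internals may differ):

```python
class OutcomeCacheMixin:
    """Sketch of compute_outcome's memoization (assumed attribute names)."""

    def __init__(self):
        self._outcome = None
        self._outcome_computed = False

    def compute_outcome_job(self):
        raise NotImplementedError  # provided by the game-specific subclass

    def compute_outcome(self):
        if not self._outcome_computed:
            self._outcome = self.compute_outcome_job()
            # memorize only final outcomes; an ongoing game (None) may still change
            self._outcome_computed = self._outcome is not None
        return self._outcome
```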
- compute_outcome_job()[source]
[To be implemented in subclasses.]
Should compute and return the game outcome for this state in compliance with the rules of the game it represents. Possible results for terminal states are {-1, 0, 1}, indicating respectively: a win for the minimizing player, a draw, a win for the maximizing player. For an ongoing game the result should be `None`.
- Returns:
  - outcome ({-1, 0, 1} or `None`): game outcome for this state.
- get_board()[source]
[To be implemented in subclasses only when a search using `MCTSNC` is planned. Not required for `MCTS` searches.]
Should return the representation of this state as a two-dimensional array of bytes - in a board-like form (e.g., chessboard, backgammon board, etc.), even if no board naturally exists in the related game (e.g., bridge, Nim, etc.).
- Returns:
- board (ndarray[np.int8, ndim=2]):
two-dimensional array with representation of this state.
- get_extra_info()[source]
[To be implemented in subclasses only when a search using `MCTSNC` is planned. Not required for `MCTS` searches.]
Should return additional information associated with this state (as a one-dimensional array of bytes) not implied by the contents of the board itself (e.g., possibilities of castling or en-passant captures in chess, the contract in double dummy bridge, etc.), or any technical information useful to generate legal actions faster. If no additional information is needed, should return `None`.
- Returns:
  - extra_info (ndarray[np.int8, ndim=1] or `None`): one-dimensional array with any additional information associated with this state.
- expand()[source]
Expands this state to generate its children by calling
take_action
multiple times for all possible action indexes.
- take_random_action_playout()[source]
[To be implemented in subclasses.]
Should pick a uniformly random action from the actions available in this state and return the result of calling `take_action` with that action’s index as argument.
- Returns:
  - child (State): result of the `take_action` call for the random action.
- static action_name_to_index(action_name)[source]
[To be optionally implemented by programmer in subclasses.]
Returns an action’s index (numbering from 0) based on its name. E.g., the name `"B4"` for 15 x 15 Gomoku maps to index `18`.
- Args:
- action_name (str):
name of an action.
- Returns:
- action_index (int):
index corresponding to the given name.
- static action_index_to_name(action_index)[source]
[To be optionally implemented by programmer in subclasses.]
Returns an action’s name based on its index (numbering from 0). E.g., index `18` for 15 x 15 Gomoku maps to the name `"B4"`.
- Args:
- action_index (int):
index of an action.
- Returns:
- action_name (str):
name corresponding to the given index.
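One plausible mapping consistent with the `"B4"` <-> `18` example above (the letter encoding the row, `A` = 0, and the number the 1-based column) could be implemented as follows; the actual `Gomoku` class may use a different convention:

```python
def action_name_to_index(action_name, board_size=15):
    # hypothetical mapping: "B4" -> row 1, column 3 -> 1 * 15 + 3 = 18
    row = ord(action_name[0]) - ord("A")
    col = int(action_name[1:]) - 1
    return row * board_size + col

def action_index_to_name(action_index, board_size=15):
    # inverse mapping: 18 -> (row 1, column 3) -> "B4"
    row, col = divmod(action_index, board_size)
    return f"{chr(ord('A') + row)}{col + 1}"
```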
- static get_board_shape()[source]
[To be implemented in subclasses only when a search using `MCTSNC` is planned. Not required for `MCTS` searches.]
Returns a tuple with the shape of boards for the game (or sequential decision problem) represented by this class.
- Returns:
- shape (tuple(int, int)):
shape of boards related to states of this class.
- static get_extra_info_memory()[source]
[To be implemented in subclasses only when a search using `MCTSNC` is planned. Not required for `MCTS` searches.]
Returns the amount of memory (in bytes) needed to memorize additional information associated with states of the game (or sequential decision problem) represented by this class.
- Returns:
- extra_info_memory (int):
number of bytes required to memorize additional information related to states of this class.
- static get_max_actions()[source]
[To be implemented in subclasses.]
Returns the maximum number of actions (the largest branching factor) possible in the game represented by this class.
- Returns:
- max_actions (int):
maximum number of actions (the largest branching factor) possible in the game represented by this class.
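Taken together, the `MCTSNC`-related hooks could look as in this hypothetical Connect 4-style sketch (a 6 x 7 board with at most 7 actions, one drop per column; this is not the module’s actual `C4` class):

```python
import numpy as np

class C4LikeHooks:
    # Hypothetical Connect 4-style implementations of the MCTSNC-related hooks.
    def __init__(self):
        self.board = np.zeros((6, 7), dtype=np.int8)

    def get_board(self):
        return self.board  # ndarray[np.int8, ndim=2]

    def get_extra_info(self):
        return None  # the board alone fully determines a Connect 4 position

    @staticmethod
    def get_board_shape():
        return (6, 7)

    @staticmethod
    def get_extra_info_memory():
        return 0  # no extra bytes needed

    @staticmethod
    def get_max_actions():
        return 7  # largest branching factor: one drop per non-full column
```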
- __module__ = 'mcts'
- class mcts.MCTS(search_time_limit=5.0, search_steps_limit=inf, vanilla=True, ucb_c=2.0, seed=0, verbose_debug=False, verbose_info=True)[source]
Bases: `object`
Monte Carlo Tree Search - standard, referential implementation (for CPU, single-threaded).
- DEFAULT_SEARCH_TIME_LIMIT = 5.0
- DEFAULT_SEARCH_STEPS_LIMIT = inf
- DEFAULT_VANILLA = True
- DEFAULT_UCB_C = 2.0
- DEFAULT_SEED = 0
- DEFAULT_VERBOSE_DEBUG = False
- DEFAULT_VERBOSE_INFO = True
- __init__(search_time_limit=5.0, search_steps_limit=inf, vanilla=True, ucb_c=2.0, seed=0, verbose_debug=False, verbose_info=True)[source]
Constructor of `MCTS` instances.
- Args:
  - search_time_limit (float): time limit in seconds (computational budget), `np.inf` if no limit, defaults to `5.0`.
  - search_steps_limit (float): steps limit (computational budget), `np.inf` if no limit, defaults to `np.inf`.
  - vanilla (bool): flag indicating whether information (partial tree, action-value estimates, etc.) from previous searches is ignored, defaults to `True`.
  - ucb_c (float): exploration constant in the UCB formula, defaults to `2.0`.
  - seed (int): seed for the random number generator, defaults to `0`.
  - verbose_debug (bool): debug verbosity flag; if `True`, detailed information about each kernel invocation is printed to the console (in each iteration), defaults to `False`.
  - verbose_info (bool): verbosity flag; if `True`, standard information on actions and performance is printed to the console (after a full run), defaults to `True`.
- __str__()[source]
Returns a string representation of this `MCTS` instance.
- Returns:
  - str: string representation of this `MCTS` instance.
- __repr__()[source]
Returns a string representation of this `MCTS` instance (equivalent to the `__str__` method).
- Returns:
  - str: string representation of this `MCTS` instance.
- _make_performance_info()[source]
Prepares and returns a dictionary with information on performance during the last run. After the call, available via
performance_info
attribute.
- _make_actions_info(children, best_action_entry=False)[source]
Prepares and returns a dictionary with information on root actions implied by the last run, in particular: estimates of action values, their UCBs, counts of times actions were taken, etc. After the call, available via
actions_info
attribute.
- _best_action_ucb(children, actions_info)[source]
Returns the best action for selection stage purposes, i.e. the action with the largest UCB value.
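The selection stage typically scores actions with a UCB1-style formula; a sketch under that assumption is shown below (`c` corresponds to the `ucb_c` parameter, default `2.0`; the module’s exact formula may differ in detail):

```python
import math

def ucb(mean, n, n_parent, c=2.0):
    # UCB1-style score: action-value estimate plus an exploration bonus
    # that shrinks as the action is tried more often.
    if n == 0:
        return math.inf  # unvisited actions are explored first
    return mean + c * math.sqrt(math.log(n_parent) / n)
```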
- _best_action(root_children, root_actions_info)[source]
Returns the best action among the root actions for the final decision. Actions’ comparison is a three-step process: (1) in the first order, the win flag is decisive (attribute
win_flag
of a child state), (2) if there is a tie (win flags equal), the number of times an action was taken becomes decisive (attributen
of a child state), (3) if there still is a tie (both win flags and action execution counts equal), the number of wins becomes decisive (attributen_wins
of a child state).
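The three-step comparison above is equivalent to a lexicographic maximum over the tuple `(win_flag, n, n_wins)`. A schematic sketch, with a hypothetical `Child` record standing in for child states:

```python
from collections import namedtuple

# Schematic child record carrying the attributes named above (win_flag, n, n_wins).
Child = namedtuple("Child", ["action", "win_flag", "n", "n_wins"])

def best_action(children):
    # the three-step comparison expressed as a lexicographic max over a key tuple
    best = max(children, key=lambda ch: (ch.win_flag, ch.n, ch.n_wins))
    return best.action
```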
- run(root, forced_search_steps_limit=inf)[source]
Runs the standard, referential implementation of Monte Carlo Tree Search (on CPU, single-threaded).
- Args:
- root (State):
root state from which the search starts.
- forced_search_steps_limit (int): steps limit used only when reproducing results of a previous experiment; if less than `np.inf`, it takes priority over the standard computational budget given by `search_time_limit` and `search_steps_limit`.
- Returns:
- self.best_action (int):
best action resulting from search.
- _expand(state)[source]
Performs the expansion stage and returns the child (picked at random) on which to carry out the playout.
- _backup(state, playout_root)[source]
Calls the `compute_outcome` method on the terminal state (`state`), and suitably backs up the outcome to the ancestors of the playout root.
- _reduce_over_actions()[source]
Calls `_make_actions_info` and `_best_action` on the children states of the root to find the best available action.
- __module__ = 'mcts'