mcts module
Auxiliary module with a referential, standard implementation of the MCTS algorithm (for CPU, single-threaded). The module contains:
`State`
: class representing an arbitrary state of some game or sequential decision problem (meant to be inherited from when searches using the `MCTS` class are planned); current examples of subclasses are: `C4` in the c4 module (representation of the Connect 4 game), and `Gomoku` in the gomoku module (representation of the Gomoku game).

`MCTS`
: class representing the referential MCTS algorithm.
Link to project repository
Notes
Private functions of the `State` and `MCTS` classes are named with a single leading underscore.
For public methods, full docstrings are provided (with arguments and returns described); for private functions, short docstrings are provided.
- class mcts.State(parent=None)[source]
Bases: `object`
Arbitrary abstract state of some game or sequential decision problem. Meant to be inherited - extended to subclasses and then applicable in searches conducted by the `MCTS` class. For actual inheritance examples see: the `C4` class in the c4 module (representation of the Connect 4 game), and the `Gomoku` class in the gomoku module (representation of the Gomoku game).
When searches using the `MCTS` class are planned, the programmer, while inheriting from `State`, must provide implementations of the following non-static methods: `take_action_job`, `compute_outcome_job`, `take_random_action_playout`, `__str__`; and of one static method: `class_repr`. When searches using the `MCTSNC` class are planned, the programmer, while inheriting from `State`, must additionally provide the following non-static methods: `get_board`, `get_extra_info`; and the following static ones: `get_board_shape`, `get_extra_info_memory`, `get_max_actions`.
- __init__(parent=None)[source]
Constructor of `State` instances. Should be called (in the first line) in subclasses’ constructors as: `super().__init__(parent)`.
- Args:
  - parent (State): reference to the parent state object.
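Putting these requirements together, a minimal subclass might look like the following sketch of a toy one-pile Nim game. The `State` stub below merely stands in for `mcts.State` so that the snippet is self-contained; its attribute names (`parent`, `children`, `turn`) are assumptions, not the module’s actual internals.

```python
import random

class State:
    # Stub standing in for mcts.State so the sketch is self-contained;
    # attribute names (parent, children, turn) are assumptions.
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}
        self.turn = -parent.turn if parent else 1  # 1: maximizing, -1: minimizing

class NimState(State):
    """Toy one-pile Nim: players alternately remove 1-3 sticks; whoever takes the last stick wins."""

    def __init__(self, parent=None, sticks=10):
        super().__init__(parent)
        self.sticks = parent.sticks if parent else sticks

    def take_action_job(self, action_index):
        amount = action_index + 1  # actions 0..2 remove 1..3 sticks
        if amount > self.sticks:
            return False  # illegal action: state left unchanged
        self.sticks -= amount
        return True

    def compute_outcome_job(self):
        if self.sticks == 0:
            return -self.turn  # the player who just moved took the last stick and wins
        return None  # game ongoing

    def take_random_action_playout(self):
        legal = [a for a in range(3) if a + 1 <= self.sticks]
        # the real base class provides take_action; the stub applies the job directly
        child = NimState(parent=self)
        child.take_action_job(random.choice(legal))
        return child

    def __str__(self):
        return f"sticks left: {self.sticks}"

    @staticmethod
    def class_repr():
        return "Nim(one pile)"
```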
- __str__()[source]
[To be implemented in subclasses.]
Should return a string representation of this state, e.g., game board with its current contents.
- Returns:
str: string representation of this state.
- static class_repr()[source]
[To be implemented in subclasses.]
Should return a string representation of this class of states (e.g., game name, its variation, board size if configurable, etc.).
- Returns:
str: string representation of this class of states.
- _subtree_size()[source]
Returns size of the subtree rooted by this state (number of tree nodes including this one).
- _subtree_depths(d=0, depths=[])[source]
Returns a list of depths for nodes in the subtree rooted by this state.
- get_turn()[source]
Returns {-1, 1} indicating whose turn it is: -1 for the minimizing player, 1 for the maximizing player.
- Returns:
- self.turn ({-1, 1}):
indicating whose turn it is: -1 for the minimizing player, 1 for the maximizing player.
- take_action(action_index)[source]
Takes the action (specified by its index) and returns the child state implied by the action. If such a child already existed (prior to the call) among the children, returns it immediately. Otherwise, creates a new child state object and tries to call on it the function `take_action_job`, assumed to be implemented in the subclass. If `False` is returned in the latter case (interpreted as an illegal action), then forwards `None` as the result.
- Args:
  - action_index (int): index of the action to be taken.
- Returns:
  - child (State): reference to the child state implied by the action, or `None` if the action is illegal.
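Conceptually, the caching behaviour of `take_action` can be sketched as follows (a schematic reimplementation under assumed attribute names, not the module’s actual source):

```python
class Node:
    """Schematic stand-in for a State subclass, just to illustrate take_action's caching."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}  # action_index -> child state (assumed attribute)

    def take_action_job(self, action_index):
        return action_index >= 0  # toy legality rule for the sketch

    def take_action(self, action_index):
        if action_index in self.children:
            return self.children[action_index]  # child already exists: return it
        child = type(self)(parent=self)
        if not child.take_action_job(action_index):
            return None  # illegal action: nothing is cached
        self.children[action_index] = child
        return child
```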
- take_action_job(action_index)[source]
[To be implemented in subclasses.]
Should perform the changes on this state implied by the given action, and return `True` if the action is legal. Otherwise, should make no changes and return `False`.
- Args:
  - action_index (int): index of the action to be taken.
- Returns:
  - action_legal (bool): boolean flag indicating whether the specified action was legal and performed.
- compute_outcome()[source]
If called before on this state, returns the outcome already computed and memorized. Otherwise, tries to call on this state the function `compute_outcome_job` (assumed to be implemented in the subclass) and returns its result. Possible outcomes for terminal states are {-1, 0, 1}, indicating respectively: a win for the minimizing player, a draw, a win for the maximizing player. For an ongoing game the outcome should be `None`.
- Returns:
  - outcome ({-1, 0, 1} or `None`): outcome of the game represented by this state.
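The memoization described above boils down to a cache-on-first-terminal-result pattern, sketched below with hypothetical attribute names (the module’s own internals may differ):

```python
class OutcomeCacheMixin:
    """Sketch of compute_outcome's memoization (assumed attribute names)."""

    def __init__(self):
        self._outcome = None
        self._outcome_computed = False

    def compute_outcome_job(self):
        raise NotImplementedError  # provided by the game-specific subclass

    def compute_outcome(self):
        if not self._outcome_computed:
            self._outcome = self.compute_outcome_job()
            # memorize only final outcomes; an ongoing game (None) may still change
            self._outcome_computed = self._outcome is not None
        return self._outcome
```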
- compute_outcome_job()[source]
[To be implemented in subclasses.]
Should compute and return the game outcome for this state in compliance with the rules of the game it represents. Possible results for terminal states are {-1, 0, 1}, indicating respectively: a win for the minimizing player, a draw, a win for the maximizing player. For an ongoing game the result should be `None`.
- Returns:
  - outcome ({-1, 0, 1} or `None`): game outcome for this state.
- get_board()[source]
[To be implemented in subclasses only when a search using `MCTSNC` is planned. Not required for `MCTS` searches.]
Should return the representation of this state as a two-dimensional array of bytes - in a board-like form (e.g., chessboard, backgammon board, etc.), even if no board naturally exists in the related game (e.g., bridge, Nim, etc.).
- Returns:
- board (ndarray[np.int8, ndim=2]):
two-dimensional array with representation of this state.
- get_extra_info()[source]
[To be implemented in subclasses only when a search using `MCTSNC` is planned. Not required for `MCTS` searches.]
Should return additional information associated with this state (as a one-dimensional array of bytes) not implied by the contents of the board itself (e.g., possibilities of castling or en-passant captures in chess, the contract in double dummy bridge, etc.), or any technical information useful to generate legal actions faster. If no additional information is needed, should return `None`.
- Returns:
  - extra_info (ndarray[np.int8, ndim=1] or `None`): one-dimensional array with any additional information associated with this state.
- expand()[source]
Expands this state to generate its children by calling
take_action
multiple times for all possible action indexes.
- take_random_action_playout()[source]
[To be implemented in subclasses.]
Should pick a uniformly random action from the actions available in this state and return the result of calling `take_action` with that action’s index as argument.
- Returns:
  - child (State): result of the `take_action` call for the random action.
- static action_name_to_index(action_name)[source]
[To be optionally implemented by programmer in subclasses.]
Returns an action’s index (numbering from 0) based on its name. E.g., the name `"B4"` for 15 x 15 Gomoku maps to index `18`.
- Args:
- action_name (str):
name of an action.
- Returns:
- action_index (int):
index corresponding to the given name.
- static action_index_to_name(action_index)[source]
[To be optionally implemented by programmer in subclasses.]
Returns an action’s name based on its index (numbering from 0). E.g., index `18` for 15 x 15 Gomoku maps to the name `"B4"`.
- Args:
- action_index (int):
index of an action.
- Returns:
- action_name (str):
name corresponding to the given index.
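One plausible mapping consistent with the `"B4"` <-> `18` example above (the letter encoding the row, `A` = 0, and the number the 1-based column) could be implemented as follows; the actual `Gomoku` class may use a different convention:

```python
def action_name_to_index(action_name, board_size=15):
    # hypothetical mapping: "B4" -> row 1, column 3 -> 1 * 15 + 3 = 18
    row = ord(action_name[0]) - ord("A")
    col = int(action_name[1:]) - 1
    return row * board_size + col

def action_index_to_name(action_index, board_size=15):
    # inverse mapping: 18 -> (row 1, column 3) -> "B4"
    row, col = divmod(action_index, board_size)
    return f"{chr(ord('A') + row)}{col + 1}"
```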
- static get_board_shape()[source]
[To be implemented in subclasses only when a search using `MCTSNC` is planned. Not required for `MCTS` searches.]
Returns a tuple with the shape of boards for the game (or sequential decision problem) represented by this class.
- Returns:
- shape (tuple(int, int)):
shape of boards related to states of this class.
- static get_extra_info_memory()[source]
[To be implemented in subclasses only when a search using `MCTSNC` is planned. Not required for `MCTS` searches.]
Returns the amount of memory (in bytes) needed to memorize additional information associated with states of the game (or sequential decision problem) represented by this class.
- Returns:
- extra_info_memory (int):
number of bytes required to memorize additional information related to states of this class.
- static get_max_actions()[source]
[To be implemented in subclasses.]
Returns the maximum number of actions (the largest branching factor) possible in the game represented by this class.
- Returns:
- max_actions (int):
maximum number of actions (the largest branching factor) possible in the game represented by this class.
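Taken together, the `MCTSNC`-related hooks could look as in this hypothetical Connect 4-style sketch (a 6 x 7 board with at most 7 actions, one drop per column; this is not the module’s actual `C4` class):

```python
import numpy as np

class C4LikeHooks:
    # Hypothetical Connect 4-style implementations of the MCTSNC-related hooks.
    def __init__(self):
        self.board = np.zeros((6, 7), dtype=np.int8)

    def get_board(self):
        return self.board  # ndarray[np.int8, ndim=2]

    def get_extra_info(self):
        return None  # the board alone fully determines a Connect 4 position

    @staticmethod
    def get_board_shape():
        return (6, 7)

    @staticmethod
    def get_extra_info_memory():
        return 0  # no extra bytes needed

    @staticmethod
    def get_max_actions():
        return 7  # largest branching factor: one drop per non-full column
```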
- __module__ = 'mcts'
- class mcts.MCTS(search_time_limit=5.0, search_steps_limit=inf, vanilla=True, ucb_c=2.0, seed=0, verbose_debug=False, verbose_info=True)[source]
Bases: `object`
Monte Carlo Tree Search - standard, referential implementation (for CPU, single-threaded).
- DEFAULT_SEARCH_TIME_LIMIT = 5.0
- DEFAULT_SEARCH_STEPS_LIMIT = inf
- DEFAULT_VANILLA = True
- DEFAULT_UCB_C = 2.0
- DEFAULT_SEED = 0
- DEFAULT_VERBOSE_DEBUG = False
- DEFAULT_VERBOSE_INFO = True
- __init__(search_time_limit=5.0, search_steps_limit=inf, vanilla=True, ucb_c=2.0, seed=0, verbose_debug=False, verbose_info=True)[source]
Constructor of `MCTS` instances.
- Args:
  - search_time_limit (float): time limit in seconds (computational budget), `np.inf` if no limit, defaults to `5.0`.
  - search_steps_limit (float): steps limit (computational budget), `np.inf` if no limit, defaults to `np.inf`.
  - vanilla (bool): flag indicating whether information (partial tree, action-value estimates, etc.) from previous searches is ignored, defaults to `True`.
  - ucb_c (float): exploration constant in the UCB formula, defaults to `2.0`.
  - seed (int): seed for the random number generator, defaults to `0`.
  - verbose_debug (bool): debug verbosity flag; if `True`, detailed information about each kernel invocation is printed to the console (in each iteration), defaults to `False`.
  - verbose_info (bool): verbosity flag; if `True`, standard information on actions and performance is printed to the console (after a full run), defaults to `True`.
- __str__()[source]
Returns a string representation of this `MCTS` instance.
- Returns:
  - str: string representation of this `MCTS` instance.
- __repr__()[source]
Returns a string representation of this `MCTS` instance (equivalent to the `__str__` method).
- Returns:
  - str: string representation of this `MCTS` instance.
- _make_performance_info()[source]
Prepares and returns a dictionary with information on performance during the last run. After the call, available via
performance_info
attribute.
- _make_actions_info(children, best_action_entry=False)[source]
Prepares and returns a dictionary with information on root actions implied by the last run, in particular: estimates of action values, their UCBs, counts of times actions were taken, etc. After the call, available via
actions_info
attribute.
- _best_action_ucb(children, actions_info)[source]
Returns the best action for selection stage purposes, i.e. the action with the largest UCB value.
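The selection stage typically scores actions with a UCB1-style formula; a sketch under that assumption is shown below (`c` corresponds to the `ucb_c` parameter, default `2.0`; the module’s exact formula may differ in detail):

```python
import math

def ucb(mean, n, n_parent, c=2.0):
    # UCB1-style score: action-value estimate plus an exploration bonus
    # that shrinks as the action is tried more often.
    if n == 0:
        return math.inf  # unvisited actions are explored first
    return mean + c * math.sqrt(math.log(n_parent) / n)
```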
- _best_action(root_children, root_actions_info)[source]
Returns the best action among the root actions for the final decision. Actions’ comparison is a three-step process: (1) in the first order, the win flag is decisive (attribute
win_flag
of a child state), (2) if there is a tie (win flags equal), the number of times an action was taken becomes decisive (attributen
of a child state), (3) if there still is a tie (both win flags and action execution counts equal), the number of wins becomes decisive (attributen_wins
of a child state).
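The three-step comparison above is equivalent to a lexicographic maximum over the tuple `(win_flag, n, n_wins)`. A schematic sketch, with a hypothetical `Child` record standing in for child states:

```python
from collections import namedtuple

# Schematic child record carrying the attributes named above (win_flag, n, n_wins).
Child = namedtuple("Child", ["action", "win_flag", "n", "n_wins"])

def best_action(children):
    # the three-step comparison expressed as a lexicographic max over a key tuple
    best = max(children, key=lambda ch: (ch.win_flag, ch.n, ch.n_wins))
    return best.action
```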
- run(root, forced_search_steps_limit=inf)[source]
Runs the standard, referential implementation of Monte Carlo Tree Search (on CPU, single-threaded).
- Args:
- root (State):
root state from which the search starts.
- forced_search_steps_limit (int): steps limit used only when reproducing results of a previous experiment; if less than `np.inf`, it takes priority over the standard computational budget given by `search_time_limit` and `search_steps_limit`.
- Returns:
- self.best_action (int):
best action resulting from search.
- _expand(state)[source]
Performs the expansion stage and returns the child (picked at random) on which to carry out the playout.
- _backup(state, playout_root)[source]
Calls the `compute_outcome` method on the terminal state (`state`), and suitably backs up the outcome to the ancestors of the playout root.
- _reduce_over_actions()[source]
Calls `_make_actions_info` and `_best_action` on the children states of the root to find the best available action.
- __module__ = 'mcts'