There's a wealth of useful information online about the alpha-beta search algorithm - the strictly more efficient and no-less-accurate improvement on minimax for solving zero-sum, perfect information games like chess. Despite plenty of material, I've found most of it lacking in conveying a decent intuition on what the eponymous values `alpha` and `beta` actually represent in the algorithm itself. This is a problem because the many heuristic improvements layered on top of simple alpha-beta in a strong game-playing program tend to require a proper grasp of these concepts. This note attempts to record some useful intuition - both for my own future reference and for you, dear reader, who have arrived here with the aid of His Majesty's Google wondering who or what `alpha` and `beta` really are.
For the sake of completeness, I've also included quite a detailed introduction to minimax and alpha-beta. Scroll down to the Appendix if you need to get up to speed on these first.
Alpha and beta
Before I say anything else, it's worth establishing some conventions. Almost every real implementation of minimax or alpha-beta will use the "negamax framework", wherein there is a single function which calls itself recursively and uses minus signs to flip the perspective of the players. (See Appendix for more on this.) I'll mostly stick to that view here, and so within the confines of any particular execution of the function `alpha_beta`, we should think of whichever player is moving as seeking to maximize over the values returned for each of the child moves. Here's a reference pseudocode implementation. I will refer to the players as Max and Min in the discussion.
```
function alpha_beta(alpha, beta, depth_left) {
    if (depth_left == 0) return evaluate()
    for (move in legal_moves) {
        make_move(move)
        score = -alpha_beta(-beta, -alpha, depth_left - 1)
        unmake_move(move)
        if (score >= beta) return beta   // fail hard beta-cutoff
        if (score > alpha) alpha = score // alpha acts like max in minimax
    }
    return alpha
}
```
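At the root, this would be kicked off with the widest possible window, e.g. `score = alpha_beta(-infinity, +infinity, depth)` - mirroring the invocation shown for the two-function version in the Appendix.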
The first thing to notice is that `alpha` plays a role very much akin to `max` in the minimax algorithm. As we iterate through the possibilities, it tracks what the best move has been so far.
Looking at the diagram above, let's imagine running `alpha_beta` at the root node, which eventually must return the true value for the whole tree. Let's think through how the values of `alpha` and `beta` change as the iteration proceeds over the three possible moves from left to right. I'll write the values as a tuple (alpha, beta) for ease. At the top of the function, this being a root node, we have (-∞, +∞). First, we try the leftmost child. This subtree eventually returns a score which exceeds the current value of `alpha`, so we update the bottom of the range to match. Next, the second move is examined and this ends up returning a score which again beats the current `alpha`. We update the range once more.
Now let's consider what happens in the call to `alpha_beta` when we drop down into the third possible move. We will call `alpha_beta(-beta, -alpha, depth_left - 1)`, meaning the Min node starts with the window (-beta, -alpha) - everything negated and swapped. (Remember that Min is maximizing over the negations of the real scores shown.) The first move Min looks at returns a score (in her negated perspective) at or above the top of her window. And this immediately causes a "fail hard beta-cutoff" - an early return from `alpha_beta` on account of one of the examined moves exceeding `beta`. Min found a move which was over the top of the whole range she had been given to work with.
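To make the bookkeeping concrete, here's a small runnable Python sketch of the same process. The tree shape mirrors the walkthrough (three moves at the root), but the leaf values are my own invented stand-ins, since the diagram isn't reproduced here:

```python
# Minimal negamax alpha-beta over a toy tree of nested lists.
# Leaf values are invented for illustration only.
INF = float("inf")

def alpha_beta(node, alpha, beta, depth=0):
    print("  " * depth + f"window = ({alpha}, {beta})")
    if isinstance(node, (int, float)):  # leaf: static evaluation
        return node
    for child in node:
        # Negate and swap the window for the opponent's perspective.
        score = -alpha_beta(child, -beta, -alpha, depth + 1)
        if score >= beta:
            print("  " * depth + "beta cutoff!")
            return beta                 # fail-hard beta-cutoff
        if score > alpha:
            alpha = score               # alpha acts like max in minimax
    return alpha

# Three moves at the root; the third subtree triggers a cutoff.
tree = [[3, -2], [5, 1], [0, 7]]
print("root value:", alpha_beta(tree, -INF, INF))
```

Running this prints the (alpha, beta) window at every node visited, and you can watch the third subtree's first move trip the cutoff exactly as described above.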
So here's how we can think about `beta`'s function in the algorithm. The variable `beta` has the meaning:

> Beta: somewhere in my ancestor line, my opponent has an opportunity to play a move where they can get `beta` or less. (My opponent is minimizing relative to me.) So if I can find a strong move that gives me better than `beta` in this node, I might as well stop looking. My opponent would never come down this line and give me the opportunity to do that, when they could pick that other move further up instead.
Back to the diagram. Let's now imagine that the first child Min looked at had returned a lower value (in her perspective) - one inside her window - rather than causing a cutoff. In this case, the beta cutoff wouldn't have happened. Instead, this move would have raised her `alpha`, narrowing her window. With no beta cutoff occurring, she must continue to look at the next child node.
At the child node, Max receives the negated and swapped version of that narrowed window. This is the first time we have seen `alpha_beta` called with a window where both bounds are finite. As it happens, in this example Max immediately gets a beta cutoff from the first move: it returns a score exceeding his upper bound `beta`, and so he wouldn't need to bother looking at the next move. This makes sense according to our interpretation above. Max can say "there is an ancestor node where my opponent can pick a better option for herself than coming down here, now that I know how much this node must be worth to me at minimum; in fact, that ancestor node is my immediate parent - she can simply pick her first move instead". Next, imagine that we had travelled a couple of nodes down the left-hand branch. The window flips around twice and so gets back to the same orientation. From this, we can see that `beta` really is tracking the best our opponent can do at any node in the ancestor tree - not just the immediate parent.
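That double flip is easy to check mechanically: negating and swapping a window twice returns the original. In Python (purely illustrative):

```python
def flip(window):
    """The swap-and-negate applied to (alpha, beta) at each recursive call."""
    a, b = window
    return (-b, -a)

w = (3, 7)
assert flip(flip(w)) == w  # two plies down, the window is restored
```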
So we've looked in a bit of detail at beta-cutoffs and hopefully have a better understanding of what `beta` represents. What about `alpha`? Naturally it's closely related. In fact, because `alpha` and `beta` keep swapping roles at each recursive call - when we step down a ply - we can start to deduce how they are related. First note that `alpha`, playing the role of `max` in minimax by keeping track of the best child so far, is exactly what we want to give to successive child nodes as their `beta`. As discussed above, we need to inform those child nodes of our best option higher up. In other words, what we think of as `alpha` in our own node will influence `beta` at all the odd plies below us, where it is our opponent's turn to move.
> Alpha: somewhere in my ancestor line, I have an opportunity to play a move where I can get `alpha`. So even if I can't do any better than `alpha` at this node, I still want to make sure that I pass on the value of `alpha` to child nodes in my subtree, because they need to know about it. In particular, nodes at my opponent's turn need to know about `alpha` because it will be the benchmark for a fail-hard cutoff for them. (In a negamax framework, my `alpha` will be their `beta`, so they'll think of it as a beta-cutoff.)
Taking this thinking further, we can actually see that `alpha` is representing the best outcome we are assured of - not just in this node, but in the entire tree searched so far. Essentially this is how the `alpha_beta` algorithm improves on `minimax`. In `minimax`, each node tracks its own best outcome in the variable `max`; with alpha-beta, each node gets full information about the best outcome available for each player so far in the whole tree.
We can keep going in this vein and think of `beta` in similar terms. From our perspective as the player to move in a given node, `beta` represents the best outcome available for our opponent in the whole tree searched thus far. From our perspective, `beta` is coming down from above, while our own `alpha` is steadily increasing as we discover more favourable outcomes. Note that this interpretation of `alpha` and `beta` as a steadily narrowing window is easiest to imagine in a convention where Max is maximizing and Min is minimizing over the same set of values (i.e. not the "negamax framework", where the values keep swapping places and being negated).
Some vocabulary
There's a clutch of frequently used vocabulary associated with the alpha-beta algorithm which is much easier to understand now that we have a more intuitive grasp of the fundamentals. Here's a little glossary:
- Principal Variation - the single thread of moves through the game tree which represents optimal play by both sides. This is effectively the desired output from minimax, alphabeta etc. There is an algorithm called Principal Variation Search (PVS), which attempts to be an even more efficient improvement on alpha-beta, by focusing on the Principal Variation and searching other nodes with a deliberately narrow window, promoting faster cutoffs. A detailed discussion of this algorithm is outside the scope of this note, but there's more on Wikipedia and Chess Programming Wiki.
- Fail-high - an alternative name for a beta-cutoff. Represents a situation where we discover that a node is too good for the player whose turn it is, meaning we can eschew searching further moves. Note that it would be an alpha cutoff for the minimizing player if we weren't in a negamax framework, which is another reason it has a distinct name.
- Fail-low - this refers to the opposite scenario. We sometimes search through every move at a node and discover that none of them improves upon `alpha`. This node now fails to be part of the Principal Variation, but we had to do more searching to learn that. Since the value of this node failed to improve upon `alpha`, it's less good than some other move we already have elsewhere in the ancestor tree. Dropping down into this node from the parent is very good for our opponent - in fact, we have a better choice elsewhere, so there will be a beta-cutoff in our parent node as soon as we return.
- Fail-hard - looking at the pseudocode alpha-beta algorithm above, we can see that if a node fails to improve on `alpha`, we actually return `alpha` rather than the best available score (which was something less than `alpha`). This is known as a "fail-hard framework".
- Fail-soft - in fact, you can make a small change to the algorithm and use a local `max` variable to track the largest value of the node, and return that instead (despite it possibly being less than `alpha`); see the sketch just after this list. This has no effect on the overall accuracy or output of the algorithm. A discussion on TalkChess goes over some of the pros and cons.
- Node Types - the nodes of a game tree can be categorised into three types based on their behaviour during execution of the alpha-beta search algorithm. Here's a quick summary, focusing on intuition, for what each node type means. Note that these terms can be applied both to the idea of what the node is expected (or hoped) to be before searching it, and what it turns out to be after the search returns.
- PV node - nodes which have a score that ends up being inside the window. For example, the leftmost branch of the tree comprises all PV nodes. In fact, the leftmost child of any node is a PV node. Confusingly, the actual Principal Variation returned at the end of the search may not be the same as these. It is merely hoped that the real Principal Variation is searched first because we want to search moves in order of strength to promote cut-offs.
- Cut-node - a cut-node is a node where a beta-cutoff happens. More interestingly, an expected cut-node is one where we hope for a beta cut-off to happen. If we really do manage to achieve good move ordering and search a very strong (PV) node first, all its siblings will be expected cut nodes. We anticipate that these nodes will be able to prune some of their children by virtue of having useful information available from when the PV sibling node was searched first.
- All-node - an all-node is one where all the moves get searched and none of them improves `alpha`. Its parent is necessarily a cut-node.
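Here's the fail-soft change mentioned above, as a minimal Python sketch (negamax framework assumed, with a nested-list toy tree standing in for real move generation):

```python
INF = float("inf")

def alpha_beta_soft(node, alpha, beta):
    """Fail-soft negamax: the return value may fall outside (alpha, beta)."""
    if isinstance(node, (int, float)):
        return node
    best_score = -INF                 # local max, tracked separately
    for child in node:
        score = -alpha_beta_soft(child, -beta, -alpha)
        best_score = max(best_score, score)
        if best_score >= beta:
            return best_score         # fail-soft: return the real score,
                                      # not the clamped beta
        if best_score > alpha:
            alpha = best_score
    return best_score                 # may be below alpha (a "fail low")
```

At the root the result is the same as the fail-hard version; the difference is only in how informative the returned value is when the node fails high or low.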
Summary
- The alpha-beta window is tracking already-known information about optimal decisions the two players can make in this node, and ancestor nodes higher up the tree. Because we already know things about what they will be inclined to do higher up, we can narrow down the moves that are interesting to us in this subtree.
- Alpha and beta are telling us about the optimal choices that Max and Min can make higher up the tree. If it's Max's turn, Max can already do as well as `alpha` somewhere higher up. Min can already do as well as `beta` somewhere higher up. As soon as Max finds a move in this position which is better than `beta`, we'd be wasting time looking at this node any further. Max would choose something at least as good as `beta` here, but Min, somewhere above this node, would make a choice that is more favourable for her. So who cares how high this node's value might be - as soon as it's higher than `beta`, it's definitely not PV.
- Although we are maximizing and would like the highest score possible, `beta` constrains us. It says, "don't waste your time looking at crazy good moves here, because your opponent is not going to let you arrive at a node where you have such good options when they can make a more favourable choice higher up." So `beta` tells us how high it is worth looking at a node before it becomes unreasonably good.
Appendix
The Minimax principle
In a game like chess - I focus on chess throughout, because I know it best, but everything applies to many other games - we each have perfect information about the state of the board and the possible moves available to us. The game is zero-sum: if I win, you lose and vice versa. To work out the best action in a situation like this, you say something like "well, I can go here, here or here. But if I go here, he can checkmate me, and if I go here, he can also checkmate me. But if I go here, he gets three possibilities and no matter which of those he plays... I can checkmate him!" This "I go here, you go there" thinking is formalised in the game theoretic concept called minimax. The basic principle is that one player is attempting to maximize their own score while the other minimizes their own score (which is at all times the negation of the first player's score because of the zero-sumness). A diagram, borrowed from Wikipedia, should make this clear.
In this diagram, we have two players, Max and Min, and a range of numeric values across the leaf nodes at the bottom. We should interpret these as:
- +∞: Max wins
- -∞: Min wins
- A positive value: Max has the advantage
- A negative value: Min has the advantage
Side note: in theory, a search tree for a game like chess has only three values - loss, draw, and win for white. Every game must end at a leaf node with one of these three outcomes. In practice, the game tree is far too massive to search exhaustively, so we assign real-number scores to nodes when we have gone as deep down the tree as we can. These scores are inexact - a heuristic assessment of who appears to be doing best, based on centuries of accumulated chess knowledge.
Imagine that only the bottom row of the diagram is filled in with numbers. We want to work out what Max should do at the very top of the tree. The way to analyse this is to start at the bottom and steadily work up. Starting at the left of the first row up, we see that if this position were reached, it would be Min's turn and she would be trying to pick the least bad of the two leaf values below. The node (i.e. position) is therefore already worth the smaller of those two values, because Min, acting rationally in that position, will pick the move taking the board there. Continue in this way along the row. Then move a row up and do the same again, this time from the perspective of Max, who wishes to pick the maximum available from the alternatives he has available. Working up the tree we eventually get a value for the initial position, and in the diagram we conclude that Min is winning. Max has two options, both bad, and he should play the second option - the least bad of the two.
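Here's that bottom-up procedure as a few lines of runnable Python, with nested lists standing in for the tree (the leaf values are invented, not those from the Wikipedia diagram):

```python
# Bottom-up minimax over a toy tree of nested lists.
def minimax(node, maximizing):
    if isinstance(node, (int, float)):   # leaf: heuristic score
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Max at the root, Min one ply down, Max again at the next ply.
tree = [[[10, 5], [-3, 7]], [[2, 8], [-6, -4]]]
print(minimax(tree, True))  # max(min(10, 7), min(8, -4)) = 7
```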
Here's a typical implementation of minimax in pseudocode.
```
function maxi(depth) {
    if (depth == 0) return evaluate()
    max = -infinity
    for (move in legal_moves) {
        make_move(move)
        score = mini(depth - 1)
        unmake_move(move)
        if (score > max) max = score
    }
    return max
}

function mini(depth) {
    if (depth == 0) return evaluate()
    min = +infinity
    for (move in legal_moves) {
        make_move(move)
        score = maxi(depth - 1)
        unmake_move(move)
        if (score < min) min = score
    }
    return min
}
```
We have two implementations of the same idea, one for the maximizer, one for the minimizer. The function works recursively. If we reach a search depth of 0, we statically evaluate the current node and return that. Otherwise, we iterate through each available move in this position and call the 'opposite function' on the resulting position. As we get values back from each child subtree, we maintain a `max` variable (or `min` in the `mini` case), which tracks the most compelling option so far. At the end, we return whatever was the value of the best move for the current player. Everything propagates up the tree in this way and eventually the game is solved.
Note that in real life, nobody implements the algorithm in this duplicative manner. Instead one can write a single function with a judicious smattering of minus signs to produce the same result. This is obviously cleaner and more DRY, but somewhat harder to parse at first blush.
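For instance, a single-function version might look like this in Python (a sketch over the same kind of nested-list toy tree; leaf values are read from the perspective of the player to move at that leaf):

```python
def negamax(node):
    if isinstance(node, (int, float)):
        return node
    # Negating each child's value flips it into our own perspective,
    # so both players can simply maximize.
    return max(-negamax(child) for child in node)

print(negamax([[3, -2], [5, 1], [0, 7]]))  # -> 1
```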
Improving efficiency with alpha-beta
Minimax works perfectly, and in fact it returns us an exhaustive knowledge of the best move in every position on the whole game tree. But for games like chess (or indeed anything meatier than tic-tac-toe), the game tree is far too large to actually analyse in this way. Chess has a branching factor of about 35 and an average game length of c. 80 plies, so 35^80 (roughly 10^123) is a rough estimate for the size of the tree. This is many more nodes than atoms in the universe. Even searching just a few ply down and using heuristic evaluations at leaf nodes presents us with an intractably large computational headache. The first big insight on our path to a strong game-playing engine is alpha-beta pruning, which notices that we can significantly reduce the number of nodes we need to search by pruning large subtrees out of the tree at zero cost to overall accuracy.
Once again, a diagram courtesy of Wikipedia:
Let's again imagine traversing this tree from left to right. We'll descend down the leftmost branch all the way to a leaf. We would then step back up, drop down the next leaf branch, and conclude that the parent node (a Min node) has the value of the smaller of those two leaves. We then step back up and descend down the sibling, to another Min node. We iterate through its children and discover a value smaller than the first Min node's. And this is where we can make a nifty observation. The value we just found is the child of a Min node, so this Min node must be worth at most that value (for Min would never voluntarily pick something larger than she has to). But wait! The parent of the Min node is a Max node, and we already discovered earlier that it has a child of larger value - the first Min node. So in that node, Max is guaranteed to pick the first Min node or something higher. Combining these two facts, we've already seen enough to know that any further searching of the second Min node is pointless wasted time - it can't influence the value of the Max node above! Hence, in the diagram, we see the final child of the Min node pruned out. We don't need to search it at all! Try changing that leaf to any other number at all and discover that whatever you set it to, the node two plies higher (and indeed the whole tree above that) is unaffected.
So the key to alpha-beta is to be smart about using bounds as we search the tree - we don't typically need to know the exact value of every single node in order to solve the whole tree. Often we can just bound the value of a node and use that information alone at nodes higher up the tree.
A final point to make about the alpha-beta algorithm is the critical importance of move ordering. Pruning happens when we find strong moves in a position early. In fact, the biggest reduction in overall nodes searched vs. minimax happens when we search moves in order of decreasing strength. Now, clearly, the whole point of the search process is to discover the strength of the moves, so this observation seems to put the cart before the horse. The practical implication is that it is critical to use heuristics to sort the moves in an estimated order of best to worst before traversing them in the tree. The alpha-beta algorithm then becomes more of an efficient proof that the presumed best move really is best. Most chess-playing programs will spend considerable time on heuristic analyses to guess good moves before searching them. This is time well spent when huge swathes of the search tree are pruned out, saving the algorithm from visiting tens of millions of nodes.
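To get a feel for how much ordering matters, here's a small Python experiment on an invented random tree. It cheats by using full search as an ordering oracle - a luxury only a toy example can afford; real engines must estimate the ordering with heuristics - and counts nodes visited under best-first versus worst-first ordering:

```python
import random

INF = float("inf")

def negamax(node):
    """Plain negamax value of a nested-list tree (our ordering oracle)."""
    if isinstance(node, (int, float)):
        return node
    return max(-negamax(child) for child in node)

def sort_tree(node, best_first):
    """Order every node's children by true strength."""
    if isinstance(node, (int, float)):
        return node
    kids = [sort_tree(child, best_first) for child in node]
    # A child is best for us when its own-perspective value is lowest.
    kids.sort(key=negamax, reverse=not best_first)
    return kids

def alpha_beta(node, alpha, beta, counter):
    counter[0] += 1                      # count every node visited
    if isinstance(node, (int, float)):
        return node
    for child in node:
        score = -alpha_beta(child, -beta, -alpha, counter)
        if score >= beta:
            return beta
        if score > alpha:
            alpha = score
    return alpha

def build(depth, branching=4):
    return (random.uniform(-100, 100) if depth == 0
            else [build(depth - 1, branching) for _ in range(branching)])

random.seed(0)
tree = build(5)
for label, best_first in [("best-first", True), ("worst-first", False)]:
    counter = [0]
    alpha_beta(sort_tree(tree, best_first), -INF, INF, counter)
    print(f"{label} ordering: {counter[0]} nodes visited")
```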
The alpha-beta algorithm
So how do you implement this? Where do our friends `alpha` and `beta` enter the fray? Here's some pseudocode:
```
function alpha_beta_max(alpha, beta, depth_left) {
    if (depth_left == 0) return evaluate()
    for (move in legal_moves) {
        make_move(move)
        score = alpha_beta_min(alpha, beta, depth_left - 1)
        unmake_move(move)
        if (score >= beta) return beta   // fail-high beta-cutoff
        if (score > alpha) alpha = score // alpha acts like max in minimax
    }
    return alpha
}

function alpha_beta_min(alpha, beta, depth_left) {
    if (depth_left == 0) return evaluate()
    for (move in legal_moves) {
        make_move(move)
        score = alpha_beta_max(alpha, beta, depth_left - 1)
        unmake_move(move)
        if (score <= alpha) return alpha // alpha-cutoff
        if (score < beta) beta = score   // beta acts like min in minimax
    }
    return beta
}
```
One would invoke this at the root node by calling:

```
score = alpha_beta_max(-infinity, +infinity, 10)
```
The algorithm is clearly very similar to the minimax algorithm we started with. Now, however, we are passing around these `alpha` and `beta` variables. Evidently, they are tracking those bounds of interest and, at certain times, we return early from the function - a so-called "fail-high cutoff".
Negamax version
For completeness, here's the algorithm modified to the "negamax" form I alluded to above. We don't really want a different function for each perspective when we can achieve the same thing with some minus signs. This approach can be a bit confusing at first, but it's certainly cleaner.
```
function alpha_beta(alpha, beta, depth_left) {
    if (depth_left == 0) return evaluate()
    for (move in legal_moves) {
        make_move(move)
        score = -alpha_beta(-beta, -alpha, depth_left - 1)
        unmake_move(move)
        if (score >= beta) return beta   // fail hard beta-cutoff
        if (score > alpha) alpha = score // alpha acts like max in minimax
    }
    return alpha
}
```
Hopefully by comparing with the previous version, you can see that this is really the same thing. Everything is now done as if both players are maximizers. Whenever it is Min's turn, she works with the negation of the real score, which makes her a maximizer. Slightly less obviously, our slippery pals `alpha` and `beta` get swapped back and forth in the recursive call to `alpha_beta`. This is because they are being used like a range or window on the number line, and when we negate everything, the top end of the range becomes the bottom and vice versa. For the main discussion about what the variables `alpha` and `beta` represent, return to the top.
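Finally, if you'd like something executable, here's a faithful Python transcription of the negamax pseudocode above, with a nested-list toy tree standing in for a real position and move generator (the tree and its values are invented for illustration):

```python
INF = float("inf")

def alpha_beta(node, alpha, beta):
    """Negamax alpha-beta over nested lists; a leaf is a static evaluation
    from the perspective of the player to move at that leaf."""
    if isinstance(node, (int, float)):
        return node
    for child in node:
        # Negate and swap the window when handing it to the opponent.
        score = -alpha_beta(child, -beta, -alpha)
        if score >= beta:
            return beta                  # fail-hard beta-cutoff
        if score > alpha:
            alpha = score                # alpha acts like max in minimax
    return alpha

# Invoke at the root with the widest possible window.
tree = [[3, -2], [5, 1], [0, 7]]
print(alpha_beta(tree, -INF, INF))       # -> 1
```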