Can a Rubik's Cube be brute-forced?

By Robert Smith


Introduction

When I was about 13, while still a middle-schooler, I became fascinated with the Rubik’s Cube¹. I never got terribly good at solving it, eventually getting into the 30-to-40-second range at best. While I didn’t have a penchant for memorizing move sequences, I was drawn to how we find these move sequences.

The story about my interest and exploration in the Rubik’s Cube is for another post. Long story short, I got interested in “computer puzzling”—using computers to manipulate combinatorial puzzles, like the Rubik’s Cube, either to solve them quickly, to discover patterns, or to find novel move sequences for use in speedcubing—and ever since, I’ve been working on different programs for solving Rubik-like puzzles.

Purely in principle, it shouldn’t be hard to solve a Rubik’s Cube with a computer, right? Our program would have three parts:

  1. A model of the Rubik’s Cube, that is, some data structure that represents a cube state.
  2. Some functions which can simulate turns of each side.
  3. A solving procedure which takes a scrambled cube, tries every possible turn sequence, and stops when solved.

Truth be known, and details aside, this is a provably correct method for solving a Rubik’s Cube. If you leave your computer on long enough, it will return a solution.

The problem is that it takes a long time. Probably longer than your lifetime.

Computer puzzling without brute-force

“Brute-force” generally means to try every possibility of something without any real strategy. Our method above is a brute-force algorithm. Brute-force algorithms generally aren’t practical, because if you have $N$ of something to explore, a brute-force algorithm will take $O(N)$ time. For a Rubik’s Cube, $N$ is about 43 quintillion—a very large number.

It has been known, practically since the Rubik’s Cube’s inception, that something else is needed to solve a Rubik’s Cube. Rubik’s Cube solutions, obviously, take into account the specific structure and properties of the cube so as to implicitly or explicitly avoid mindless search. These methods have turned out to be:

  1. Solving methods for humans: memorize some sequences which let you move only a few pieces around in isolation, and apply these sequences mechanically until all pieces are in place. The more sequences you memorize, the faster you’ll be.
  2. Heuristic tree search: do a tree search (with, e.g., iterative-deepening depth-first search²), but aggressively prune off branches by way of clever heuristics³.
  3. Phase-based solvers: a deeply mathematical way which involves characterizing the Rubik’s Cube as a sequence of nested (mathematical) subgroups, so that each successive coset space is small enough that it can be solved by computer.

Computer puzzling mostly deals with the latter two approaches, usually in some combination. Both approaches lead to extraordinarily high-performing solvers.

The story may as well end here. We have slow but optimal ways of solving the Rubik’s Cube, and fast but sub-optimal ways. Pick your poison (sub-optimal or slow), depending on what you’re trying to achieve.

Taking a step back: puzzles as permutations

It seems that any Rubik’s Cube solver has to know something about the structure of the cube. It might be worth asking how little structure we can get away with, so as to make whatever solving algorithm we write generic over a broad class of puzzles.

For a brute-force algorithm with tree search, we would need something like the following:

interface GenericPuzzle:
  type State
  type Move

  function isSolved(State) -> Boolean
  function allMoves() -> List(Move)
  function performMove(State, Move) -> State

With this, we could write the following solver based on iterative-deepening depth-first search, which is totally generic over the above interface.

function solve(State) -> List(Move)
function solve(p):
  if isSolved(p):
    return []

  for maxDepth from 1 to infinity:
    solved?, solution = dfs(0, maxDepth, p, [])
    
    if solved?:
      return solution

function dfs(Integer, Integer, State, List(Move)) -> (Boolean, List(Move))
function dfs(depth, maxDepth, p, s):
  if isSolved(p):
    return (True, s)

  if depth == maxDepth:
    return (False, [])

  for m in allMoves():
    p' = performMove(p, m)
    (solved?, solution) = dfs(depth+1, maxDepth, p', append(s, [m]))

    if solved?:
      return (solved?, solution)

  return (False, [])
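
To make this concrete, here is a toy instantiation in Common Lisp (the language used later in this post). The “puzzle” is just rotating a list until it is sorted; the names and representation are ours, purely for illustration, but the solver is a direct transcription of the pseudocode above.

;; A toy GenericPuzzle: a state is a list of numbers, a move rotates it
;; left or right, and the state is solved when it reads (1 2 ... n).

(defun is-solved (state)
  (equal state (loop :for i :from 1 :to (length state) :collect i)))

(defun all-moves ()
  '(:left :right))

(defun perform-move (state move)
  (ecase move
    (:left  (append (rest state) (list (first state))))
    (:right (append (last state) (butlast state)))))

;; Iterative-deepening depth-first search over the interface.

(defun dfs (depth max-depth state moves)
  (cond ((is-solved state) (values t (reverse moves)))
        ((= depth max-depth) (values nil nil))
        (t (dolist (m (all-moves) (values nil nil))
             (multiple-value-bind (solved? solution)
                 (dfs (1+ depth) max-depth (perform-move state m)
                      (cons m moves))
               (when solved? (return (values t solution))))))))

(defun solve (state)
  (if (is-solved state)
      '()
      (loop :for max-depth :from 1
            :do (multiple-value-bind (solved? solution)
                    (dfs 0 max-depth state '())
                  (when solved? (return solution))))))

;; (solve '(3 4 1 2)) => (:LEFT :LEFT)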

As discussed before, while this strategy is effective for problems with small search spaces, it’s no help when the space is large. Unfortunately, the GenericPuzzle interface doesn’t give us much room for improvement. Can we still remain generic, while giving us at least a little more room for exploring other algorithms?

The answer is yes, if we restrict ourselves to permutation puzzles. Roughly speaking, a permutation puzzle is one where pieces shift around according to a fixed and always available set of shifting moves. The Rubik’s Cube is a phenomenal and non-trivial example: We can label each mobile⁴ sticker with a number 1 to 48, and these stickers can always be shifted around with a twist of any of the six sides. Since we can twist any of the six sides at any time, the puzzle is a permutation puzzle. (Not all similar puzzles are permutation puzzles. There are some puzzles which are “bandaged”, that is, pieces of the puzzle are fused together, restricting some available moves depending on the configuration.)

In this view, we look at a configuration as a list of numbers. For example, the solved Rubik’s Cube as a permutation would be

$$ (1, 2, \ldots, 47, 48). $$

When we turn a side, these numbers get permuted. For instance, assuming a particular labeling of stickers with numbers, turning the top face of a Rubik’s Cube might permute the first sticker in the list to the third, the second sticker to the fifth, the third sticker to the eighth, and so on. We can use the same notation to write this down:

$$ (3, 5, 8, 2, 7, 1, \ldots) $$

This notation has two interpretations:

  1. The literal position of numbered stickers on a physical cube (with an agreed upon labeling).
  2. An instruction for how to relabel the stickers of a given cube.

If we look at the notation under the second interpretation, a permutation actually represents a function that’s applied to individual stickers. For instance, if

$$ U := (3, 5, 8, 2, 7, 1, \ldots) $$

then $U(1) = 3$, $U(2) = 5$, etc. All of the clockwise face turns—Front, Right, Up, Back, Left, Down—of a Rubik’s Cube can be described like so:

$$ \begin{align*} F &:= (1, 2, 3, 4, 5, 25, \ldots)\\ R &:= (1, 2, 38, 4, 36, 6, \ldots)\\ U &:= (3, 5, 8, 2, 7, 1, \ldots)\\ B &:= (14, 12, 9, 4, 5, 6, \ldots)\\ L &:= (17, 2, 3, 20, 5, 22, \ldots)\\ D &:= (1, 2, 3, 4, 5, 6, \ldots, 48, 42, 47, 41, 44, 46) \end{align*} $$

We wrote some of the last elements of $D$ because a “down” move doesn’t change the first six stickers in this labeling scheme.

This gives us a whole new interpretation of what it means to “solve” a cube. Given a scrambled cube, we first write down the permutation that describes how the stickers moved from a solved state to the scrambled state. Let’s call it $s$. This is easy, because we can just read the labeled stickers off of a cube one-by-one, in order. For example, $s$ might be:

$$ s := (27, 42, 30, 15, 39, 6, \ldots). $$

This is a description of a function! The value of $s(1)$ describes how the first sticker of a cube will be shifted to its scrambled position, in this case $27$. Next, solving a cube is finding a sequence of $k$ moves $m_1, m_2, \ldots, m_k$ such that, for all $1\leq i\leq 48$,

$$ i = m_k(m_{k-1}(\cdots(m_2(m_1(s(i)))))). $$

Stated another way in function composition notation, the function

$$ m_k \circ m_{k-1} \circ \cdots \circ m_2 \circ m_1\circ s $$

must be the identity function—a permutation that doesn’t move anything.

In the permutation puzzle way of thinking, we can still implement our GenericPuzzle interface:
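
Here is a minimal sketch of that, in Common Lisp, assuming we store a size-$n$ permutation as a vector whose $i$th slot holds the image of $i$. (The names below are ours for illustration and are not CL-PERMUTATION’s API.)

;; A permutation is a vector; slot (i-1) holds the image of i.

(defun identity-perm (n)
  "The identity permutation (1, 2, ..., n)."
  (coerce (loop :for i :from 1 :to n :collect i) 'vector))

(defun compose (p q)
  "The permutation p∘q, which maps i to p(q(i))."
  (map 'vector (lambda (j) (aref p (1- j))) q))

(defun identityp (perm)
  "True when PERM is the identity permutation, i.e., the puzzle is solved."
  (equalp perm (identity-perm (length perm))))

;; The GenericPuzzle interface falls out: a State and a Move are both
;; permutations, allMoves() returns the move permutations,
;; performMove(state, move) is (compose move state), and isSolved is
;; IDENTITYP.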

This might even be more efficient than another choice of representation, since permutations can be represented very efficiently on a computer as packed arrays of bytes!

But we didn’t do all this mathematical groundwork just to goof around; there’s something amazing lurking in these permutations.

Brute-force, still ignorant, but kinda smart?

In the late 1980s, Adi Shamir⁵ and his students made a brilliant series of observations that came together to make for a beautiful result. Unfortunately, to my knowledge, only two writings exist on the topic.

  1. Shamir and his colleagues wrote a paper about it [1], sort of in the style of a brief conference proceeding, but it’s very light on details and skips implementation considerations. It’s the kind of paper where you follow it, but you have to fill in a great number of blanks to make anything from it work.

  2. Shamir gave a talk sometime in the ’80s about his result, and somebody (none other than Alan Bawden) wrote a brief email [2] to a mailing list about his recollection of it.

An amazing result, buried in history, without any good exposition that I could find.

What’s the result? The essence of the result is this. Reminiscent of a “meet in the middle” algorithm, if we want to brute-force a problem that ordinarily requires visiting $N$ states to find an answer, we can instead cleverly split the work into two searches that each require visits to around $\sqrt{N}$ states. For a Rubik’s Cube, that cuts work associated with 43 quintillion states down to work associated with roughly 6.6 billion states. The best part is, this is still brute-force; virtually no knowledge of the structure of the problem is required to make it work.

Let’s walk through the requisite steps and build up to the result. I’ll attempt to write in a general framework (since it’s a general algorithm), but make frequent appeals to the Rubik’s Cube specifically.

Observation #1: decomposition as intersection

Suppose the following: we have a permutation $s$, along with two sets of permutations $X$ and $Y$, and we are promised that $s = \hat y\circ\hat x$ for some $\hat x\in X$ and some $\hat y\in Y$.

The goal is to find precisely what $\hat x$ and $\hat y$ are. The simplest way to do this is to check every combination of elements in $X$ and $Y$.

for x in X:
  for y in Y:
    if s == compose(y, x):
      return (x, y)

This will take time proportional to the product of the set sizes: $O(\vert X\vert\cdot\vert Y\vert)$. Shamir noticed the following: If $s=\hat y\circ\hat x$, then $\hat y^{-1}\circ s = \hat x$. With this, we preprocess our $Y$ set to be instead

$$ Y' := \{y^{-1}\circ s : y\in Y\}. $$

By doing this, there must be an element in common between $X$ and $Y'$, since $\hat x\in X$ and $\hat y^{-1}\circ s\in Y'$ and those are equal. So we’ve reduced the problem to determining what the intersection between $X$ and $Y'$ is.

Once we find our $z$ which is in common between $X$ and $Y'$, our recovered permutations will be $\hat x = z$ and $\hat y = (z\circ s^{-1})^{-1}$.

We’ve just established that the problem of decomposing an element like $s$ is identical to the problem of calculating a set intersection. Still, if we want to do the intersection, our intuition tells us we still need a quadratic algorithm, which brings us to the second observation.

Observation #2: sorting really helps!

Permutations have a natural ordering, called lexicographic ordering. If you have two permutations, and you read their elements left-to-right, you can compare them like ordinary numbers. Just as $123 < 213$, we can say that

$$ (1,2,3) < (2,1,3). $$

A nice property of this is that the identity permutation $(1, 2, 3, \ldots)$ is the smallest permutation of a given size.
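
In the vector sketch from earlier, this comparison is a first-difference scan (again, illustrative code rather than any library’s API):

(defun perm< (p q)
  "True if permutation P precedes permutation Q lexicographically."
  (loop :for a :across p
        :for b :across q
        :when (/= a b)
          :do (return (< a b))
        :finally (return nil)))

;; (perm< #(1 2 3) #(2 1 3)) => T, just as 123 < 213.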

How does this help us? Well, suppose we sort our sets $X$ and $Y'$ into lists $L_X$ and $L_{Y'}$, so the permutations are in order. If $L_X$ and $L_{Y'}$ have an element in common, we can find it in linear time: $O(\min\{\vert X\vert, \vert Y'\vert\})$. How? Something like the following:

function findCommon(Lx, Ly):
  x = pop(Lx)
  y = pop(Ly)
  loop:
    if x == y:
      return x

    if x < y:
      if empty(Lx):
        error("No common elements found.")
      x = pop(Lx)
    else:
      if empty(Ly):
        error("No common elements found.")
      y = pop(Ly)

This works because we are essentially looking at all of the elements of $L_X$ and $L_{Y'}$ together in sorted order. It’s like a merge sort, without the merge part.

As written, findCommon computes just one element of the intersection. Instead of returning, the loop could continue to enumerate all elements. This is useful to know for the purpose of solving permutation puzzles: Do we want just some solution, or do we want all solutions? That answer, of course, depends on the application.
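
Transcribed into our running Common Lisp sketch, over sorted lists of permutation vectors (using perm< from above; illustrative only):

(defun find-common (lx ly)
  "Find a permutation common to LX and LY, both sorted by PERM<."
  (loop
    (cond ((or (null lx) (null ly))
           (error "No common elements found."))
          ((equalp (first lx) (first ly))
           (return (first lx)))
          ((perm< (first lx) (first ly))
           (pop lx))
          (t
           (pop ly)))))

;; (find-common (list #(1 2 3) #(2 3 1))
;;              (list #(1 3 2) #(2 3 1)))
;; => #(2 3 1)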

Before continuing, we should take a little scenic tour on a more formal meaning of “moves” and “move sequences”, since ultimately any permutation puzzle solving algorithm must produce them as output.

What is a move?

A quick bit about notation. If we have a permutation $f$, then its inverse is written $f^{-1}$, and its $k$-fold repetition $f\circ f\circ\cdots\circ f$ is written $f^k$. If we have a collection of permutations $S := \{f_1, f_2, \ldots\}$, then we write the following shorthands:

$$ \begin{align*} S^{-1} &:= \{f^{-1} : f \in S\}\\ S^{\times k} &:= \{f^k : f \in S\}. \end{align*} $$

If $g$ is some permutation, we also write these shorthands:

$$ \begin{align*} g\circ S &:= \{g\circ f : f \in S\}\\ S\circ g &:= \{f\circ g : f \in S\}. \end{align*} $$

Similarly, if $T := \{g_1, g_2, \ldots\}$, then we can write

$$ \begin{align*} S\circ T &:= \{f\circ g : f\in S, g\in T\}\\ &= \{f_1\circ g_1, f_2\circ g_1, \ldots, f_1\circ g_2, \ldots\}. \end{align*} $$

With that out of the way, let’s talk about the concept of a single “move”. What counts as a “move” in a permutation puzzle?

Really, we can choose any set of moves we please, so long as every state of the puzzle is reachable through some combination of the moves. For example, let

$$ C := \{F, R, U, B, L, D\}, $$

the basic and well understood ninety-degree clockwise moves of the Rubik’s Cube. Indeed, $C$ itself is a fine definition of available moves. All of the following are also valid definitions of moves:

$$ C\cup C^{-1},\quad C\cup C^{\times 2},\quad C^{-1},\quad C\cup C^{\times 2}\cup C^{-1}, $$

and so on. Perhaps surprisingly, we can take any element of $C$ and remove it, and it would still be a valid set of moves for the Rubik’s Cube⁶!

Which set of moves we select usually has little relevance mathematically (they are all expressible as one another), but has great relevance when we are synthesizing efficient move sequences, or when we want to talk about “optimality”. For instance, consider a counterclockwise move: $F^{-1}$. It’s natural to consider this a single move, but if we consider our set to be $C$, then we’d have to count it as three moves, since $F^{-1} = F\circ F\circ F = F^3$. What about $F^2$? Is that one move or two? Speedcubers generally consider $F^2$ to be one motion, so counting that as one move is natural, but many computer puzzlers like the simplicity of $C\cup C^{-1}$, i.e., only ninety-degree turns⁷.

For the rest of this note, we’ll be in the former camp, where half-turns count as one, and we’ll denote this set of moves as:

$$ \bar C := C \cup C^{-1} \cup C^{\times 2}. $$

What is a word?

After we agree on what we consider a move, we can be more specific as to what we mean about move sequences. A move sequence is a possibly empty list of moves. A move sequence can be composed to form the permutation it represents. This composition operator is called $\kappa$, and is easily defined. Let $M$ be a move set, and let $s = [s_1, s_2, \ldots, s_n]$ be a sequence of $n$ moves with each $s_{\bullet}$ a move from $M$. The length of $s$ is naturally $n$, and its composition is defined as:

$$ \begin{align*} \kappa([\,]) &:= (1, 2, 3, \ldots)\\ \kappa([s_1, s_2, \ldots, s_{n-1}, s_n]) &:= \kappa([s_1, s_2, \ldots, s_{n-1}])\circ s_n. \end{align*} $$

If $M$ is a move set, then the set of all move sequences (including the empty sequence) is denoted $M^{*}$, a notation kindly borrowed from formal language theory.

If we identify the elements of $M$ with symbols, then a move sequence is called a word. We’ll always type symbols in $\texttt{typewriter}$ font. The moves $\{F, R, U, B, L, D\}$ have the symbols $\{\texttt{F}, \texttt{R}, \texttt{U}, \texttt{B}, \texttt{L}, \texttt{D}\}$, an inverse $F^{-1}$ has the symbol $\texttt{F'}$, and a square $F^2$ has the symbol $\texttt{F2}$. And we type words as symbols joined together in reverse order⁸, so $[R^{-1}, U^2, L]$ can be represented by the word $\texttt{L U2 R'}$.

The distinction is subtle but important. In a computer program, a move sequence is a list of permutations, while a word is a list of symbols. A Rubik’s Cube solving program should take as input a permutation, and output a word which when composed as a move sequence, brings that permutation to identity.
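
In the vector sketch from earlier, $\kappa$ is just a left fold with compose (an illustrative sketch; a real solver would also map symbols to and from permutations):

(defun kappa (move-sequence n)
  "Compose MOVE-SEQUENCE, a list of size-N permutation vectors, into a
single permutation, following the recursive definition of κ above."
  (reduce #'compose move-sequence :initial-value (identity-perm n)))

;; A word is typed in reverse order, so to realize a word as a
;; permutation we would map its symbols to move permutations, reverse
;; the resulting list, and hand it to KAPPA.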

When doing math, we often mix up all of these concepts since they have little bearing on the correctness of an argument. Whether it’s the permutation $F\circ R^{-1}$ or the move sequence $[F, R^{-1}]$ or the word $\texttt{R' F}$ or otherwise, they all represent roughly the same thing, but computers need to be explicit about which representation is being manipulated.

So, in summary:

  1. A move is a permutation drawn from an agreed-upon move set.
  2. A move sequence is a (possibly empty) list of moves; $\kappa$ composes it into a single permutation.
  3. A word is the symbolic spelling of a move sequence, typed in reverse order.

Back to this brute-force thing…

Observation #3: sorting as solving

As silly as the example is, let’s suppose we know, for a fact, that a Rubik’s Cube was mixed up using only six moves from $\bar C$. Since $\bar C$ has 18 elements, without any optimization, we might have to try $18^6$ move sequences to find a solution.

Instead of brute-forcing in that way, we can do another trick. Let $s$ be our scrambled permutation.

  1. Write out every combination of 3 moves into a table. The key would be the permutation, and the value would be the word associated with that permutation. Call this table A.

  2. Sort A in ascending lexicographic order on the permutation.

  3. Make a copy of A, call it B. For all (perm, word) in B, reassign perm := compose(invert(perm), s). We do this because of Observation #1.

  4. Sort B.

  5. Call x := findCommon(A, B). We do this via Observation #2.

  6. Reconstruct a word equal to s by A[x].word ++ reverse(B[x].word). We do this to recover a final result via Observation #1.

Since we have a word that brings us from solved to $s$, we can invert the word to bring us from $s$ to solved.

By this method, we avoided visiting all $18^6$ move sequences by instead pre-calculating two groups of $18^3$ sequences and exploring them for an intersection. We have cut the amount of work down to its square root.
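
Here is a toy-scale sketch of steps 3 through 6 in the running Common Lisp representation, where a table is a list of (perm . word) entries. The inversion helper and the table layout are ours, for illustration:

(defun invert (p)
  "The inverse permutation of P."
  (let ((inv (make-array (length p))))
    (loop :for i :from 1
          :for j :across p
          :do (setf (aref inv (1- j)) i))
    inv))

(defun decompose-2 (s table-a table-b)
  "Find entries (x . wx) in TABLE-A and (y . wy) in TABLE-B such that
s = y∘x. Tables are lists of (perm . word) pairs."
  (let ((a (sort (copy-list table-a) #'perm< :key #'car))
        ;; Observation #1: replace each y by y⁻¹∘s, keeping its word.
        (b (sort (mapcar (lambda (entry)
                           (cons (compose (invert (car entry)) s)
                                 (cdr entry)))
                         table-b)
                 #'perm< :key #'car)))
    ;; Observation #2: findCommon, but over (perm . word) entries.
    (loop
      (cond ((or (null a) (null b))
             (error "No decomposition found."))
            ((equalp (car (first a)) (car (first b)))
             (return (values (first a) (first b))))
            ((perm< (car (first a)) (car (first b)))
             (pop a))
            (t
             (pop b))))))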

If we generalize to length $n+m$ (for some splitting of $n$ and $m$), then we can replace the work of visiting $18^{n+m}$ states with $18^m + 18^n$ states, which is much better.

So we’re done? We now know that the Rubik’s Cube requires no more than 20 moves, so if we make two tables enumerating 10 moves, we should be good?

Well, err, $18^{10} = 3,570,467,226,624$. Unless we have trillions of resources to spare, be it time or space, it’s still not going to work.

More splitting?

An enterprising computer science student, at this point, might smell recursion. If we split once, can we split again? If we know a Rubik’s Cube can be solved in 20 moves, can we split it into two 10 move problems, and each of those into two 5 move problems?

The problem with this is that at the top layer of recursion, it’s clear what we are solving. At lower layers, it’s no longer clear. What actually is the recursive structure at play? And if we could do this trick, couldn’t we decimate any brute-force problem of exponential complexity (e.g., in number of moves) into one of linear?

That isn’t going to work, but we can be inspired by it. Let $L$ be the set of 5-or-fewer-move combinations from $\bar C$, that is,

$$ L := \bigcup_{i=0}^5 \bar C^i. $$

The size of $L$ is going to be $621,649$ if we don’t store redundant permutations. This is definitely possible to compute. Then our goal is to find a decomposition of $s$ in terms of an element in $L\circ L\circ L\circ L$. Using the same trick from Observation #1, suppose there is a decomposition $$s = l_4\circ l_3\circ l_2\circ l_1.$$ Then $$l_3^{-1}\circ l_4^{-1} \circ s = l_2\circ l_1.$$ So we create four tables:

  1. a table $L_1$ holding $L$;
  2. a table $L_2$ holding $L$ (identical to $L_1$, so it can be shared);
  3. a table $L'_4$ holding $L^{-1}\circ s$;
  4. a table $L'_3$ holding $L^{-1}$.

No, the $4$ before $3$ is not a typo! We put this in order to save on computation and avoid redundant work. Now, by Observation #1, our goal is to find an element in common between the two sets

$$ \begin{align*} X &= L_2 \circ L_1\\ Y' &= L'_3 \circ L'_4 = \{\,l_3^{-1}\circ l_4^{-1}\circ s : l_3, l_4\in L\,\}. \end{align*} $$

Somehow, we must do this without actually calculating all elements of these products. And, to add insult to injury, for findCommon to work, we need to be able to go through each set in sorted order.

Iterating through products with Schroeppel–Shamir

Suppose we have two lists of positive numbers $A$ and $B$. How can we print the elements of $\{a+b : a\in A, b\in B\}$ in numerical order without explicitly constructing and sorting this set? Shamir and his collaborator Schroeppel did so with the following algorithm.

  1. Sort $A$ in ascending order. Pop off the first (and therefore smallest) element $a_1$.

  2. Create a priority queue $Q$ and initialize it with pairs $(a_1,b)$ with priority $a_1 + b$ for all $b\in B$.

  3. Repeat the following until $Q$ is empty:

    1. Pop $(a,b)$ off $Q$. This will form the next smallest sum, so print $a+b$.
    2. Find $a'$ which immediately succeeds $a$ in our sorted list $A$. (If $a$ has no successor, skip to the next iteration.)
    3. Push $(a',b)$ with priority $a'+b$ onto $Q$.

This algorithm will terminate, having printed each sum in ascending order, with at most $O(\vert A\vert + \vert B\vert)$ space and nearly linear time in the number of sums printed. (The sorting and priority-queue maintenance require some logarithmic factors.)

With a little work, one can see why this works. In a sense it’s a two-dimensional sorting problem that depends on one crucial fact: if $x \le y$ then $x+z \le y+z$. (This is to say that addition is monotonic.) Given how the priority queue is constructed, it will always contain the smallest remaining sum.
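
Here is a small Common Lisp sketch of the sums version, with a sorted list standing in for a real priority queue (fine for illustration, wrong for performance):

(defun print-sums-in-order (a b)
  "Print {x + y : x in A, y in B} in ascending order, with multiplicity."
  (let* ((a (coerce (sort (copy-list a) #'<) 'vector))
         ;; Queue entries are (priority index-into-a b-element), kept
         ;; sorted by priority.
         (q (sort (mapcar (lambda (y) (list (+ (aref a 0) y) 0 y)) b)
                  #'< :key #'first)))
    (loop :while q
          :do (destructuring-bind (sum i y) (pop q)
                (print sum)
                (when (< (1+ i) (length a))
                  ;; Replace (a_i, y) by its successor (a_{i+1}, y).
                  (setf q (merge 'list
                                 (list (list (+ (aref a (1+ i)) y) (1+ i) y))
                                 q #'< :key #'first)))))))

;; (print-sums-in-order '(1 4 6) '(2 3)) prints 3 4 6 7 8 9.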

Could we do this with permutations? If we have two lists of permutations $A$ and $B$, and $a_1$ is the “smallest” (i.e., lexicographically least) permutation of $A$, and $b_1$ is the “smallest” permutation of $B$, then it is not true in general that $a_1\circ b_1$ is the smallest element of $A\circ B$. In symbols,

$$ (\min A) \circ (\min B) \neq \min (A\circ B). $$

Similarly, if two permutations satisfy $a < b$, then it is not true in general that

$$ a\circ z < b\circ z $$

for a permutation $z$.

The monotonicity of addition is what allows us to do steps 3.2 and 3.3 so easily. If we did the same with permutations, we would no longer have the guarantee that the minimum composition exists within the queue.

This was the next hurdle Shamir cleared. In time essentially independent of the sizes of $A$ and $B$, Shamir found a way to solve the following problem: Given a permutation $a\in A$ and $b\in B$, find the element $b'\in B$ such that $a\circ b'$ immediately succeeds $a\circ b$. In other words, we can generate, one-by-one, the sequence of $b$’s needed for steps 3.2 and 3.3. With this algorithm (which we’ll describe in the next section), our Shamir–Schroeppel algorithm for permutations becomes the following:

Algorithm (Walk Products):

  1. Initialize an empty priority queue $Q$ whose elements are pairs of permutations with priority determined by another permutation in lexicographic ordering.
  2. For each permutation $a\in A$:
    1. With Shamir’s trick, find the $b\in B$ such that $a\circ b = \min (a\circ B)$.
    2. Push $(a, b)$ onto $Q$ with priority $a\circ b$.
  3. Repeat the following until $Q$ is empty:
    1. Pop $(a,b)$ off $Q$. This will form the next smallest $a\circ b$, so print it⁹.
    2. With Shamir’s trick, find $b'$ such that $a\circ b'$ immediately succeeds $a\circ b$. (If no such $b'$ exists, skip to the next iteration.)
    3. Push $(a,b')$ with priority $a\circ b'$ onto $Q$.

This algorithm will produce the elements of $A\circ B$, one-by-one in lexicographic order.

What is Shamir’s trick? We need a data structure and a clever observation.

Permutation tries

In order to handle sets of ordered permutations better, Shamir created a data structure. I call it a permutation trie. A permutation trie of size-$k$ permutations is a $k$-deep, $k$-ary tree, such that a path from root-to-leaf follows the elements of a permutation. The leaf contains data which we want to associate with the permutation.

For example, consider permutations of size $5$. Suppose we wanted to associate the symbol $\texttt{p6}$ with the permutation $(2,4,1,3,5)$. Then we would have a $5$-layer tree with a root node $R$, such that $R[2][4][1][3][5] = \texttt{p6}$.

More generally, let’s associate the following symbols with the following permutations in a permutation trie $R$:

$$ \begin{align*} \texttt{p1} &\leftarrow (1,2,3,4,5) & \texttt{p2} &\leftarrow (1,2,3,5,4) & \texttt{p3} &\leftarrow (1,2,4,3,5)\\ \texttt{p4} &\leftarrow (1,2,5,3,4) & \texttt{p5} &\leftarrow (1,3,4,5,2) & \texttt{p6} &\leftarrow (2,4,1,3,5)\\ \texttt{p7} &\leftarrow (4,1,3,2,5) & \texttt{p8} &\leftarrow (4,1,3,5,2) & \texttt{p9} &\leftarrow (5,1,2,3,4)\\ \end{align*} $$

The trie would be a data structure that looks like this:

An example permutation trie.

Even though we don’t show them, conceptually, each node in the trie has a full length-$5$ array, with some elements empty (i.e., there are no children).

What’s good about this data structure? First and foremost, pre-order traversal will visit the permutations in lexicographic order. We can use this data structure to store two things at the leaves (i.e., $\texttt{p}n$):

  1. The actual permutation data structure representing that path, and
  2. The word we used to construct that permutation.

This is the data structure, and now we get to Shamir’s insight. Suppose we have a permutation $s$ and a permutation trie $R$ (which represents a set of permutations), and we want to traverse $s\circ R$ in lexicographic order. The naive way is to construct a new trie, but we wish to avoid that. To explain the idea, we’ll choose a concrete example.

Let’s use $R$ from above. Let $s := (3,1,4,2,5)$. (Note that $s\not\in R$, but that’s not important.) We wish to find an $r'\in R$ such that $s\circ r' = \min (s\circ R)$. Well, the smallest permutation would be one such that $r'(1) = 2$, because then $s(r'(1)) = s(2) = 1$. Looking at our trie $R$, we can see the only candidate is that associated with $\texttt{p6}$: $(2,4,1,3,5)$, which is the minimum.

What about the next smallest $s\circ r''$? For ease, let’s call this product $m$. We would want a permutation such that $r''(1) = 4$, because then $m(1) = s(r''(1)) = s(4) = 2$, the next smallest possible value. This time, there are two candidates:

$$ (4,1,3,2,5)\qquad (4,1,3,5,2) $$

So at least we know $m = (2, \ldots)$. To disambiguate, we need to look at $r''(2)$. These are the same, likewise $r''(3)$, so we have no degree of freedom at $2$ or $3$ to minimize the product. Thus $m = (2, 3, 4, \ldots)$. We have a choice at $r''(4)$, however. The best choice is $r''(4) = 2$, because $m(4) = s(r''(4)) = s(2) = 1$, the smallest possible choice. This disambiguates our choice of $r''$ to be $(4,1,3,2,5)$ so that $m = (2,3,4,1,5)$.

We could repeat the procedure to find the next smallest product $s\circ r'''$. What exactly is the procedure here? Well, we walked down the tree $R$, but instead of walking down it straight, we did so in a permuted order based on $s$—specifically $s^{-1}$. Consider our normal algorithm for walking the tree¹⁰ in lexicographic order:

function walkLex(R):
  if notTree(R):
    print R
  else:
    for i from 1 to length(R):
      if R[i] exists:
        walkLex(R[i])

We can instead walk in permuted order, so that we produce a sequence $[r', r'', r''', \ldots]$ such that

$$ s\circ r' < s \circ r'' < s \circ r''' < \cdots. $$

To do so, we modify our walking algorithm like so:

function walkProductLex(R, s):
  walk'(R, inverse(s))

function walk'(R, s):
  if notTree(R):
    print R
  else:
    for i from 1 to length(R):
      j = s(i)
      if R[j] exists:
        walk'(R[j], s)

Note that $s$ was inverted before the recursion so that permuting at each node is quick.
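
Both pieces fit in a short Common Lisp sketch over our vector representation, reusing invert from the earlier sketch: an interior trie node is an association list from a value to a subtree, and a leaf is the stored permutation. (This is an illustration, not CL-PERMUTATION’s actual perm-tree.)

(defun make-trie (perms &optional (level 0))
  "Build a permutation trie from PERMS, a list of equal-length vectors."
  (if (= level (length (first perms)))
      (first perms)                      ; leaf: the permutation itself
      (let ((buckets '()))
        (dolist (p perms)
          (let ((cell (assoc (aref p level) buckets)))
            (if cell
                (push p (cdr cell))
                (push (list (aref p level) p) buckets))))
        (loop :for (value . group) :in buckets
              :collect (cons value (make-trie group (1+ level)))))))

(defun walk-product-lex (trie s)
  "Visit the permutations r stored in TRIE in increasing order of s∘r."
  (let ((s-inverse (invert s)))
    (labels ((walk (node)
               (if (vectorp node)        ; a leaf
                   (print node)
                   (loop :for i :from 1 :to (length s)
                         :for child := (assoc (aref s-inverse (1- i)) node)
                         :when child
                           :do (walk (cdr child))))))
      (walk trie))))

;; With the nine permutations from the example above and s = #(3 1 4 2 5),
;; this prints #(2 4 1 3 5) (p6) first, then #(4 1 3 2 5) (p7), exactly
;; as in the worked example.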

With this, we have the remarkable ability to iterate through products in lexicographic order, without having to enumerate them all and sort them. This was the last and critical ingredient.

The 4-List Algorithm and solving the Rubik’s Cube

Now we want to put this all together to create the 4-List Algorithm. Let’s restate the problem in clear terms.

Problem (4-List): Let $s$ be a permutation. Let $L_1$, $L_2$, $L_3$, and $L_4$ be sets of permutations such that we know $s\in L_4\circ L_3\circ L_2\circ L_1$. Find $l_1\in L_1$, $l_2\in L_2$, $l_3\in L_3$, and $l_4\in L_4$ such that $s = l_4\circ l_3\circ l_2\circ l_1$.

Piecing together the elements above, we arrive at the 4-List Algorithm.

Algorithm (4-List):

  1. Construct $L'_3 := L_3^{-1}$ and $L'_4 := L_4^{-1}\circ s$.
  2. Create two generators¹¹: $X_1$ that walks $L_2\circ L_1$ in lexicographic order, and $X_2$ that walks $L'_3\circ L'_4$ in lexicographic order. Do this by using the Walk Products algorithm, which itself is implemented by constructing permutation tries and using walkProductLex.
  3. Call findCommon on $X_2$ and $X_1$. This is guaranteed to find a solution $(l_3^{-1},l_4^{-1}\circ s,l_2,l_1)$. Process the solution to return $(l_4, l_3, l_2, l_1)$.

The main difficulty of this algorithm, aside from implementing each subroutine correctly, is plumbing the right data around.
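
As a sketch of that plumbing, one convenient shape (ours, not prescribed by the paper) is to make each generator a zero-argument closure that returns the next (permutation . data) pair in lexicographic order, or NIL when exhausted; findCommon then runs over the closures directly:

(defun find-common-generated (gen-x gen-y)
  "findCommon over two generators, each a function of no arguments that
returns the next (perm . data) pair in lexicographic order, or NIL."
  (let ((x (funcall gen-x))
        (y (funcall gen-y)))
    (loop
      (cond ((or (null x) (null y))
             (error "No common elements found."))
            ((equalp (car x) (car y))
             (return (values x y)))
            ((perm< (car x) (car y))
             (setf x (funcall gen-x)))
            (t
             (setf y (funcall gen-y)))))))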

Now, we can use this to solve a scrambled Rubik’s Cube $s$.

Algorithm (Solve Cube):

  1. Let $L := \bigcup_{i=0}^5\bar C^i$, keeping a record of the words used to construct each element of $L$. (We recommend immediately making a permutation trie, where the leaves store the words.)
  2. Apply the 4-List Algorithm to the problem $s \in L\circ L\circ L\circ L$ to emit $(l_4, l_3, l_2, l_1)$.
  3. Return words $(w_4, w_3, w_2, w_1)$ associated with the permutations $(l_4, l_3, l_2, l_1)$.

The algorithm may terminate upon finding just the first solution, or may be continuously run to find all solutions. If we do the latter, we guarantee finding all optimal solutions.

Amazingly, this algorithm really works, and answers our blog post question in the affirmative: yes, the Rubik’s Cube can be brute-forced.

Example and source code

This algorithm is implemented in Common Lisp, in my computational group theory package CL-PERMUTATION. CL-PERMUTATION already has built in support for Rubik’s Cubes as permutation groups. Starting a new Common Lisp session, we have the following:

> (ql:quickload '(:cl-permutation :cl-permutation-examples))
> (in-package :cl-permutation)
> (group-order (perm-examples:make-rubik-3x3))
43252003274489856000
> (format t "~R" *)
forty-three quintillion two hundred fifty-two quadrillion three trillion
two hundred seventy-four billion four hundred eighty-nine million
eight hundred fifty-six thousand
NIL

The built-in Rubik’s Cube model only uses $\{F, R, U, B, L, D\}$, so we make new generators corresponding to $\bar C$.

> (defvar *c (loop :with cube := (perm-examples:make-rubik-3x3)
                   :for g :in (perm-group.generators cube)
                   :collect (perm-expt g 1)
                   :collect (perm-expt g 2)
                   :collect (perm-expt g 3)))
*C
> (length *c)
18

Now we construct $\bar C^5 \cup \bar C^4 \cup \cdots \cup \bar C^0$.

> (defvar *c5 (generate-words-of-bounded-length *c 5))
*C5
> (perm-tree-num-elements *c5)
621649

Note that this constructs a perm-tree object, which automatically stores the words associated with each permutation generated.

Now let’s generate a random element of the cube group.

> (defvar *s (random-group-element (perm-examples:make-rubik-3x3)))
*S
> *s
#<PERM 43 44 41 20 47 11 28 9 24 13 17 42 36 40 37 25 6 21 1 29 7 19 10 3 35 39 22 18 34 33 31 48 16 15 30 2 23 32 26 46 8 4 27 12 45 14 5 38>

Lastly, we run the 4-list algorithm and wait.

> (decompose-by-4-list *s *c5 *c5 *c5 *c5 :verbose t)
10,000,000: 52 sec @ 192,553 perms/sec; .0013% complete, eta 1114 hours 58 minutes
20,000,000: 48 sec @ 206,858 perms/sec; .0026% complete, eta 1037 hours 51 minutes
Evaluation took:
  145.094 seconds of real time
  145.097120 seconds of total run time (144.961382 user, 0.135738 system)
  [ Run times consist of 2.405 seconds GC time, and 142.693 seconds non-GC time. ]
  100.00% CPU
  421,375,385,955 processor cycles
  11,681,934,352 bytes consed

((8 11 14 2 4)
 (1 16 9 15 1)
 (7 6 18 8 15)
 (9 13 16 15 8))

We are pretty lucky this one ended in a mere 2 minutes 25 seconds! It usually isn’t so prompt with an answer.

The results are printed as four words: our $l_4$, $l_3$, $l_2$, and $l_1$. Each integer $n$ represents the 1-indexed $n$th permutation of $\bar C$ (ordered by how it was constructed). We can create a more traditional notation:

> (defvar *solution (reduce #'append *))
*SOLUTION
> (defun notation (ws)
    (dolist (w (reverse ws))
      (multiple-value-bind (move order)
          (floor (1- w) 3)
        (format t "~C~[~;2~;'~] "
                (aref "FRUBLD" move)
                order))))
NOTATION
> (notation *solution)
U2 L' D L U' L' U2 D' R' U F L' U' D F R F2 L2 B2 U2

How do we know if this is correct? We need to check that the composition of this word equals our random element, which we do by composing the word (using something CL-PERMUTATION calls a “free-group homomorphism”), inverting the permutation, and composing it with our scramble to see that it brings us to an identity permutation.

> (defvar *hom (free-group->perm-group-homomorphism
                (make-free-group 18)
                (generate-perm-group *c)))
*HOM
> (perm-compose (perm-inverse (funcall *hom *solution)) *s)
#<PERM 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48>

Indeed, we found a reconstruction of our cube.

Tips for optimizing the 4-List Algorithm

One of the most troubling aspects of implementing this algorithm is making it fast enough. My initial implementation worked at a whopping 200 permutations per second. That’s incredibly slow, and meant that it would take well over a century (in the worst case) for my program to finish. Now, it works at about 190,000 permutations per second, with an estimated worst-case search time of 2 months. (I haven’t encountered a scrambled cube position which has taken more than 10 hours.)

Here are some ways I sped things up.

  1. Be economical with memory. When doing exploratory programming, it’s desirable to tag and store everything, but each of those storages and accesses takes time.
  2. Don’t use actual arrays in the permutation trie. When I did that, I ran out of memory. I instead opted for a sparse representation using an “a-list” (that is, a linked list of (index, value) pairs).
  3. Make the permutation handling fast, like composition, equality testing, and lexicographic ordering. I was originally using generic arithmetic and 64 bits to represent each permutation element, and it degraded speed.
  4. Use a good priority queue implementation. You’ll be pushing and popping hundreds of millions of elements.
  5. Do some analysis and compress the permutation trie representation. Most nodes of the trie will only contain one value. If that’s the case, just store instead the permutation (and whatever value associated with it) at the shallowest depth. This will save a lot of time by avoiding a lot of needless (permuted) recursion.

If you have other tips for speeding up the algorithm, please email me!

Sample benchmarks

In the following, we only consider the problem of solving the Rubik’s Cube using the 4-list algorithm, assuming a solution length of 20 moves.

My computer is a ThinkPad 25th Anniversary Edition. It has an Intel Core i7-7500U processor at 2.70 GHz, but boosting to 3.50 GHz. It has 32 GiB of RAM, but comfortably runs the solver with around 3–4 GiB.

The algorithm as implemented is able to check around 190,000 elements per second.

Generating the move lists and pre-processing is a relatively fixed cost. The lists can be generated once, but the preprocessing (i.e., composing the scramble with one of the lists) needs to happen each solve. In my implementation, the initialization cost is consistently 9 seconds.

After initialization, the search is conducted. The run time varies wildly, anywhere from seconds to hours.

These are randomly sampled Rubik’s Cube scrambles, sorted by time.

In principle, with the current level of optimization, the algorithm can take as much as 2 months to finish. I’m confident that my implementation’s run time can be brought down by a factor of 2; I’m less confident it can easily be brought down by a factor of 50—but it wouldn’t surprise me either way.

One interesting thing about this algorithm is that it seems to return very, very quickly if the solution is 10 or fewer moves. Why? I haven’t done a careful analysis, but I believe it is essentially because the solution will be in $L_2\circ L_1$. The permutations $l_3$ and $l_4$ will be identity, which reduces to the problem of just finding $s\in L_2\circ L_1$.

Conclusion

“Meet in the middle” algorithms are old and well understood. When we can’t brute-force an entire space, we can try splitting it in two and combining the results. That’s of course the spirit of the 4-List Algorithm, but the devil is always in the details, and I hope this blog post showed how a lot of disparate facts needed to come together to realize the algorithm.

I think the algorithm communicated by Shamir and his colleagues is remarkable but has been forgotten. While better algorithms exist for the specific task of solving the Rubik’s Cube, the generality of the 4-List Algorithm ought not be understated.

References

  1. A. Fiat, S. Moses, A. Shamir, I. Shimshoni, and G. Tardos, “Planning and learning in permutation groups,” 30th Annual Symposium on Foundations of Computer Science, Research Triangle Park, NC, USA, 1989, pp. 274–279, doi: 10.1109/SFCS.1989.63490.
  2. A. Bawden, “Shamir’s talk really was about how to solve the cube!”, Cube Lovers mailing list, 27 May 1987.

  1. The Rubik’s Cube? Why not just “Rubik’s Cube”?! ↩︎

  2. Iterative-deepening depth-first search (IDDFS) is an interesting hybrid between breadth-first and depth-first search. Breadth-first search (BFS) can find an optimal path to a target, but requires lots of memory to keep track of nodes that have been seen. Depth-first search (DFS) uses almost no memory, but can’t guarantee finding the shortest path. IDDFS is an algorithm which tries DFS up to a maximum depth of 1, then of 2, then of 3, etc. until a path to the target is found. While we re-visit nodes in each successive increase in the maximum depth, the savings in memory and the guarantee of finding the shortest path usually make it worth it. ↩︎

  3. A heuristic might be something like this. First, suppose we’ve built a table which maps every corner configuration (ignoring edges) to the number of moves needed to solve it. This problem can be brute-forced, as there are “only” $8!\cdot 3^7=88,179,840$ corner configurations. Suppose we are doing IDDFS to solve a whole Rubik’s Cube, and the algorithm is currently at a depth limit of 10. During our DFS (with a limited depth), we arrive at a position at depth 7, and want to decide if we shall continue with it. We can consult our corner configuration table: If we would require more than 3 moves to solve just the corners, then there’s no hope in continuing, since we’ll exceed our depth limit of 10. So we drop the line of search on this configuration entirely by returning from the depth-7 recursive call empty-handed. ↩︎

  4. The centers are typically seen as immobile, and hence aren’t numbered. ↩︎

  5. Shamir, isn’t that name familiar? Yes, he’s the ‘S’ from “RSA”, the encryption algorithm for which he and colleagues ‘R’ Rivest and ‘A’ Adleman won the Turing Award. ↩︎

  6. Formally, any subset of five elements of $C$ generates the Rubik’s Cube group. ↩︎

  7. Rubik’s Cube enthusiasts have names for these concepts. If we measure the length of a move sequence by the number of quarter turns, we say we are measuring in the quarter-turn metric or QTM. If instead we are measuring the length of a move sequence by the number of face turns of any degree, we say we are measuring in the half-turn metric or HTM. ↩︎

  8. Speedsolvers like to write words in last-to-first order, so they can read off the moves as they’re applied. ↩︎

  9. A note on the phrase “print it”. We use the term “print it” to signify that the permutation has been constructed and it may be consumed. We might not literally print it, and instead emit it for use. What this means precisely depends on the programming language you’re using. In our final algorithm, we’ll actually need to explicitly construct generators, so keep that in mind. ↩︎

  10. Again, as in the other footnote, we can see “walking” or “printing” as a manifestation of a process of generating something one-by-one. ↩︎

  11. Construct generators?! This is the third footnote dedicated to walking/printing/generating, because it’s important and sometimes difficult. Making a generator may be utterly trivial in your language (Scheme with call/cc or Python with yield), cumbersome (Common Lisp with cl-cont), or downright annoying. One trick we used when implementing the algorithm in Common Lisp is to keep track of where we are in the permutation trie by a permutation itself. We can always go to the next one if we can find the current one. ↩︎