<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Posts on stylewarning's screed</title><link>http://www.stylewarning.com/posts/</link><description>Recent content in Posts on stylewarning's screed</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Thu, 02 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="http://www.stylewarning.com/posts/index.xml" rel="self" type="application/rss+xml"/><item><title>Idiomatic Lisp and the nbody benchmark</title><link>http://www.stylewarning.com/posts/nbody/</link><pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate><guid>http://www.stylewarning.com/posts/nbody/</guid><description>&lt;p>When talking to Lisp programmers, you often hear something like, &amp;ldquo;adapt Lisp to your problem, not your problem to Lisp.&amp;rdquo; The basic idea is this: if Lisp doesn&amp;rsquo;t let you easily write a solution to your problem because it lacks some fundamental constructs that make expressing solutions easy, then add them to Lisp first, then write your solution.&lt;/p>
&lt;p>That sounds all well and good in the abstract, and maybe we could even come up with some toy examples—say, defining HTTP request routing logic in a nice DSL. But where&amp;rsquo;s a real example of this that&amp;rsquo;s not artificial or overengineered?&lt;/p>
&lt;p>Recently, on Twitter, I butted into the middle of an exchange between &lt;a href="https://x.com/Ngnghm">@Ngnghm&lt;/a> (a famous Lisp programmer) and &lt;a href="https://x.com/korulang">@korulang&lt;/a> (an account dedicated to a new language called Koru) about Lisp. I&amp;rsquo;m oversimplifying, but it went something like this:&lt;/p>
&lt;ul>
&lt;li>Lisp is slow.&lt;/li>
&lt;li>No it&amp;rsquo;s not!&lt;/li>
&lt;li>Yes it is!&lt;/li>
&lt;li>No it&amp;rsquo;s not!&lt;/li>
&lt;li>Then prove it!&lt;/li>
&lt;/ul>
&lt;p>Now, there&amp;rsquo;s plenty of evidence online that Common Lisp has reasonably good compilers that produce reasonably good machine code, and so the question became more nuanced: Can Lisp be realistically competitive with C without ending up being a mess of unidiomatic code?&lt;/p>
&lt;p>Our interlocutor @korulang proposed a benchmark, the &amp;ldquo;nbody&amp;rdquo; benchmark from the &lt;a href="https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/nbody.html#nbody">Computer Language Benchmarks Game&lt;/a>. This was of particular interest to them, because they used it as an object of study for their Koru language. To quote &lt;a href="https://www.korulang.org/blog/idiomatic-kernels-match-specialized-c">their blog post&lt;/a>:&lt;/p>
&lt;blockquote>
&lt;p>We wanted Koru kernels to land in the same ballpark as idiomatic C, Rust, and Zig.&lt;/p>
&lt;p>The result was stronger than that.&lt;/p>
&lt;p>Our fused n-body kernel, written in straightforward Koru kernel style, came in faster than the plain reference implementations. Every implementation here is “naive” — the obvious, idiomatic version a competent programmer would write in each language. No tricks, no hand-tuning, no -ffast-math: [&amp;hellip;]&lt;/p>
&lt;/blockquote>
&lt;p>and they proceeded to show Koru being 14% faster than C and 106% faster than Lisp.&lt;/p>
&lt;p>Now, putting aside that some of the code and blog post were written with LLMs, there are many questions that are left unanswered here, since computer architecture and operating system matter a lot (where did these benchmarks run?). Moreover, the author buries the lede a little bit and proceeds to show how we might write &amp;ldquo;unidiomatic&amp;rdquo; C to match the performance of Koru.&lt;/p>
&lt;p>I&amp;rsquo;m not concerned about nitpicking their approach or rigorously evaluating their claims, but I would like to dwell on this common refrain: &amp;ldquo;idiomatic&amp;rdquo;. What is that supposed to mean?&lt;/p>
&lt;p>&amp;ldquo;Idiomatic code&amp;rdquo; in the context of programming means something like &amp;ldquo;representative of a fluent computer programmer&amp;rdquo; and &amp;ldquo;aligned with the peculiar characteristics of the language&amp;rdquo;. Idiomatic code in a particular language shouldn&amp;rsquo;t stand out amongst other code in that language, and it should, in some sense, portray the identity of the language itself.&lt;/p>
&lt;p>Idiomatic C is the C that uses terse names, simple loops, and unsafe arithmetic.&lt;/p>
&lt;p>Idiomatic Haskell is the Haskell that uses short functions, higher-order abstractions, immutable data structures, and safe constructs.&lt;/p>
&lt;p>What about idiomatic Lisp? Well, here&amp;rsquo;s the rub. A fluent Lisp programmer doesn&amp;rsquo;t reach for one paradigmatic toolbox; they weave in and out of imperative, functional, object-oriented, etc. styles without much of a second thought. There&amp;rsquo;s a sort of &amp;ldquo;meta&amp;rdquo; characteristic to Lisp programming: you&amp;rsquo;re programming the language almost as much as you&amp;rsquo;re programming the program.&lt;/p>
&lt;p>Yes, Lisp has loops, but &amp;ldquo;loopy code&amp;rdquo; isn&amp;rsquo;t intrinsically &amp;ldquo;Lispy code&amp;rdquo;. Yes, Lisp has objects, but &amp;ldquo;OOPy code&amp;rdquo; isn&amp;rsquo;t intrinsically &amp;ldquo;Lispy code&amp;rdquo;. In my opinion, what makes code &amp;ldquo;Lispy&amp;rdquo; is whether or not the programmer used Lisp&amp;rsquo;s metaprogramming and/or built-in multi-paradigm facilities to a reasonable degree to make the solution to their problem efficient and easy to understand in some global sense. For some problems, that may be &amp;ldquo;loopy&amp;rdquo; or &amp;ldquo;OOPy&amp;rdquo; or something else. It&amp;rsquo;s finding a Pareto-efficient syntactic and semantic combination offered by the language, or perhaps one of the programmer&amp;rsquo;s own creation.&lt;/p>
&lt;p>So we get back to the @korulang benchmark challenge. Looking at &lt;a href="https://github.com/korulang/koru/tree/main/tests/regression/900_EXAMPLES_SHOWCASE/910_LANGUAGE_SHOOTOUT/nbody">their repository&lt;/a>:&lt;/p>
&lt;ul>
&lt;li>&lt;code>nbody.c&lt;/code> looks like idiomatic C;&lt;/li>
&lt;li>&lt;code>nbody.hs&lt;/code> looks like wildly unidiomatic Haskell, but the problem is, the idiomatic version would probably be slower;&lt;/li>
&lt;li>&lt;code>nbody.lisp&lt;/code> looks reasonable, though loopy, and could easily be improved; and&lt;/li>
&lt;li>The Koru solution &lt;code>kernel_fused.kz&lt;/code> looks idiomatic, as far as I can tell for not knowing anything about Koru.&lt;/li>
&lt;/ul>
&lt;p>I hesitate to say &lt;code>nbody.lisp&lt;/code> is idiomatic. It&amp;rsquo;s &lt;em>reasonable&lt;/em>, it&amp;rsquo;s &lt;em>straightforward&lt;/em> to any imperative-minded programmer, but it&amp;rsquo;s not Lispy. That doesn&amp;rsquo;t make it good or bad, but it does lead to the grand question:&lt;/p>
&lt;p>&lt;strong>Can we use Common Lisp to express a solution to the nbody benchmark in a way that reads more naturally than a direct-from-C port?&lt;/strong>&lt;/p>
&lt;p>I would say that, at face value, Koru&amp;rsquo;s solution is closer to what reads naturally relative to the problem itself. Here are the essential bits.&lt;/p>
&lt;pre tabindex="0">&lt;code>~std.kernel:shape(Body) {
x: f64, y: f64, z: f64,
vx: f64, vy: f64, vz: f64,
mass: f64,
}
~std.kernel:init(Body) {
{ x: 0, y: 0, z: 0, vx: 0, vy: 0, vz: 0, mass: SOLAR_MASS },
{ x: 4.84143144246472090e+00, y: -1.16032004402742839e+00, z: -1.03622044471123109e-01, vx: 1.66007664274403694e-03 * DAYS_PER_YEAR, vy: 7.69901118419740425e-03 * DAYS_PER_YEAR, vz: -6.90460016972063023e-05 * DAYS_PER_YEAR, mass: 9.54791938424326609e-04 * SOLAR_MASS },
{ x: 8.34336671824457987e+00, y: 4.12479856412430479e+00, z: -4.03523417114321381e-01, vx: -2.76742510726862411e-03 * DAYS_PER_YEAR, vy: 4.99852801234917238e-03 * DAYS_PER_YEAR, vz: 2.30417297573763929e-05 * DAYS_PER_YEAR, mass: 2.85885980666130812e-04 * SOLAR_MASS },
{ x: 1.28943695621391310e+01, y: -1.51111514016986312e+01, z: -2.23307578892655734e-01, vx: 2.96460137564761618e-03 * DAYS_PER_YEAR, vy: 2.37847173959480950e-03 * DAYS_PER_YEAR, vz: -2.96589568540237556e-05 * DAYS_PER_YEAR, mass: 4.36624404335156298e-05 * SOLAR_MASS },
{ x: 1.53796971148509165e+01, y: -2.59193146099879641e+01, z: 1.79258772950371181e-01, vx: 2.68067772490389322e-03 * DAYS_PER_YEAR, vy: 1.62824170038242295e-03 * DAYS_PER_YEAR, vz: -9.51592254519715870e-05 * DAYS_PER_YEAR, mass: 5.15138902046611451e-05 * SOLAR_MASS },
}
| kernel k |&amp;gt;
std.kernel:step(0..iterations)
|&amp;gt; std.kernel:pairwise {
const dx = k.x - k.other.x;
const dy = k.y - k.other.y;
const dz = k.z - k.other.z;
const dsq = dx*dx + dy*dy + dz*dz;
const mag = DT / (dsq * @sqrt(dsq));
k.vx -= dx * k.other.mass * mag;
k.vy -= dy * k.other.mass * mag;
k.vz -= dz * k.other.mass * mag;
k.other.vx += dx * k.mass * mag;
k.other.vy += dy * k.mass * mag;
k.other.vz += dz * k.mass * mag;
}
|&amp;gt; std.kernel:self {
k.x += DT * k.vx;
k.y += DT * k.vy;
k.z += DT * k.vz;
}
| computed c |&amp;gt;
capture({ energy: @as(f64, 0) })
| as acc |&amp;gt;
for(0..5)
| each i |&amp;gt;
captured { energy: acc.energy + 0.5*c[i].mass*(c[i].vx*c[i].vx+c[i].vy*c[i].vy+c[i].vz*c[i].vz) }
|&amp;gt; for(i+1..5)
| each j |&amp;gt;
captured { energy: acc.energy - c[i].mass*c[j].mass / @sqrt((c[i].x-c[j].x)*(c[i].x-c[j].x)+(c[i].y-c[j].y)*(c[i].y-c[j].y)+(c[i].z-c[j].z)*(c[i].z-c[j].z)) }
| captured final |&amp;gt;
std.io:print.blk {
{{ final.energy:d:.9 }}
}
&lt;/code>&lt;/pre>&lt;p>Can we achieve something similar in Lisp?&lt;/p>
&lt;p>First, let&amp;rsquo;s make a baseline. I&amp;rsquo;m running Ubuntu Noble with an &amp;ldquo;AMD RYZEN AI MAX+ PRO 395&amp;rdquo; with a clock speed that varies between 0.6–5 GHz. I am also using SBCL 2.6.3 and gcc 13.3. Using &lt;code>nbody.lisp&lt;/code> as a starting point, &lt;a href="https://raw.githubusercontent.com/stylewarning/lisp-random/refs/heads/master/nbody/conventional.lisp">I modified it&lt;/a> for a few easy wins. I&amp;rsquo;ll call this version &lt;code>nbody-lisp-conventional&lt;/code>. A quick benchmark reveals that the loopy Lisp code is only about 20% slower than the C code compiled with &lt;code>gcc -O3 -ffast-math -march=native&lt;/code>.&lt;/p>
&lt;pre tabindex="0">&lt;code>$ ./nbody-lisp-conventional 50000000
-0.169286396
timing: 2000 ms
$ ./nbody-c 50000000
-0.169286396
timing: 1662 ms
&lt;/code>&lt;/pre>&lt;p>As a Lisp programmer, I&amp;rsquo;m not surprised that it&amp;rsquo;s a little slower. The number of person-years that have gone into C compilers to optimize idiomatic C code makes the development effort behind SBCL, the most popular open-source Lisp compiler, look like a rounding error.&lt;/p>
&lt;p>Now that we have a baseline, our goal is to come up with a nicer Lisp program that also improves the timing.&lt;/p>
&lt;p>Our approach will be simple. We will create a &lt;code>library.lisp&lt;/code> that contains new language constructs of a similar ilk to Koru, and we will use them to implement the nbody benchmark in &lt;code>impl.lisp&lt;/code>. Some rules:&lt;/p>
&lt;ul>
&lt;li>No compile-time precomputation or caching. I can&amp;rsquo;t just compute the answer at compile time, or cache a sub-computation that makes the full one trivial.&lt;/li>
&lt;li>No fundamental algorithm changes. I can&amp;rsquo;t use a different integrator, for example.&lt;/li>
&lt;li>Using assembly is allowed, but it must only make use of the facilities offered by the Lisp compiler (i.e., no external tools), and the implementation of nbody itself must be understandable without knowing assembly. In other words, it should be sufficiently hidden, and in principle easily substitutable with portable code.&lt;/li>
&lt;li>Library code must be in principle useful for other similar tasks. It should not be hyper-specialized to this specific problem instance, but instead be useful for this general class of problems.&lt;/li>
&lt;/ul>
&lt;p>The third rule is more restrictive than it looks. It means we can&amp;rsquo;t just have a &lt;code>solve-nbody&lt;/code> function that dispatches to assembly.&lt;/p>
&lt;p>To accomplish the above, we define a kernel DSL. The DSL allows us to express how the elements of a composite structure are transformed, while maintaining just enough invariants for them to be handled efficiently. These kernels are then compiled into code more efficient than ordinary loopy Lisp allows for.&lt;/p>
&lt;p>Our attention will be focused on a proof-of-concept library of functionality for writing particle simulators. The operators we define are:&lt;/p>
&lt;ul>
&lt;li>&lt;code>define-kernel-shape&lt;/code>: Define the data to be transformed by each kernel. This would be the data to characterize the static and dynamic properties of a particle in motion, as well as the number of particles under consideration.&lt;/li>
&lt;li>&lt;code>define-kernel-step&lt;/code>: Define a kernel as a sequence of existing ones.&lt;/li>
&lt;li>&lt;code>define-self-kernel&lt;/code>: Define a read-write kernel that operates on each element independently, without access to other elements (i.e., a &lt;em>map&lt;/em> operation).&lt;/li>
&lt;li>&lt;code>define-pairwise-kernel&lt;/code>: Define a read-write kernel that operates on all pairs of elements, reduced by symmetry (i.e., &lt;code>(i,j)&lt;/code> and &lt;code>(j,i)&lt;/code> are considered only once).&lt;/li>
&lt;li>&lt;code>define-reduction-kernel&lt;/code>: Define a read-only kernel that does reduction of a sequence into a single value (i.e., a &lt;em>reduce&lt;/em> operation).&lt;/li>
&lt;/ul>
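&lt;p>To make the self and pairwise semantics concrete, here is the iteration pattern in an illustrative Python sketch (my own, not the library&amp;rsquo;s code): the pairwise kernel visits each unordered pair exactly once, so a single update can write to both elements.&lt;/p>

```python
def pairwise(elements, update):
    # Visit each unordered pair {i, j} exactly once, mirroring the
    # symmetry reduction described for define-pairwise-kernel.
    n = len(elements)
    for i in range(n):
        for j in range(i + 1, n):
            update(elements[i], elements[j])

def self_map(elements, update):
    # The "self" kernel pattern: update each element independently.
    for e in elements:
        update(e)
```

&lt;p>For five bodies, the pairwise kernel fires 10 times per step rather than 20, which is exactly the saving the symmetry reduction buys in the force calculation.&lt;/p>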
&lt;p>This collection of five operators forms a miniature, re-usable language. These broadly recapitulate those of Koru, and allow us to write something that looks like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-lisp" data-lang="lisp">(defconstant +solar-mass+ (&lt;span style="color:#a6e22e">*&lt;/span> &lt;span style="color:#ae81ff">4d0&lt;/span> pi pi))
(defconstant +days-per-year+ &lt;span style="color:#ae81ff">365.24d0&lt;/span>)
(defconstant +dt+ &lt;span style="color:#ae81ff">0.01d0&lt;/span>)
(define-kernel-shape body &lt;span style="color:#ae81ff">5&lt;/span>
x y z vx vy vz mass)
(defparameter *system*
(make-body-system
(&lt;span style="color:#a6e22e">list&lt;/span> &lt;span style="color:#e6db74">:x&lt;/span> &lt;span style="color:#ae81ff">0d0&lt;/span> &lt;span style="color:#e6db74">:y&lt;/span> &lt;span style="color:#ae81ff">0d0&lt;/span> &lt;span style="color:#e6db74">:z&lt;/span> &lt;span style="color:#ae81ff">0d0&lt;/span>
&lt;span style="color:#e6db74">:vx&lt;/span> &lt;span style="color:#ae81ff">0d0&lt;/span> &lt;span style="color:#e6db74">:vy&lt;/span> &lt;span style="color:#ae81ff">0d0&lt;/span> &lt;span style="color:#e6db74">:vz&lt;/span> &lt;span style="color:#ae81ff">0d0&lt;/span>
&lt;span style="color:#e6db74">:mass&lt;/span> +solar-mass+)
&lt;span style="color:#f92672">...&lt;/span>))
(define-pairwise-kernel advance-forces (s body dt)
(&lt;span style="color:#66d9ef">let*&lt;/span> ((dx (&lt;span style="color:#a6e22e">-&lt;/span> i.x j.x))
(dy (&lt;span style="color:#a6e22e">-&lt;/span> i.y j.y))
(dz (&lt;span style="color:#a6e22e">-&lt;/span> i.z j.z))
(dsq (&lt;span style="color:#a6e22e">+&lt;/span> (&lt;span style="color:#a6e22e">+&lt;/span> (&lt;span style="color:#a6e22e">*&lt;/span> dx dx) (&lt;span style="color:#a6e22e">*&lt;/span> dy dy)) (&lt;span style="color:#a6e22e">*&lt;/span> dz dz)))
(mag (&lt;span style="color:#a6e22e">/&lt;/span> dt (&lt;span style="color:#a6e22e">*&lt;/span> dsq (&lt;span style="color:#a6e22e">sqrt&lt;/span> dsq)))))
(&lt;span style="color:#66d9ef">let&lt;/span> ((dm-j (&lt;span style="color:#a6e22e">*&lt;/span> mag j.mass))
(dm-i (&lt;span style="color:#a6e22e">*&lt;/span> mag i.mass)))
(decf i.vx (&lt;span style="color:#a6e22e">*&lt;/span> dx dm-j))
(decf i.vy (&lt;span style="color:#a6e22e">*&lt;/span> dy dm-j))
(decf i.vz (&lt;span style="color:#a6e22e">*&lt;/span> dz dm-j))
(incf j.vx (&lt;span style="color:#a6e22e">*&lt;/span> dx dm-i))
(incf j.vy (&lt;span style="color:#a6e22e">*&lt;/span> dy dm-i))
(incf j.vz (&lt;span style="color:#a6e22e">*&lt;/span> dz dm-i)))))
(define-self-kernel advance-positions (s body dt)
(incf self.x (&lt;span style="color:#a6e22e">*&lt;/span> dt self.vx))
(incf self.y (&lt;span style="color:#a6e22e">*&lt;/span> dt self.vy))
(incf self.z (&lt;span style="color:#a6e22e">*&lt;/span> dt self.vz)))
(define-reduction-kernel (energy e &lt;span style="color:#ae81ff">0d0&lt;/span>) (s body)
(&lt;span style="color:#e6db74">:self&lt;/span>
(&lt;span style="color:#a6e22e">+&lt;/span> e (&lt;span style="color:#a6e22e">*&lt;/span> (&lt;span style="color:#a6e22e">*&lt;/span> &lt;span style="color:#ae81ff">0.5d0&lt;/span> self.mass)
(&lt;span style="color:#a6e22e">+&lt;/span> (&lt;span style="color:#a6e22e">+&lt;/span> (&lt;span style="color:#a6e22e">*&lt;/span> self.vx self.vx) (&lt;span style="color:#a6e22e">*&lt;/span> self.vy self.vy))
(&lt;span style="color:#a6e22e">*&lt;/span> self.vz self.vz)))))
(&lt;span style="color:#e6db74">:pair&lt;/span>
(&lt;span style="color:#66d9ef">let*&lt;/span> ((dx (&lt;span style="color:#a6e22e">-&lt;/span> i.x j.x))
(dy (&lt;span style="color:#a6e22e">-&lt;/span> i.y j.y))
(dz (&lt;span style="color:#a6e22e">-&lt;/span> i.z j.z)))
(&lt;span style="color:#a6e22e">-&lt;/span> e (&lt;span style="color:#a6e22e">/&lt;/span> (&lt;span style="color:#a6e22e">*&lt;/span> i.mass j.mass)
(&lt;span style="color:#a6e22e">sqrt&lt;/span> (&lt;span style="color:#a6e22e">+&lt;/span> (&lt;span style="color:#a6e22e">+&lt;/span> (&lt;span style="color:#a6e22e">*&lt;/span> dx dx) (&lt;span style="color:#a6e22e">*&lt;/span> dy dy))
(&lt;span style="color:#a6e22e">*&lt;/span> dz dz))))))))
(define-kernel-step run-simulation (system body n &lt;span style="color:#e6db74">:params&lt;/span> ((dt &lt;span style="color:#66d9ef">double-float&lt;/span>)))
(advance-forces dt)
(advance-positions dt))
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Well, in fact, this isn&amp;rsquo;t an idealized approximation; it&amp;rsquo;s almost exactly &lt;a href="https://raw.githubusercontent.com/stylewarning/lisp-random/refs/heads/master/nbody/impl.lisp">how it turned out&lt;/a>. Given that this is a proof of concept, we sometimes have to write some Lisp things a little funny. For example, you&amp;rsquo;ll notice we write:&lt;/p>
&lt;pre tabindex="0">&lt;code>(+ (+ (* dx dx) (* dy dy)) (* dz dz))
&lt;/code>&lt;/pre>&lt;p>instead of the far more readable&lt;/p>
&lt;pre tabindex="0">&lt;code>(+ (* dx dx) (* dy dy) (* dz dz))
&lt;/code>&lt;/pre>&lt;p>Both are completely valid and both can be used. So why the former? It&amp;rsquo;s a consequence of a limitation in a little feature I built: auto-vectorization. The vectorizer walks the mathematical expressions and replaces them with fast SIMD variants. Here&amp;rsquo;s a little fragment showing this rewrite rule:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-lisp" data-lang="lisp">&lt;span style="color:#f92672">...&lt;/span>
(case (&lt;span style="color:#a6e22e">car&lt;/span> expr)
&lt;span style="color:#75715e">;; (+ a (* b c)) -&amp;gt; fmadd(a,b,c)&lt;/span>
((&lt;span style="color:#a6e22e">+&lt;/span>)
(&lt;span style="color:#66d9ef">let&lt;/span> ((args (&lt;span style="color:#a6e22e">cdr&lt;/span> expr)))
(cond
((and (&lt;span style="color:#a6e22e">=&lt;/span> (&lt;span style="color:#a6e22e">length&lt;/span> args) &lt;span style="color:#ae81ff">2&lt;/span>) (mul-p (&lt;span style="color:#a6e22e">second&lt;/span> args)))
&lt;span style="color:#f92672">`&lt;/span>(%%fmadd-pd &lt;span style="color:#f92672">,&lt;/span>(xf (&lt;span style="color:#a6e22e">first&lt;/span> args))
&lt;span style="color:#f92672">,&lt;/span>(xf (&lt;span style="color:#a6e22e">second&lt;/span> (&lt;span style="color:#a6e22e">second&lt;/span> args)))
&lt;span style="color:#f92672">,&lt;/span>(xf (&lt;span style="color:#a6e22e">third&lt;/span> (&lt;span style="color:#a6e22e">second&lt;/span> args)))))
&lt;span style="color:#f92672">...&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>The implementation of these kernel macros in &lt;a href="https://raw.githubusercontent.com/stylewarning/lisp-random/refs/heads/master/nbody/library.lisp">&lt;code>library.lisp&lt;/code>&lt;/a> weighs in at just under 700 lines, and includes optional x64 SIMD auto-vectorization.&lt;/p>
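&lt;p>The shape of that rewrite rule is easy to model outside of Lisp. Here&amp;rsquo;s an illustrative Python sketch (mine, not the library&amp;rsquo;s code) of the same bottom-up pattern match, which also shows why the nested binary &lt;code>+&lt;/code> matters: each two-argument sum whose second argument is a product becomes a fused multiply-add, and the flat three-argument sum would not match the pattern.&lt;/p>

```python
def mul_p(expr):
    # True when EXPR is a multiplication node.
    return isinstance(expr, tuple) and expr[0] == '*'

def rewrite(expr):
    # Rewrite (+ a (* b c)) into ('fmadd', a, b, c), recursing bottom-up
    # so inner sums are fused before the enclosing one is inspected.
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [rewrite(a) for a in args]
    if op == '+' and len(args) == 2 and mul_p(args[1]):
        _, b, c = args[1]
        return ('fmadd', args[0], b, c)
    return (op, *args)
```

&lt;p>Running it on the nested sum from above turns &lt;code>(+ (+ (* dx dx) (* dy dy)) (* dz dz))&lt;/code> into a chain of two fused multiply-adds, which is precisely the code shape the binary-&lt;code>+&lt;/code> convention is there to expose.&lt;/p>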
&lt;p>Now, for the nail-biting moment: how does it compare? I made a &lt;a href="https://raw.githubusercontent.com/stylewarning/lisp-random/refs/heads/master/nbody/Makefile">Makefile&lt;/a> that compares the idiomatic C against the loopy Lisp against our kernel DSL Lisp, reporting the median of three runs. Running this on my computer gives:&lt;/p>
&lt;pre tabindex="0">&lt;code>$ make bench
=== C (gcc -O3 -ffast-math) ===
-0.169286396
runs: 1657 1664 1653 ms
median: 1657 ms
=== Lisp (SBCL, conventional loops) ===
-0.169286396
runs: 1991 2009 2005 ms
median: 2005 ms
=== Lisp (SBCL, kernel syntax) ===
-0.169286396
runs: 1651 1651 1652 ms
median: 1651 ms
&lt;/code>&lt;/pre>&lt;p>So, in fact, we have matched the performance of C almost exactly. Furthermore, the generated code is still not as lean as it could be. Not to put too fine a point on it, but, &lt;strong>&amp;lt;100 lines of Lisp&lt;/strong>, supported by&lt;/p>
&lt;ul>
&lt;li>700 lines of library code and about 4 hours of my time; and&lt;/li>
&lt;li>500k lines of its host compiler &lt;code>sbcl&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>has performance parity and greater readability/reusability than &amp;lt;100 lines of C&lt;/strong>, supported by&lt;/p>
&lt;ul>
&lt;li>~5,000k lines of just the C part of its host compiler &lt;code>gcc&lt;/code>.&lt;/li>
&lt;/ul>
&lt;p>None of this is to make an argument that Lisp is &amp;ldquo;better&amp;rdquo;, or that there isn&amp;rsquo;t merit to &lt;em>avoiding&lt;/em> custom DSLs in certain circumstances, or that the world doesn&amp;rsquo;t have room for more custom home-grown compilers and parsers, but I think this is the clearest possible, quasi-realistic demonstration that idiomatic Lisp can be as fast as idiomatic C without tremendous work, whilst netting additional benefits unique to Lisp.&lt;/p>
&lt;p>All code is available &lt;a href="https://github.com/stylewarning/lisp-random/tree/master/nbody">here&lt;/a>.&lt;/p></description></item><item><title>Beating Bellard's formula</title><link>http://www.stylewarning.com/posts/beating-bellard/</link><pubDate>Wed, 04 Mar 2026 00:00:00 +0000</pubDate><guid>http://www.stylewarning.com/posts/beating-bellard/</guid><description>&lt;p>&lt;em>By Robert Smith&lt;/em>&lt;/p>
&lt;p>Fabrice Bellard came up with a computationally efficient formula for
calculating the &lt;em>n&lt;/em>th hexadecimal digit of $\pi$ without calculating any of
the previous &lt;em>n&lt;/em>−1. It&amp;rsquo;s called
&lt;a href="https://en.wikipedia.org/wiki/Bellard%27s_formula">Bellard&amp;rsquo;s formula&lt;/a>.
It wasn&amp;rsquo;t the first of its kind, but in terms of computational efficiency,
it was a substantial improvement over the original, elegant
&lt;a href="https://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula">Bailey-Borwein-Plouffe formula&lt;/a>.
Due to the trio&amp;rsquo;s discovery, these formulas are often called &lt;em>BBP-type formulas&lt;/em>.&lt;/p>
&lt;p>Over the years, numerous BBP-type formulas have been discovered. In fact,
&lt;a href="https://www.davidhbailey.com/dhbpapers/bbp-formulas.pdf">Bailey&lt;/a> gives us
a recipe to search for them using integer-relation algorithms. In
simple terms, we can guess formulas and run a computation to check, with
high confidence, whether a candidate equals $\pi$. If we do find one, we
can take it as a conjecture to prove formally.&lt;/p>
&lt;p>Like Bellard and many others, I ran a variant of Bailey&amp;rsquo;s recipe, effectively
doing a brute-force search, highly optimized and in parallel. The search yielded
another formula that is computationally more efficient than Bellard&amp;rsquo;s formula. The
identity is as follows:&lt;/p>
&lt;p>$$
\pi = \sum_{k=0}^{\infty} \frac{1}{4096^k} \left( \frac{1}{6k+1}
- \frac{2^{-5}}{6k+3}
+ \frac{2^{-8}}{6k+5}
+ \frac{2}{8k+1}
- \frac{2^{-5}}{8k+5}
+ \frac{2^{-1}}{12k+3}
- \frac{2^{-4}}{12k+7}
- \frac{2^{-8}}{12k+11} \right).
$$&lt;/p>
&lt;p>It converges at a rate of 12 bits per term. We will prove convergence, and then
prove the identity itself (with a little computer assistance). As it turns out,
an equivalent form of this formula was already discovered, which we will discuss
as well. Finally, we&amp;rsquo;ll show a very simple implementation in Common Lisp.&lt;/p>
&lt;h2 id="proof-of-convergence">Proof of convergence&lt;/h2>
&lt;p>Write the series as $S := \sum_{k=0}^{\infty} 4096^{-k}R(k)$. Since
$R(k)\in O(1/k)$, convergence is dominated by the geometric term $4096^{-k}$:&lt;/p>
&lt;p>$$
\lim_{k \to \infty} \left\vert \frac{R(k+1)}{4096^{k+1}} \middle/ \frac{R(k)}{4096^{k}} \right\vert = \frac{1}{4096}.
$$&lt;/p>
&lt;p>By the ratio test, the series converges absolutely. Since $4096 = 2^{12}$,
each additional term contributes exactly 12 bits of precision.&lt;/p>
&lt;p>Bellard&amp;rsquo;s formula converges at 10 bits per term and requires the evaluation
of 7 fractions. The above converges at 12 bits per term and requires the
evaluation of 8 fractions. So while we require about 17% fewer terms
($1 - \tfrac{10}{12}$), each term requires about 14% more arithmetic
($\tfrac{8}{7}$). Net, this formula is approximately 5% more efficient
($1 - \tfrac{10}{12}\cdot\tfrac{8}{7} \approx 0.048$).&lt;/p>
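&lt;p>A quick numerical sanity check (an illustrative Python sketch, not part of the proof) confirms both the limit and the convergence rate:&lt;/p>

```python
import math

def term(k):
    # One summand R(k) of the series S.
    return (1/(6*k + 1) - 2**-5/(6*k + 3) + 2**-8/(6*k + 5)
            + 2/(8*k + 1) - 2**-5/(8*k + 5)
            + 2**-1/(12*k + 3) - 2**-4/(12*k + 7) - 2**-8/(12*k + 11))

def partial(n):
    # Partial sum of the first n terms of S = sum_k 4096^(-k) R(k).
    return sum(term(k) / 4096**k for k in range(n))
```

&lt;p>With only five terms the partial sum already agrees with $\pi$ to full double precision, consistent with 12 bits gained per term.&lt;/p>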
&lt;h2 id="proof-of-identity-via-a-definite-integral">Proof of identity via a definite integral&lt;/h2>
&lt;p>Consider $1/(nk+j) = \int_{0}^{1} x^{nk+j-1} dx$. For positive integers $n$ and $b$, we get&lt;/p>
&lt;p>$$
\sum_{k=0}^{\infty} \frac{1}{b^k}\cdot\frac{1}{nk+j} = \sum_{k=0}^{\infty} \int_{0}^{1} \left(\frac{x^n}{b}\right)^k x^{j-1} dx.
$$&lt;/p>
&lt;p>We can swap the sum and integral via the Lebesgue dominated convergence theorem, since the
power series $\sum (x^n/b)^k$ converges uniformly for $x \in [0, 1]$ and $b &amp;gt; 1$. Using this
and summing the geometric series gives:&lt;/p>
&lt;p>$$
\int_{0}^{1} x^{j-1} \sum_{k=0}^{\infty} \left(\frac{x^n}{b}\right)^k dx = \int_{0}^{1} \frac{x^{j-1}}{1 - x^n/b} dx.
$$&lt;/p>
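&lt;p>This sum-to-integral correspondence is easy to check numerically. The sketch below (illustrative Python, with a simple Simpson&amp;rsquo;s rule standing in for symbolic integration) compares the two sides for several of the $(n, j)$ pairs appearing in $S$:&lt;/p>

```python
def series_side(n, j, b, terms=20):
    # Left side: sum over k of b^(-k) / (n*k + j).
    return sum(1 / (b**k * (n*k + j)) for k in range(terms))

def integral_side(n, j, b, steps=2000):
    # Right side: Simpson's rule for the integral over [0, 1]
    # of x^(j-1) / (1 - x^n / b).  `steps` must be even.
    h = 1.0 / steps
    def f(x):
        return x**(j - 1) / (1 - x**n / b)
    total = f(0.0) + f(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3
```

&lt;p>The integrand is smooth on $[0, 1]$ when $b &gt; 1$, so a modest step count already agrees with the series to many digits.&lt;/p>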
&lt;p>We now apply this to $S$ termwise with $b=4096=2^{12}$:&lt;/p>
&lt;p>$$
S = \int_0^1 \left( \frac{x^{0}}{1 - \frac{x^6}{2^{12}}}
- 2^{-5} \frac{x^{2}}{1 - \frac{x^6}{2^{12}}}
+ 2^{-8} \frac{x^{4}}{1 - \frac{x^6}{2^{12}}}
+ 2 \frac{x^{0}}{1 - \frac{x^8}{2^{12}}}
- 2^{-5} \frac{x^{4}}{1 - \frac{x^8}{2^{12}}}
+ 2^{-1} \frac{x^{2}}{1 - \frac{x^{12}}{2^{12}}}
- 2^{-4} \frac{x^{6}}{1 - \frac{x^{12}}{2^{12}}}
- 2^{-8} \frac{x^{10}}{1 - \frac{x^{12}}{2^{12}}}
\right) dx.
$$&lt;/p>
&lt;p>At this point, you could try to algebra your way through, expanding, using the
substitution $x=2u$, etc. ultimately yielding a nice denominator
$(u^2\pm 2u+2)(u^6-64)(u^{12}-1)$. Maybe compute some residues. Or, just CAS your
way through.&lt;/p>
&lt;pre tabindex="0">&lt;code>% fricas
FriCAS Computer Algebra System
Version: FriCAS 2025.12.23git built with sbcl 2.5.2.1852-1f3beec71
Timestamp: Wed Mar 4 12:41:38 EST 2026
-----------------------------------------------------------------------------
Issue )copyright to view copyright notices.
Issue )summary for a summary of useful system commands.
Issue )quit to leave FriCAS and return to shell.
-----------------------------------------------------------------------------
(1) -&amp;gt; f := (1/(1 - x^6/4096))
- (1/32)*x^2/(1 - x^6/4096)
+ (1/256)*x^4/(1 - x^6/4096)
+ 2*1/(1 - x^8/4096)
- (1/32)*x^4/(1 - x^8/4096)
+ (1/2)*x^2/(1 - x^12/4096)
- (1/16)*x^6/(1 - x^12/4096)
- (1/256)*x^10/(1 - x^12/4096);
Type: Fraction(Polynomial(Fraction(Integer)))
(2) -&amp;gt; normalize(integrate(f, x = 0..1))
3 1 11 19 1
(2) 2 atan(-) - 2 atan(-) + 2 atan(--) + 2 atan(--) + 2 atan(-)
2 2 24 48 4
Type: Expression(Fraction(Integer))
&lt;/code>&lt;/pre>&lt;p>So now we just need to show the arctans all collapse to $\pi$. Recall the identity&lt;/p>
&lt;p>$$
\tan^{-1} a \pm \tan^{-1} b = \tan^{-1}\left(\frac{a\pm b}{1\mp ab}\right).
$$&lt;/p>
&lt;p>The sum of the first four terms can be calculated easily in Common Lisp:&lt;/p>
&lt;pre tabindex="0">&lt;code>% sbcl --no-inform
* (defun combine (a b) (/ (+ a b) (- 1 (* a b))))
COMBINE
* (reduce #'combine '(3/2 -1/2 11/24 19/48))
4
&lt;/code>&lt;/pre>&lt;p>So we have $2\big(\tan^{-1}4 + \tan^{-1}(1/4)\big)$, and with our final elementary trig
identity $\tan^{-1} (a/b) = \pi/2 - \tan^{-1} (b/a)$, we find $S = \pi$.&lt;/p>
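&lt;p>If you&amp;rsquo;d rather not take the Lisp REPL&amp;rsquo;s word for it, the same reduction can be replayed with exact rationals; here&amp;rsquo;s the equivalent check in Python (illustration only):&lt;/p>

```python
from fractions import Fraction
from functools import reduce

def combine(a, b):
    # arctan addition: atan(a) + atan(b) = atan((a + b)/(1 - a*b)), mod pi.
    return (a + b) / (1 - a*b)

# The first four arctan arguments from the FriCAS output.
args = [Fraction(3, 2), Fraction(-1, 2), Fraction(11, 24), Fraction(19, 48)]
result = reduce(combine, args)  # exactly 4
```

&lt;p>A floating-point check of the full five-term sum against $\pi$ confirms that no branch of the arctan identity was crossed along the way.&lt;/p>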
&lt;h2 id="a-new-discovery">A new discovery?&lt;/h2>
&lt;p>Of course, I was excited to find this formula, but after some internet spelunking, it turns out
it had already been discovered by &lt;a href="https://web.archive.org/web/20181225155904if_/http://gery.huvent.pagesperso-orange.fr:80/index_explorer_net.htm">Géry Huvent&lt;/a>
and &lt;a href="http://www.pi314.net/eng/hypergse6.php">Boris Gourévitch&lt;/a>, perhaps independently. Gourévitch
doesn&amp;rsquo;t credit Huvent as he does with other formulas, but he does say
&amp;ldquo;[&amp;hellip;] furthermore, we can obtain BBP formula [&amp;hellip;] by using what Gery Huvent calls
the denomination tables [&amp;hellip;].&amp;rdquo; Daisuke Takahashi cites Huvent&amp;rsquo;s website in
&lt;a href="https://tsukuba.repo.nii.ac.jp/record/2001720/files/RJ_51-1-177.pdf">this 2019 paper&lt;/a> published in
&lt;em>The Ramanujan Journal&lt;/em>. In all cases, they write the formula in the following way:&lt;/p>
&lt;p>$$
\frac{1}{128} \sum _{k=0}^{\infty} \frac{1}{2^{12k}}\left(
\frac{768}{24 k+3}+\frac{512}{24k+4}+\frac{128}{24 k+6}-\frac{16}{24 k+12}-\frac{16}{24 k+14}-\frac{12}{24
k+15}+\frac{2}{24 k+20}-\frac{1}{24 k+22}\right),
$$&lt;/p>
&lt;p>which is structurally equivalent to $S$.&lt;/p>
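&lt;p>The equivalence is easy to check: dividing each numerator by 128 and reducing the denominators (e.g. $\frac{1}{128}\cdot\frac{768}{24k+3} = \frac{2}{8k+1}$) recovers the eight fractions of $S$ exactly, term by term. An illustrative Python sketch confirms the partial sums behave identically:&lt;/p>

```python
import math

def huvent_term(k):
    # One summand of the Huvent/Gourevitch form.
    return (768/(24*k + 3) + 512/(24*k + 4) + 128/(24*k + 6)
            - 16/(24*k + 12) - 16/(24*k + 14) - 12/(24*k + 15)
            + 2/(24*k + 20) - 1/(24*k + 22))

def huvent_partial(n):
    # Partial sum of the first n terms, scaled by the leading 1/128.
    return sum(huvent_term(k) / 2**(12*k) for k in range(n)) / 128
```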
&lt;p>Despite having been known already, this formula doesn&amp;rsquo;t appear to be &lt;em>well&lt;/em> known. As such, I
hope this blog post brings more attention to it.&lt;/p>
&lt;h2 id="simple-implementation">Simple implementation&lt;/h2>
&lt;p>Here is a simple implementation of digit extraction using BBP-type formulas in
Common Lisp:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-lisp" data-lang="lisp">(defun %pow2-mod (exponent modulus)
(cond
((&lt;span style="color:#a6e22e">=&lt;/span> modulus &lt;span style="color:#ae81ff">1&lt;/span>) &lt;span style="color:#ae81ff">0&lt;/span>)
((&lt;span style="color:#a6e22e">zerop&lt;/span> exponent) &lt;span style="color:#ae81ff">1&lt;/span>)
(&lt;span style="color:#66d9ef">t&lt;/span>
(&lt;span style="color:#66d9ef">let&lt;/span> ((result &lt;span style="color:#ae81ff">1&lt;/span>)
(base (&lt;span style="color:#a6e22e">mod&lt;/span> &lt;span style="color:#ae81ff">2&lt;/span> modulus))
(e exponent))
(loop &lt;span style="color:#e6db74">:while&lt;/span> (&lt;span style="color:#a6e22e">plusp&lt;/span> e) &lt;span style="color:#e6db74">:do&lt;/span>
(when (&lt;span style="color:#a6e22e">oddp&lt;/span> e)
(setf result (&lt;span style="color:#a6e22e">mod&lt;/span> (&lt;span style="color:#a6e22e">*&lt;/span> result base) modulus)))
(setf base (&lt;span style="color:#a6e22e">mod&lt;/span> (&lt;span style="color:#a6e22e">*&lt;/span> base base) modulus)
e (&lt;span style="color:#a6e22e">ash&lt;/span> e &lt;span style="color:#ae81ff">-1&lt;/span>)))
result))))
(defun %scaled-frac-of-power-two (exponent denom)
(cond
((&lt;span style="color:#a6e22e">&amp;gt;=&lt;/span> exponent &lt;span style="color:#ae81ff">0&lt;/span>)
(&lt;span style="color:#66d9ef">let&lt;/span> ((residue (%pow2-mod exponent denom)))
(&lt;span style="color:#a6e22e">floor&lt;/span> (&lt;span style="color:#a6e22e">ash&lt;/span> residue *precision-bits*) denom)))
(&lt;span style="color:#66d9ef">t&lt;/span>
(&lt;span style="color:#66d9ef">let&lt;/span> ((effective-bits (&lt;span style="color:#a6e22e">+&lt;/span> *precision-bits* exponent)))
(&lt;span style="color:#66d9ef">if&lt;/span> (&lt;span style="color:#a6e22e">minusp&lt;/span> effective-bits)
&lt;span style="color:#ae81ff">0&lt;/span>
(&lt;span style="color:#a6e22e">floor&lt;/span> (&lt;span style="color:#a6e22e">ash&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span> effective-bits) denom))))))
(defun %series-scaled-frac (bit-index bbp-series k-step global-shift alternating-p)
&lt;span style="color:#75715e">;; A series is a list of series terms. A series term is a quadruple&lt;/span>
&lt;span style="color:#75715e">;; (SIGN SHIFT DENOM-MULTIPLIER DENOM-OFFSET) representing the summand&lt;/span>
&lt;span style="color:#75715e">;; SIGN * 2^SHIFT / (DENOM_MULTIPLIER * k + DENOM_OFFSET).&lt;/span>
(&lt;span style="color:#66d9ef">let*&lt;/span> ((modulus (&lt;span style="color:#a6e22e">ash&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span> *precision-bits*))
(max-shift (loop &lt;span style="color:#e6db74">:for&lt;/span> term &lt;span style="color:#e6db74">:in&lt;/span> bbp-series &lt;span style="color:#e6db74">:maximize&lt;/span> (&lt;span style="color:#a6e22e">second&lt;/span> term)))
(k-max (&lt;span style="color:#a6e22e">max&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span> (&lt;span style="color:#a6e22e">ceiling&lt;/span> (&lt;span style="color:#a6e22e">+&lt;/span> bit-index &lt;span style="color:#75715e">; conservative bound&lt;/span>
global-shift
max-shift
*precision-bits*
*guard-bits*)
k-step))))
(loop &lt;span style="color:#e6db74">:with&lt;/span> acc &lt;span style="color:#e6db74">:=&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span>
&lt;span style="color:#e6db74">:for&lt;/span> k &lt;span style="color:#e6db74">:from&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span> &lt;span style="color:#e6db74">:to&lt;/span> k-max &lt;span style="color:#e6db74">:do&lt;/span>
(&lt;span style="color:#66d9ef">let&lt;/span> ((k-sign (&lt;span style="color:#66d9ef">if&lt;/span> (and alternating-p (&lt;span style="color:#a6e22e">oddp&lt;/span> k)) &lt;span style="color:#ae81ff">-1&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>))
(k-factor (&lt;span style="color:#a6e22e">*&lt;/span> k-step k)))
(dolist (term bbp-series)
(destructuring-bind (term-sign shift den-mul den-add) term
(&lt;span style="color:#66d9ef">let*&lt;/span> ((denom (&lt;span style="color:#a6e22e">+&lt;/span> den-add (&lt;span style="color:#a6e22e">*&lt;/span> den-mul k)))
(exponent (&lt;span style="color:#a6e22e">+&lt;/span> bit-index global-shift shift (&lt;span style="color:#a6e22e">-&lt;/span> k-factor)))
(piece (%scaled-frac-of-power-two exponent denom))
(signed (&lt;span style="color:#a6e22e">*&lt;/span> k-sign term-sign)))
(when (&lt;span style="color:#a6e22e">plusp&lt;/span> piece)
(setf acc (&lt;span style="color:#a6e22e">mod&lt;/span> (&lt;span style="color:#a6e22e">+&lt;/span> acc (&lt;span style="color:#a6e22e">*&lt;/span> signed piece)) modulus)))))))
&lt;span style="color:#e6db74">:finally&lt;/span> (return acc))))
(defun %nth-hex-from-series (n terms k-step global-shift alternating-p)
(&lt;span style="color:#66d9ef">let*&lt;/span> ((bit-index (&lt;span style="color:#a6e22e">*&lt;/span> &lt;span style="color:#ae81ff">4&lt;/span> n)))
(&lt;span style="color:#a6e22e">ldb&lt;/span> (&lt;span style="color:#a6e22e">byte&lt;/span> &lt;span style="color:#ae81ff">4&lt;/span> (&lt;span style="color:#a6e22e">-&lt;/span> *precision-bits* &lt;span style="color:#ae81ff">4&lt;/span>))
(%series-scaled-frac bit-index
terms
k-step
global-shift
alternating-p))))
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This implementation uses Lisp&amp;rsquo;s arbitrary precision integer arithmetic.
A &amp;ldquo;real&amp;rdquo; implementation would use more efficient arithmetic, but
this will suffice for some basic testing. Now we can write functions
to use the Bellard formula and the new formula:&lt;/p>
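&lt;p>(As an aside: readers who want to sanity-check the helper arithmetic outside of Lisp can use a rough Python transcription of the two helpers above. This is an illustrative sketch, not code from this post; &lt;code>PRECISION_BITS&lt;/code> stands in for the &lt;code>*precision-bits*&lt;/code> special variable, here fixed at 64.)&lt;/p>

```python
# Illustrative Python transcription of %POW2-MOD and
# %SCALED-FRAC-OF-POWER-TWO (not from the post).
# PRECISION_BITS stands in for the *precision-bits* special variable.
PRECISION_BITS = 64

def pow2_mod(exponent, modulus):
    """2^exponent mod modulus by square-and-multiply.
    (Python's built-in pow(2, exponent, modulus) is equivalent.)"""
    if modulus == 1:
        return 0
    result, base, e = 1, 2 % modulus, exponent
    while e > 0:
        if e & 1:
            result = (result * base) % modulus
        base = (base * base) % modulus
        e >>= 1
    return result

def scaled_frac_of_power_two(exponent, denom):
    """First PRECISION_BITS fractional bits of 2^exponent / denom."""
    if exponent >= 0:
        return (pow2_mod(exponent, denom) << PRECISION_BITS) // denom
    effective_bits = PRECISION_BITS + exponent
    if effective_bits < 0:
        return 0
    return (1 << effective_bits) // denom
```

&lt;p>Back to the Lisp:&lt;/p>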
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-lisp" data-lang="lisp">(defparameter +bellard-terms+
&lt;span style="color:#f92672">&amp;#39;&lt;/span>((&lt;span style="color:#ae81ff">-1&lt;/span> &lt;span style="color:#ae81ff">5&lt;/span> &lt;span style="color:#ae81ff">4&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>)
(&lt;span style="color:#ae81ff">-1&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span> &lt;span style="color:#ae81ff">4&lt;/span> &lt;span style="color:#ae81ff">3&lt;/span>)
(&lt;span style="color:#ae81ff">+1&lt;/span> &lt;span style="color:#ae81ff">8&lt;/span> &lt;span style="color:#ae81ff">10&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>)
(&lt;span style="color:#ae81ff">-1&lt;/span> &lt;span style="color:#ae81ff">6&lt;/span> &lt;span style="color:#ae81ff">10&lt;/span> &lt;span style="color:#ae81ff">3&lt;/span>)
(&lt;span style="color:#ae81ff">-1&lt;/span> &lt;span style="color:#ae81ff">2&lt;/span> &lt;span style="color:#ae81ff">10&lt;/span> &lt;span style="color:#ae81ff">5&lt;/span>)
(&lt;span style="color:#ae81ff">-1&lt;/span> &lt;span style="color:#ae81ff">2&lt;/span> &lt;span style="color:#ae81ff">10&lt;/span> &lt;span style="color:#ae81ff">7&lt;/span>)
(&lt;span style="color:#ae81ff">+1&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span> &lt;span style="color:#ae81ff">10&lt;/span> &lt;span style="color:#ae81ff">9&lt;/span>)))
(defun bellard-nth-hex (n)
(%nth-hex-from-series (&lt;span style="color:#a6e22e">*&lt;/span> &lt;span style="color:#ae81ff">4&lt;/span> n) +bellard-terms+ &lt;span style="color:#ae81ff">10&lt;/span> &lt;span style="color:#ae81ff">-6&lt;/span> &lt;span style="color:#66d9ef">t&lt;/span>))
(defparameter +new-terms+
&lt;span style="color:#f92672">&amp;#39;&lt;/span>((&lt;span style="color:#ae81ff">+1&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span> &lt;span style="color:#ae81ff">6&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>)
(&lt;span style="color:#ae81ff">-1&lt;/span> &lt;span style="color:#ae81ff">-5&lt;/span> &lt;span style="color:#ae81ff">6&lt;/span> &lt;span style="color:#ae81ff">3&lt;/span>)
(&lt;span style="color:#ae81ff">+1&lt;/span> &lt;span style="color:#ae81ff">-8&lt;/span> &lt;span style="color:#ae81ff">6&lt;/span> &lt;span style="color:#ae81ff">5&lt;/span>)
(&lt;span style="color:#ae81ff">+1&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span> &lt;span style="color:#ae81ff">8&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>)
(&lt;span style="color:#ae81ff">-1&lt;/span> &lt;span style="color:#ae81ff">-5&lt;/span> &lt;span style="color:#ae81ff">8&lt;/span> &lt;span style="color:#ae81ff">5&lt;/span>)
(&lt;span style="color:#ae81ff">+1&lt;/span> &lt;span style="color:#ae81ff">-1&lt;/span> &lt;span style="color:#ae81ff">12&lt;/span> &lt;span style="color:#ae81ff">3&lt;/span>)
(&lt;span style="color:#ae81ff">-1&lt;/span> &lt;span style="color:#ae81ff">-4&lt;/span> &lt;span style="color:#ae81ff">12&lt;/span> &lt;span style="color:#ae81ff">7&lt;/span>)
(&lt;span style="color:#ae81ff">-1&lt;/span> &lt;span style="color:#ae81ff">-8&lt;/span> &lt;span style="color:#ae81ff">12&lt;/span> &lt;span style="color:#ae81ff">11&lt;/span>)))
(defun new-nth-hex (n)
(%nth-hex-from-series (&lt;span style="color:#a6e22e">*&lt;/span> &lt;span style="color:#ae81ff">4&lt;/span> n) +new-terms+ &lt;span style="color:#ae81ff">12&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span> &lt;span style="color:#66d9ef">nil&lt;/span>))
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Let&amp;rsquo;s make sure they agree for the first 1000 hex digits:&lt;/p>
&lt;pre tabindex="0">&lt;code>CL-USER&amp;gt; (loop :for i :below 1000
:always (= (bellard-nth-hex i) (new-nth-hex i)))
T
&lt;/code>&lt;/pre>&lt;p>And now let&amp;rsquo;s look at timing comparisons. Here&amp;rsquo;s a little driver:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-lisp" data-lang="lisp">(defun compare-timings (n)
(&lt;span style="color:#66d9ef">flet&lt;/span> ((time-it (f n)
(sb-ext:gc &lt;span style="color:#e6db74">:full&lt;/span> &lt;span style="color:#66d9ef">t&lt;/span>)
(&lt;span style="color:#66d9ef">let&lt;/span> ((start (&lt;span style="color:#a6e22e">get-internal-real-time&lt;/span>)))
(&lt;span style="color:#a6e22e">funcall&lt;/span> f n)
(&lt;span style="color:#a6e22e">-&lt;/span> (&lt;span style="color:#a6e22e">get-internal-real-time&lt;/span>) start))))
(loop &lt;span style="color:#e6db74">:repeat&lt;/span> n
&lt;span style="color:#e6db74">:for&lt;/span> index &lt;span style="color:#e6db74">:=&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span> &lt;span style="color:#e6db74">:then&lt;/span> (&lt;span style="color:#a6e22e">*&lt;/span> &lt;span style="color:#ae81ff">10&lt;/span> index)
&lt;span style="color:#e6db74">:for&lt;/span> bellard &lt;span style="color:#e6db74">:=&lt;/span> (time-it &lt;span style="color:#a6e22e">#&amp;#39;&lt;/span>bellard-nth-hex index)
&lt;span style="color:#e6db74">:for&lt;/span> new &lt;span style="color:#e6db74">:=&lt;/span> (time-it &lt;span style="color:#a6e22e">#&amp;#39;&lt;/span>new-nth-hex index)
&lt;span style="color:#e6db74">:do&lt;/span> (&lt;span style="color:#a6e22e">format&lt;/span> &lt;span style="color:#66d9ef">t&lt;/span> &lt;span style="color:#e6db74">&amp;#34;~v,&amp;#39; D: new is ~A% faster than bellard~%&amp;#34;&lt;/span> n index
(&lt;span style="color:#a6e22e">round&lt;/span> (&lt;span style="color:#a6e22e">*&lt;/span> &lt;span style="color:#ae81ff">100&lt;/span> (&lt;span style="color:#a6e22e">-&lt;/span> bellard new)) bellard)))))
&lt;/code>&lt;/pre>&lt;/div>&lt;p>And the results of timing up to the one millionth hexadecimal digit:&lt;/p>
&lt;pre tabindex="0">&lt;code>CL-USER&amp;gt; (compare-timings 7)
1 : new is 81% faster than bellard
10 : new is 7% faster than bellard
100 : new is 6% faster than bellard
1000 : new is 5% faster than bellard
10000 : new is 4% faster than bellard
100000 : new is 3% faster than bellard
1000000: new is 4% faster than bellard
&lt;/code>&lt;/pre>&lt;p>As predicted, though this is an imperfect test, the new formula is consistently faster across a few orders of magnitude.&lt;/p></description></item><item><title>The Clifford group as a permutation group</title><link>http://www.stylewarning.com/posts/clifford-permutation/</link><pubDate>Sun, 06 Jul 2025 00:00:00 +0000</pubDate><guid>http://www.stylewarning.com/posts/clifford-permutation/</guid><description>&lt;p>&lt;em>By Robert Smith&lt;/em>&lt;/p>
&lt;p>The Clifford group is an important mathematical group that is
foundational to the field of quantum error correction and quantum
benchmarking. I&amp;rsquo;ve long been interested in computing with the Clifford
group.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>I wrote a paper called &lt;em>The Computational Structure of the Clifford
Groups&lt;/em> which gives an implementation perspective of the math. The
paper can be found on &lt;a href="https://www.european-lisp-symposium.org/static/proceedings/2018.pdf">page
44&lt;/a>
of the proceedings of the 2018 European Lisp Symposium.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Much of that paper is implemented in
&lt;a href="https://github.com/quil-lang/quilc/tree/master/src/clifford">QUILC&lt;/a>. I
helped write the Common Lisp code for manipulating Clifford group
elements specifically for the purpose of randomized benchmarking.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The Clifford group is a foundational tool in QUILC&amp;rsquo;s &lt;a href="https://coalton-lang.github.io/20220906-quantum-compiler/">discrete
compiler&lt;/a>,
which was implemented in Coalton.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>I wrote a fast Clifford circuit simulator in the main implementation
of the quantum virtual machine, called the &lt;a href="https://github.com/quil-lang/qvm/blob/master/src/stabilizer-qvm.lisp">stabilizer
QVM&lt;/a>.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>In lieu of actually defining the Clifford group (the &lt;em>Computational
Structure&lt;/em> paper does that fine if you&amp;rsquo;re interested), I&amp;rsquo;ll tell you
one interesting fact about it: it is, in some sense, the largest
collection of quantum operations you can fit together such that adding
&lt;em>any other&lt;/em> would make the collection computationally universal.&lt;/p>
&lt;p>One of the greatest discoveries of the 20th century was the
tractability of studying finite groups on a computer. Specifically,
two breakthrough algorithms laid the foundation for the field of
computational group theory:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The &lt;em>Todd&amp;ndash;Coxeter algorithm&lt;/em> solves coset enumeration of finitely
presented groups (i.e., groups that are specified by a set of
symbols and rewrite equations). It was discovered in the 1930s, and
it&amp;rsquo;s even feasible to do by hand in some cases.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>For a permutation group $G$ generated by $\ell$ permutations of $n$
points, the &lt;em>Schreier&amp;ndash;Sims algorithm&lt;/em> produces a data structure for
$G$ in something like $O\big(n^2(\log\vert G\vert)^3+\ell n\log\vert
G\vert\big)$ time and $O(n^2\log\vert G\vert + \ell n)$ space. This
data structure lets us compute the size of $G$, determine whether
any permutation is an element of $G$, manufacture uniformly random
elements of $G$, and numerous other things.&lt;/p>
&lt;/li>
&lt;/ul>
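&lt;p>To make these objects concrete, here is a small Python sketch (an illustration, not from the post) that answers the same questions of group order and membership by brute-force closure. It is emphatically &lt;em>not&lt;/em> Schreier&amp;ndash;Sims (it enumerates every element, which is hopeless for large groups), but it shows the kind of data the real algorithm computes cleverly:&lt;/p>

```python
def compose(p, q):
    # (p . q)(i) = p[q[i]]; permutations are tuples on points 0..n-1.
    return tuple(p[i] for i in q)

def closure(generators):
    """Brute-force closure of a generating set: keep composing known
    elements with generators until nothing new appears. Feasible only
    for tiny groups; Schreier-Sims avoids this blowup entirely."""
    n = len(generators[0])
    identity = tuple(range(n))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

# S_4 from a transposition and a 4-cycle.
s4 = closure([(1, 0, 2, 3), (1, 2, 3, 0)])
print(len(s4))             # 24, the order of S_4
print((3, 2, 1, 0) in s4)  # True: membership is a set lookup
```

&lt;p>A Schreier&amp;ndash;Sims base and strong generating set answers these same queries without ever listing the whole group.&lt;/p>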
&lt;p>These algorithms led to a revolution, resulting in advanced
mathematical software like GAP&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup> which can be used to do truly
breathtaking group-theoretic calculations.&lt;/p>
&lt;p>Permutation groups remain the most practical and widely used algebraic
structure that admits feasible computation. My Common Lisp library
&lt;a href="https://github.com/stylewarning/cl-permutation">CL-PERMUTATION&lt;/a>
handles them nicely.&lt;/p>
&lt;p>The Clifford group, though, is usually constructively specified as a
matrix group generated by matrix and tensor products of a set of
generators. For example, let&lt;/p>
&lt;p>$$
\Gamma := \left\{
\begin{pmatrix}
1 &amp;amp; 0 \\
0 &amp;amp; 1
\end{pmatrix},
\begin{pmatrix}
1 &amp;amp; 0 \\
0 &amp;amp; i
\end{pmatrix},
\frac{1}{\sqrt{2}}
\begin{pmatrix}
1 &amp;amp; 1 \\
1 &amp;amp; -1
\end{pmatrix},
\begin{pmatrix}
1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 \\
0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 \\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 \\
0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0
\end{pmatrix}
\right\}.
$$&lt;/p>
&lt;p>Then the Clifford group on $n$ qubits is constructed as&lt;/p>
&lt;p>$$
C_n = \langle
g\in\Gamma^{\otimes n} \mid \dim g = 2^n
\rangle.
$$&lt;/p>
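&lt;p>To see the generators concretely, here is a short Python check (an illustration, not from the post) that the elements of $\Gamma$, namely the identity, the phase gate, the Hadamard gate, and CNOT, really are unitary, using nothing but nested lists:&lt;/p>

```python
# The four generators of Gamma as plain nested lists of complex numbers
# (illustration only; any linear-algebra library would do).
I2   = [[1, 0], [0, 1]]                            # identity
S    = [[1, 0], [0, 1j]]                           # phase gate diag(1, i)
H    = [[2**-0.5, 2**-0.5], [2**-0.5, -2**-0.5]]   # Hadamard
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def is_unitary(a, tol=1e-12):
    n = len(a)
    prod = matmul(a, dagger(a))    # should be the identity matrix
    return all(abs(prod[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

print(all(is_unitary(g) for g in (I2, S, H, CNOT)))  # True
```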
&lt;p>Such a construction is opaque at best, and admits little more than
mechanical simulation, or treating the group as a black box.&lt;/p>
&lt;p>Sometime around 2018, I had a discussion with my then-colleague &lt;a href="https://math.berkeley.edu/~hadfield/">Charles
Hadfield&lt;/a> about how we might be
able to leverage computational group theory to study the Clifford
group. Lazily, I had already Google&amp;rsquo;d around and wasn&amp;rsquo;t coming up with
anything. About a day later, Charles came back to me with an answer
that led me to have the biggest mathematical facepalm.&lt;/p>
&lt;p>The usual definition, which I avoided giving above, is full of
mathematical jargon about normalizers and quotients. Much simpler (and
of course essentially equivalent) is to see an element of the Clifford
group, say $g\in C_n$, as one that conjugates Pauli group ($P_n$)
elements to Pauli group elements. That is to say, $g\in C_n
\Leftrightarrow gP_ng^{-1} = P_n$.&lt;/p>
&lt;p>But this is none other than a permutation! It works as follows:&lt;/p>
&lt;ol>
&lt;li>Label each element of the Pauli group $P_n$ from $1$ to $\vert
P_n\vert=4^{n+1}$. Call them $(p_1, \ldots, p_{4^{n+1}})$.&lt;/li>
&lt;li>Compute $p_j = g p_i g^{-1}$ for each $i$.&lt;/li>
&lt;li>The permutation encoding of $g$ is thus the product of the maps
$i\mapsto j$.&lt;/li>
&lt;/ol>
&lt;p>In plain English, enumerate the Paulis and look at where they go under
conjugation. Paired with your favorite Clifford group generators, you
now have an object suitable for the machinery of computational group
theory.&lt;/p>
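&lt;p>The recipe above can be sketched for $n=1$ in a few lines of Python (an illustration, not the QUILC code): label the 16 single-qubit Paulis $i^k P$, hard-code how the Hadamard gate conjugates $I$, $X$, $Y$, and $Z$ (standard stabilizer facts: $HXH=Z$, $HZH=X$, $HYH=-Y$), and read off the permutation:&lt;/p>

```python
# Illustration (not the QUILC code): encode the Hadamard gate as a
# permutation of the 16 single-qubit Paulis i^k * P.
PAULIS = [(k, p) for k in range(4) for p in "IXYZ"]  # (phase exponent, letter)
INDEX = {e: i for i, e in enumerate(PAULIS)}

# Conjugation table for H: H P H^{-1} = i^dk * P'.
# HXH = Z, HZH = X, HYH = -Y (a sign of -1 is i^2), HIH = I.
H_CONJ = {"I": (0, "I"), "X": (0, "Z"), "Y": (2, "Y"), "Z": (0, "X")}

def conj_by_h(element):
    k, p = element
    dk, p2 = H_CONJ[p]
    return ((k + dk) % 4, p2)

# Position i of the permutation holds the label of H p_i H^{-1}.
perm = [INDEX[conj_by_h(e)] for e in PAULIS]

print(sorted(perm) == list(range(16)))                    # True: a bijection
print(all(conj_by_h(conj_by_h(e)) == e for e in PAULIS))  # True: H^2 = I
```

&lt;p>Feeding such permutations for a full set of Clifford generators into Schreier&amp;ndash;Sims machinery is what makes the group amenable to computation.&lt;/p>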
&lt;p>This is of course an expensive representation of the Clifford group,
seeing as elements of $C_n$ need to be represented as arrays of
$4^{n+1}$ machine integers. We can easily reduce this to $2^{2n+1}-2$
by exploiting symmetries: conjugation respects phases, so it suffices
to track the signed non-identity Paulis $\pm p$, of which there are
$2(4^n-1)=2^{2n+1}-2$. In quantum computing, $n$ is not very large
anyway, typically $n&amp;lt;10$.&lt;/p>
&lt;p>This logic is implemented in &lt;a href="https://github.com/quil-lang/quilc/blob/master/src/clifford/perm.lisp">QUILC&lt;/a>. After downloading QUILC and CL-PERMUTATION, one can experiment.&lt;/p>
&lt;pre tabindex="0">&lt;code>$ sbcl
&amp;gt; (ql:quickload &amp;quot;cl-quil&amp;quot;)
&amp;gt; (in-package #:cl-quil.clifford)
&amp;gt; (defvar C4 (clifford-group-as-perm-group 4))
C4
&amp;gt; (perm:group-order C4)
12128668876800
&amp;gt; (perm:random-group-element C4)
#&amp;lt;PERM 277 278 381 382 103 104 445 446 168 167 192 191 469 470
472 471 205 206 166 165 431 432 102 101 368 367 280 279
14 13 142 141 407 408 496 495 230 229 303 304 37 38 77
78 344 343 342 341 64 63 39 40 318 317 231 232 510 509
405 406 127 128 352 351 118 117 30 29 264 263 221 222
455 456 416 415 182 181 183 184 429 430 453 454 207 208
261 262 15 16 119 120 365 366 493 494 247 248 144 143
389 390 79 80 325 326 302 301 55 56 53 54 287 288 327
328 94 93 392 391 158 157 245 246 480 479 209 210 451
452 428 427 185 186 364 363 121 122 18 17 260 259 266
265 28 27 115 116 354 353 180 179 417 418 457 458 219
220 91 92 329 330 290 289 52 51 481 482 244 243 156 155
393 394 388 387 146 145 250 249 492 491 57 58 300 299
324 323 81 82 433 434 164 163 204 203 474 473 11 12 282
281 369 370 99 100 105 106 379 380 276 275 1 2 467 468
193 194 170 169 443 444 316 315 42 41 66 65 340 339 129
130 403 404 507 508 233 234 228 227 497 498 409 410 140
139 345 346 76 75 35 36 306 305 211 212 450 449 425 426
187 188 361 362 123 124 20 19 257 258 268 267 25 26 114
113 356 355 177 178 419 420 459 460 218 217 90 89 331
332 292 291 49 50 483 484 241 242 153 154 395 396 385
386 148 147 252 251 489 490 59 60 297 298 321 322 83 84
435 436 161 162 201 202 476 475 10 9 284 283 371 372 98
97 107 108 378 377 273 274 3 4 466 465 195 196 172 171
442 441 313 314 44 43 68 67 337 338 131 132 402 401 506
505 235 236 225 226 499 500 411 412 137 138 347 348 73
74 34 33 308 307 5 6 271 272 375 376 109 110 439 440
174 173 198 197 463 464 478 477 199 200 160 159 437 438
96 95 374 373 286 285 8 7 136 135 413 414 502 501 224
223 309 310 31 32 71 72 350 349 336 335 70 69 45 46 312
311 237 238 504 503 399 400 133 134 358 357 112 111 24
23 270 269 215 216 461 462 422 421 176 175 189 190 423
424 447 448 213 214 255 256 21 22 125 126 359 360 487
488 253 254 150 149 383 384 85 86 319 320 296 295 61 62
47 48 293 294 333 334 88 87 398 397 152 151 239 240 486
485&amp;gt;
&amp;gt; (perm:to-cycles * :canonicalizep nil)
(#&amp;lt;CYCLE (292 291)*&amp;gt;
#&amp;lt;CYCLE (173 393 174 394)*&amp;gt;
#&amp;lt;CYCLE (411 286 217 193 164 289 331 371)*&amp;gt;
#&amp;lt;CYCLE (122 157 219 170 243 409 374 138)*&amp;gt;
#&amp;lt;CYCLE (412 285 218 194 163 290 332 372)*&amp;gt;
#&amp;lt;CYCLE (121 158 220 169 244 410 373 137)*&amp;gt;
#&amp;lt;CYCLE (459 422 223 316 322 162 330 283)*&amp;gt;
#&amp;lt;CYCLE (92 120 391 439 237 233 403 160)*&amp;gt;
#&amp;lt;CYCLE (460 421 224 315 321 161 329 284)*&amp;gt;
#&amp;lt;CYCLE (91 119 392 440 238 234 404 159)*&amp;gt;
#&amp;lt;CYCLE (461 176 387 375 347 172 155 457)*&amp;gt;
#&amp;lt;CYCLE (59 405 437 312 60 406 438 311)*&amp;gt;
#&amp;lt;CYCLE (462 175 388 376 348 171 156 458)*&amp;gt;
#&amp;lt;CYCLE (54 317 83 453 270 258 449 112)*&amp;gt;
#&amp;lt;CYCLE (467 447 358 338 377 73 455 215)*&amp;gt;
#&amp;lt;CYCLE (53 318 84 454 269 257 450 111)*&amp;gt;
#&amp;lt;CYCLE (468 448 357 337 378 74 456 216)*&amp;gt;
#&amp;lt;CYCLE (32 141 260 426)*&amp;gt;
#&amp;lt;CYCLE (477 359 131 428 72 222 444 400)*&amp;gt;
#&amp;lt;CYCLE (31 142 259 425)*&amp;gt;
#&amp;lt;CYCLE (478 360 132 427 71 221 443 399)*&amp;gt;
#&amp;lt;CYCLE (27 280 178 145 28 279 177 146)*&amp;gt;
#&amp;lt;CYCLE (487 85 207 105 325 476 126 479)*&amp;gt;
#&amp;lt;CYCLE (26 367 225 42 38 229 340 274)*&amp;gt;
#&amp;lt;CYCLE (488 86 208 106 326 475 125 480)*&amp;gt;
#&amp;lt;CYCLE (25 368 226 41 37 230 339 273)*&amp;gt;
#&amp;lt;CYCLE (489 319 435 45 344 465 423 309)*&amp;gt;
#&amp;lt;CYCLE (24 101 389 109 55 231 129 451)*&amp;gt;
#&amp;lt;CYCLE (490 320 436 46 343 466 424 310)*&amp;gt;
#&amp;lt;CYCLE (23 102 390 110 56 232 130 452)*&amp;gt;
#&amp;lt;CYCLE (491 296 484 149 354 43 77 182)*&amp;gt;
#&amp;lt;CYCLE (22 432 335 107 302 396 197 474)*&amp;gt;
#&amp;lt;CYCLE (492 295 483 150 353 44 78 181)*&amp;gt;
#&amp;lt;CYCLE (21 431 336 108 301 395 198 473)*&amp;gt;
#&amp;lt;CYCLE (493 61 127 209 379 34 408 95)*&amp;gt;
#&amp;lt;CYCLE (20 165 52 40 304 386 272 267)*&amp;gt;
#&amp;lt;CYCLE (494 62 128 210 380 33 407 96)*&amp;gt;
#&amp;lt;CYCLE (19 166 51 39 303 385 271 268)*&amp;gt;
#&amp;lt;CYCLE (497 293 49 64 351 313 297 241)*&amp;gt;
#&amp;lt;CYCLE (18 206 100 143 266 124 246 139)*&amp;gt;
#&amp;lt;CYCLE (498 294 50 63 352 314 298 242)*&amp;gt;
#&amp;lt;CYCLE (17 205 99 144 265 123 245 140)*&amp;gt;
#&amp;lt;CYCLE (499 333 98 248 346 196 203 369)*&amp;gt;
#&amp;lt;CYCLE (16 471 255 211 276 113 287 90)*&amp;gt;
#&amp;lt;CYCLE (500 334 97 247 345 195 204 370)*&amp;gt;
#&amp;lt;CYCLE (15 472 256 212 275 114 288 89)*&amp;gt;
#&amp;lt;CYCLE (501 88 262 188 323 201 282 420)*&amp;gt;
#&amp;lt;CYCLE (12 191 433 70 263 361 402 200)*&amp;gt;
#&amp;lt;CYCLE (502 87 261 187 324 202 281 419)*&amp;gt;
#&amp;lt;CYCLE (11 192 434 69 264 362 401 199)*&amp;gt;
#&amp;lt;CYCLE (503 398 464 190 82 430 349 442)*&amp;gt;
#&amp;lt;CYCLE (10 167 481 253 306 147 115 327)*&amp;gt;
#&amp;lt;CYCLE (504 397 463 189 81 429 350 441)*&amp;gt;
#&amp;lt;CYCLE (9 168 482 254 305 148 116 328)*&amp;gt;
#&amp;lt;CYCLE (505 152 179 250 75 416 135 364)*&amp;gt;
#&amp;lt;CYCLE (8 446 134 186 299 153 417 413)*&amp;gt;
#&amp;lt;CYCLE (506 151 180 249 76 415 136 363)*&amp;gt;
#&amp;lt;CYCLE (7 445 133 185 300 154 418 414)*&amp;gt;
#&amp;lt;CYCLE (507 239 228 65 118 93 365 235)*&amp;gt;
#&amp;lt;CYCLE (4 382 307 252 36 495 47 342)*&amp;gt;
#&amp;lt;CYCLE (508 240 227 66 117 94 366 236)*&amp;gt;
#&amp;lt;CYCLE (3 381 308 251 35 496 48 341)*&amp;gt;
#&amp;lt;CYCLE (509 486 384 6 104 80 184 58)*&amp;gt;
#&amp;lt;CYCLE (2 278 355 68 29 14 470 214)*&amp;gt;
#&amp;lt;CYCLE (510 485 383 5 103 79 183 57)*&amp;gt;
#&amp;lt;CYCLE (1 277 356 67 30 13 469 213)*&amp;gt;)
&lt;/code>&lt;/pre>&lt;p>The cool thing is that these operations all happen basically
instantaneously.&lt;/p>
&lt;p>P.S. This insight is easily achieved by asking an LLM. Who
needs mathematicians anymore? :-)&lt;/p>
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>Unfortunately, like a lot of mathematical software born in and around the 1980s, it became its own bespoke, awful, underspecified, imperative programming language with a variety of algebraic APIs. The system can be learned about from its &lt;a href="https://www.gap-system.org">website&lt;/a>.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>The best way to advertise a programming language</title><link>http://www.stylewarning.com/posts/write-programs/</link><pubDate>Sat, 05 Jul 2025 00:00:00 +0000</pubDate><guid>http://www.stylewarning.com/posts/write-programs/</guid><description>&lt;p>&lt;em>By Robert Smith&lt;/em>&lt;/p>
&lt;p>TL;DR: The best way to advertise your favorite programming language is by
writing programs. The more useful the program is to a wider audience,
the better an advertisement it will be.&lt;/p>
&lt;p>I&amp;rsquo;m a Common Lisp/&lt;a href="https://coalton-lang.github.io">Coalton&lt;/a>
programmer. Lisp has been fashionable at least twice in its history,
and it&amp;rsquo;s no coincidence that those were times when people were
writing Lisp code to actually get work done. It&amp;rsquo;s not fashionable
today, but it does have a not-insignificant following and a stable
ecosystem. Because Lisp isn&amp;rsquo;t fashionable, some of its programmers
have felt an impetus to sell Lisp to the wider audience of
programmers. The Lisp sales pitches range from the reasonable (&amp;ldquo;Lisp
is fast, flexible, and stable.&amp;quot;) to the misleading (&amp;ldquo;Lisp can express
every programming paradigm easily.&amp;quot;) to the outright bizarre (&amp;ldquo;Lisp is
your gateway to a higher intellectual plane.&amp;quot;). I do think there&amp;rsquo;s
still room to make an interesting and convincing pitch for Lisp in
2025&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>, but any pitch will eventually be rebuffed with:&lt;/p>
&lt;blockquote>
&lt;p>If Lisp is so good, why isn&amp;rsquo;t everybody using it? Where are all the
programs written in Lisp?&lt;/p>
&lt;/blockquote>
&lt;p>The common counter-riposte is something like &amp;ldquo;popularity ≠ quality&amp;rdquo;, a
platitude if there ever was one.&lt;/p>
&lt;p>I&amp;rsquo;m saving a deeper discussion about Lisp specifically for a different
post&amp;mdash;one that&amp;rsquo;s still brewing and is at 3,700 words and
counting. But Lisp isn&amp;rsquo;t alone here, and it&amp;rsquo;s not even the most
defended language in certain corners of the internet. Haskell is
another language whose advocacy meets the same fate. Haskell, for all
of its fantastical progress in its 35 years
of existence, for its sizable group of staunch disciples, and for its
amazing compiler, only maintains some 0.4%&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup> of GitHub&amp;rsquo;s active
users.&lt;/p>
&lt;p>This post isn&amp;rsquo;t intended to be about Common Lisp or Haskell
specifically, but they are useful specimens for inquiry. Both Common
Lisp and Haskell have existed for a technological eternity, and I
think it&amp;rsquo;s reasonable to examine the question, &amp;ldquo;where are all the
programs?&amp;rdquo; For this exercise, I&amp;rsquo;ll look at GitHub&lt;sup id="fnref:3">&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref">3&lt;/a>&lt;/sup>. I want to
look for all projects which satisfy the following four criteria:&lt;/p>
&lt;ol>
&lt;li>The project should have no fewer than 10% of the stars of the top
repository. For Common Lisp that threshold is 1,000 and for Haskell
it&amp;rsquo;s 3,800.&lt;/li>
&lt;li>The project should represent a software product whose users don&amp;rsquo;t
need to know the language it&amp;rsquo;s written in. That means, among other
things, no libraries.&lt;/li>
&lt;li>The project should represent something realistically useful and not
experimental in nature. That means, among other things, no
operating systems, obscure programming languages, etc.&lt;/li>
&lt;li>No defunct, archived, or archaeological projects. For example,
Reddit 1.0 was written in Lisp but it&amp;rsquo;s not used anywhere anymore.&lt;/li>
&lt;/ol>
&lt;p>Here&amp;rsquo;s what I got for Common Lisp:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Nyxt&lt;/strong>, a keyboard-driven graphical web browser over WebKit.&lt;/li>
&lt;li>&lt;strong>pgloader&lt;/strong>, a PostgreSQL migration tool.&lt;/li>
&lt;li>&lt;strong>Trial&lt;/strong>, a game engine in Common Lisp. (This breaks my Rule #2;
however, because Trial was used to ship the video game
&lt;a href="https://store.steampowered.com/app/1261430/Kandria/">Kandria&lt;/a> on
Steam, I&amp;rsquo;ll use it as a proxy for Kandria.)&lt;/li>
&lt;li>&lt;strong>Maxima&lt;/strong>, a computer algebra system. (Maxima isn&amp;rsquo;t actually on
GitHub, so consider it listed solely on my grace to make this list
look a little less pitiful.)&lt;/li>
&lt;/ol>
&lt;p>Here&amp;rsquo;s what I got for Haskell:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>pandoc&lt;/strong>, the universal markup converter.&lt;/li>
&lt;li>&lt;strong>ShellCheck&lt;/strong>, a static analyzer for shell scripts.&lt;/li>
&lt;li>&lt;strong>PostgREST&lt;/strong>, a REST API for PostgreSQL databases.&lt;/li>
&lt;li>&lt;strong>hadolint&lt;/strong>, a Dockerfile linter.&lt;/li>
&lt;li>&lt;strong>PureScript&lt;/strong>/&lt;strong>Elm&lt;/strong>, programming languages adjacent to
JavaScript. (These barely skate by my Rule #2.)&lt;/li>
&lt;li>&lt;strong>Unison&lt;/strong>/&lt;strong>Carp&lt;/strong>, more programming languages. (These barely skate by my
Rule #3.)&lt;/li>
&lt;li>&lt;strong>xmonad&lt;/strong>/&lt;strong>kmonad&lt;/strong>, a window/keyboard manager.&lt;/li>
&lt;li>&lt;strong>duckling&lt;/strong>, an engine that parses text into structured data.&lt;/li>
&lt;/ol>
&lt;p>For the sake of comparison, for Python:&lt;/p>
&lt;ul>
&lt;li>The top repository has 362,000 stars.&lt;/li>
&lt;li>&lt;strong>youtube-dl&lt;/strong>/&lt;strong>yt-dlp&lt;/strong> is the top &amp;ldquo;normie&amp;rdquo; program. This is
extremely important because such a program could have been written
in &lt;em>any&lt;/em> language &lt;em>easily&lt;/em>.&lt;/li>
&lt;li>There are 11 pages of results that exceed my self-imposed 10%
threshold of 36,200 stars.&lt;/li>
&lt;/ul>
&lt;p>And for Zig, whose market share is a mere rounding error of Python&amp;rsquo;s,
almost all of the projects in the top listing are actual programs
people can use, such as &lt;strong>Bun&lt;/strong>, &lt;strong>Ghostty&lt;/strong>, &lt;strong>Tigerbeetle&lt;/strong>,
&lt;strong>Lightpanda&lt;/strong>, and &lt;strong>dockerc&lt;/strong>. The first page of Zig&amp;rsquo;s results alone
has far more programs than the whole of Common Lisp&amp;rsquo;s
corpus under consideration. In contrast to Haskell, it&amp;rsquo;s not full of linters and
compilers, which appear to be a Haskell programmer&amp;rsquo;s favorite.&lt;/p>
&lt;p>GitHub and stars are an imperfect indicator. Why shouldn&amp;rsquo;t my terminal
program for simulating Conway&amp;rsquo;s Game of Life be counted among the
corpus of Lisp programs? It&amp;rsquo;s neither discoverable nor
useful&lt;sup id="fnref:4">&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref">4&lt;/a>&lt;/sup>. It was a proof-of-concept hack that was never brought
to a logical conclusion. I feel similar sentiments toward the
multitude of other extremely niche programs.&lt;/p>
&lt;p>All of this is to say that there really is a dearth of programs that
one can easily find in languages like Lisp or Haskell, which means
Lisp and Haskell are relegated to being spoken about exclusively in
terms of their hypothetical&amp;mdash;or perhaps historical or
mathematical&amp;mdash;benefits. The idea of a &lt;em>practical&lt;/em> benefit is not one
that&amp;rsquo;s simply ergonomic or realizable&lt;sup id="fnref:5">&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref">5&lt;/a>&lt;/sup>, but one that was
observed or is gained in actual practice.&lt;/p>
&lt;p>Why would real programs used by other people even be a good
advertisement? Shipping software means you had to cross the finish
line. It means that the entirety of the software development process
had to be realized, not just the intellectually stimulating 80%
part. Even bigger than this, though, is a sense of practicing what you
preach. If you&amp;rsquo;re on Twitter slamming software written in Go and
advocating for Haskell instead, but having no interesting or useful
Haskell programs to show, then why would anybody believe you? If
Haskell were so good, why aren&amp;rsquo;t there more programs written in it?&lt;/p>
&lt;p>If you want to convince other people to use your favorite programming
language, you should first convince yourself to write programs in it.&lt;/p>
&lt;p>Language evangelism, adoption, and popularity are complex. It&amp;rsquo;s
difficult to compete with marketing, Google budgets, institutionalized
education, and so on. Writing programs won&amp;rsquo;t guarantee you&amp;rsquo;ll rope
anybody into writing in your favorite language. But, all else equal,
as an individual interested in promoting your language, it&amp;rsquo;s probably
your best shot.&lt;/p>
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>Such a pitch would have to include a lot of things people care about in 2025, such as the developer experience, performance, and the truly can&amp;rsquo;t-be-done-elsewhere things illuminated by superlative examples.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2" role="doc-endnote">
&lt;p>Data taken from &lt;a href="https://www.benfrederickson.com/ranking-programming-languages-by-github-users/">Ben Frederickson&amp;rsquo;s blog&lt;/a>.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:3" role="doc-endnote">
&lt;p>Two points about GitHub. First, it&amp;rsquo;s indisputably the most popular place for open-source projects, but it&amp;rsquo;s not the only place. Sourceforge, of all places, still hosts software that&amp;rsquo;s used today. Second, if the code isn&amp;rsquo;t open source, who cares what language it&amp;rsquo;s written in? The only times the choice of a programming language for a closed source project matters is if (1) you want some evidence that the software is secure (e.g., software written in C probably has vulnerabilities), (2) you want a job working on such a project (e.g., trading software in OCaml at Jane Street), or (3) you want the warm and fuzzies that a company put their chips in on it.&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:4" role="doc-endnote">
&lt;p>Compare to &lt;a href="https://golly.sourceforge.io">Golly&lt;/a>, which is an amazing program that&amp;rsquo;s also distributed on both Apple&amp;rsquo;s App Store and Google&amp;rsquo;s Play Store.&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:5" role="doc-endnote">
&lt;p>Annoyingly, &amp;ldquo;practical&amp;rdquo; also means &amp;ldquo;feasible&amp;rdquo; or &amp;ldquo;possible&amp;rdquo;, and theoreticians lean into this definition too much.&amp;#160;&lt;a href="#fnref:5" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>A tutorial quantum interpreter in 150 lines of Lisp</title><link>http://www.stylewarning.com/posts/quantum-interpreter/</link><pubDate>Sun, 16 Jul 2023 00:00:00 +0000</pubDate><guid>http://www.stylewarning.com/posts/quantum-interpreter/</guid><description>&lt;p>&lt;em>By Robert Smith&lt;/em>&lt;/p>
&lt;p>&lt;em>Simulating a universal, gate-based quantum computer on a classical
computer has many uses and benefits. The top benefit is the ability to
inspect the amplitudes of the system&amp;rsquo;s state directly. However, while
the mathematics is very well understood, implementing a
general-purpose simulator has largely been folk knowledge. In this
tutorial, we show how to build an interpreter for a general-purpose
quantum programming language called $\mathscr{L}$, capable of
executing most kinds of quantum circuits found in the literature. It is
presented economically, allowing its implementation to take fewer than
150 lines of self-contained Common Lisp code. The language
$\mathscr{L}$ is very simple to extend, making the interpreter ripe
for testing different kinds of behavior, such as noise models.&lt;/em>&lt;/p>
&lt;div>
&lt;hr>
&lt;h2>Contents&lt;/h2>
&lt;nav id="TableOfContents">
&lt;ol>
&lt;li>&lt;a href="#introduction">Introduction&lt;/a>
&lt;ol>
&lt;li>&lt;a href="#a-note-about-common-lisp">A note about Common Lisp&lt;/a>&lt;/li>
&lt;li>&lt;a href="#a-note-to-experienced-quantum-computing-practitioners">A note to experienced quantum computing practitioners&lt;/a>&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>&lt;a href="#the-language-mathscrl">The Language $\mathscr{L}$&lt;/a>&lt;/li>
&lt;li>&lt;a href="#the-quantum-state">The Quantum State&lt;/a>
&lt;ol>
&lt;li>&lt;a href="#where-does-one-qubit-live">Where does one qubit live?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#many-qubits">Many qubits&lt;/a>&lt;/li>
&lt;li>&lt;a href="#bit-string-notation-and-a-general-quantum-state">Bit-String notation and a general quantum state&lt;/a>&lt;/li>
&lt;li>&lt;a href="#evolving-the-quantum-state">Evolving the quantum state&lt;/a>&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>&lt;a href="#measurement">Measurement&lt;/a>&lt;/li>
&lt;li>&lt;a href="#gates">Gates&lt;/a>
&lt;ol>
&lt;li>&lt;a href="#gates-as-matrices">Gates as matrices&lt;/a>&lt;/li>
&lt;li>&lt;a href="#gates-on-multi-qubit-machines">Gates on multi-qubit machines&lt;/a>&lt;/li>
&lt;li>&lt;a href="#single-qubit-gates-and-gates-on-adjacent-qubits">Single-qubit gates and gates on adjacent qubits&lt;/a>&lt;/li>
&lt;li>&lt;a href="#multi-qubit-gates-on-non-adjacent-qubits">Multi-qubit gates on non-adjacent qubits&lt;/a>&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>&lt;a href="#an-interpreter">An interpreter&lt;/a>
&lt;ol>
&lt;li>&lt;a href="#the-driver-loop">The driver loop&lt;/a>&lt;/li>
&lt;li>&lt;a href="#efficiency">Efficiency&lt;/a>&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>&lt;a href="#examples">Examples&lt;/a>
&lt;ol>
&lt;li>&lt;a href="#bell-state">Bell state&lt;/a>&lt;/li>
&lt;li>&lt;a href="#greenberger--horne--zeilinger-state">Greenberger&amp;ndash;Horne&amp;ndash;Zeilinger state&lt;/a>&lt;/li>
&lt;li>&lt;a href="#the-quantum-fourier-transform">The quantum Fourier transform&lt;/a>&lt;/li>
&lt;li>&lt;a href="#example-transcript">Example transcript&lt;/a>&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>&lt;a href="#source-code">Source code&lt;/a>&lt;/li>
&lt;li>&lt;a href="#ports-in-other-languages">Ports in other languages&lt;/a>&lt;/li>
&lt;/ol>
&lt;/nav>
&lt;hr>
&lt;/div>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Simulating the workings of an ideal quantum computer has many
important applications, such as algorithms research and quantum
program debugging. A variety of quantum computer simulators exist,
both free and commercial. However, while the concept of the simulation
of quantum computers is generally well understood at a high level, the
devil is in the details when it comes to implementation.&lt;/p>
&lt;p>Quantum computer simulators found in the wild often have many
limitations. The most prevalent limitation is the number of qubits an
operator can act on. Usually, one-qubit gates and controlled
one-qubit&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup> gates are allowed, but nothing more. While these
together are sufficient for universal quantum computation, they leave
much to be desired when studying quantum algorithms.&lt;/p>
&lt;p>In this post, we present an implementation of a fully general quantum
programming language interpreter, allowing measurement as well as
arbitrary unitary operators on an arbitrary number of arbitrarily
indexed qubits. The implementation weighs in at under 150 lines&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup>
of code in Common Lisp, though the ideas make implementation simple in
other languages as well. All of the code from this tutorial can be
found on
&lt;a href="https://github.com/stylewarning/quantum-interpreter">GitHub&lt;/a>.&lt;/p>
&lt;p>This tutorial is aimed at a quantum computing beginner who has some
familiarity with the fundamentals of linear algebra and computer
programming. Beyond those subjects, this tutorial is relatively
self-contained. We also aim this tutorial at practitioners of quantum
computing who are interested in the brass tacks of simulation, with
all of the details filled out. To such practitioners, the bulk of this
document will be easy to skim, since we recapitulate topics such as
qubits and unitary operators.&lt;/p>
&lt;h3 id="a-note-about-common-lisp">A note about Common Lisp&lt;/h3>
&lt;p>We use Common Lisp, because it is an excellent platform for both
exploratory and high-performance computing. One of the fastest and
most flexible quantum simulators out there, the &lt;a href="https://github.com/quil-lang/qvm">Quantum Virtual
Machine&lt;/a>, is written entirely in
Common Lisp.&lt;/p>
&lt;p>We wrote this article so that it would be easy to follow along with a
Common Lisp implementation. The code has no dependencies, and should
work in any ANSI-compliant implementation (I hope).&lt;/p>
&lt;p>With that said, this article was also written with portability in
mind. Since no especially Lisp-like features are used, the code should
be easy to port to Python or even C. At minimum, your language should
support complex numbers and arrays.&lt;/p>
&lt;h3 id="a-note-to-experienced-quantum-computing-practitioners">A note to experienced quantum computing practitioners&lt;/h3>
&lt;p>&lt;em>This section is written for experienced practitioners of quantum
computing who happened upon this post, and can be skipped.&lt;/em>&lt;/p>
&lt;p>In this post, we opt to simulate a quantum circuit the &amp;ldquo;Schrodinger&amp;rdquo;
way, that is, by evolving a wavefunction explicitly. For a circuit of
width $n$, we walk through the mathematics of how to interpret a
$k$-qubit gate $g \in \mathsf{SU}(2^k)$ for $k\le n$, specified to act
on a $k$-tuple of numbered qubits&amp;mdash;corresponding to each qubit&amp;rsquo;s
position in the tensor product which forms the Hilbert space of the
system&amp;mdash;as a full operator $g'\in\mathsf{SU}(2^n)$. We do this by
providing an explicit construction of the matrix in the computational
basis of the system.&lt;/p>
&lt;p>An alternative approach would have been to describe the action of a
$g$ on an $n$-qubit wavefunction by way of careful manipulation of
indexes, i.e., to effectively permute and partition our wavefunction
into $2^{n-k}$ groups of $2^k$-dimensional vectors corresponding to
the subsystem of qubits being operated on. The major benefit of this
approach is efficiency.&lt;/p>
&lt;p>As a first introduction for a computer science graduate, I find this
explanation lacking in two ways:&lt;/p>
&lt;ol>
&lt;li>It under-emphasizes that a gate like $\mathsf{CNOT}$, typically
written as a $4\times 4$ matrix $\mathsf{I}\oplus\mathsf{X}$, in a
quantum circuit truly is a linear operator on the Hilbert space of
the entire system. &amp;ldquo;It&amp;rsquo;s just linear algebra; here&amp;rsquo;s the matrix and
here&amp;rsquo;s the vector&amp;rdquo; is a point I want to drive home.&lt;/li>
&lt;li>It requires significant labor to explain and to prove the
correctness of the method to a reader without significant experience in
tensor algebra, contractions, Einstein notation, and so on.&lt;/li>
&lt;/ol>
&lt;p>The approach of this post can be used as a basis to follow up with
more efficient techniques, without relinquishing a strong mathematical
foundation. We are very careful to not be hand-wavy, and to not
conflate the different vector spaces at play. We hope that you&amp;rsquo;ll find
this approach agreeable, even if it sacrifices some efficiency.&lt;/p>
&lt;h2 id="the-language-mathscrl">The Language $\mathscr{L}$&lt;/h2>
&lt;p>We wish to construct an interpreter for a small quantum programming
language named $\mathscr{L}$. This language supports
both of the fundamental operations of a quantum computer: gates and
measurements.&lt;/p>
&lt;p>A &lt;strong>gate&lt;/strong> is an operation that modifies a quantum state. (What a
quantum state is exactly we will delve into later.) Because quantum
states are large compared to the physical resources used to construct
them, gates represent the &amp;ldquo;powerful&amp;rdquo; operations of a quantum
computer.&lt;/p>
&lt;p>A &lt;strong>measurement&lt;/strong> is an observation and collapse of the quantum state,
producing one bit (i.e., $0$ or $1$) of classical information per
qubit. Measurements represent the &lt;em>only&lt;/em> way in which one can extract
information from our simulated quantum computer, and indeed, in most
programming models for real quantum computers.&lt;/p>
&lt;p>In some sense, one might think of the language $\mathscr{L}$ as the
simplest non-trivial quantum programming language. A program in
$\mathscr{L}$ is just a sequence of gates and measurements. The syntax
is as follows:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">Non-Terminal&lt;/th>
&lt;th style="text-align:right">&lt;/th>
&lt;th style="text-align:left">Defintion&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">program&lt;/td>
&lt;td style="text-align:right">:=&lt;/td>
&lt;td style="text-align:left">&lt;code>(&lt;/code> &lt;em>instruction&lt;/em>* &lt;code>)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">instruction&lt;/td>
&lt;td style="text-align:right">:=&lt;/td>
&lt;td style="text-align:left">&lt;code>(&lt;/code> &lt;code>GATE&lt;/code> &lt;em>matrix&lt;/em> &lt;em>qubit&lt;/em>+ &lt;code>)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;/td>
&lt;td style="text-align:right">|&lt;/td>
&lt;td style="text-align:left">&lt;code>(&lt;/code> &lt;code>MEASURE&lt;/code> &lt;code>)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">matrix&lt;/td>
&lt;td style="text-align:right">:=&lt;/td>
&lt;td style="text-align:left">&lt;em>a complex matrix&lt;/em> &lt;code>#2A(&lt;/code> &amp;hellip; &lt;code>)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">qubit&lt;/td>
&lt;td style="text-align:right">:=&lt;/td>
&lt;td style="text-align:left">&lt;em>a non-negative integer&lt;/em>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Spaces and newlines are ignored, except to delimit the tokens of our
language.&lt;/p>
&lt;p>We borrow Common Lisp&amp;rsquo;s two-dimensional array syntax for the syntax of
matrices. In Common Lisp, the matrix $\left(\begin{smallmatrix}1 &amp;amp;
2\\3 &amp;amp; 4\end{smallmatrix}\right)$ is written &lt;code>#2A((1 2) (3 4))&lt;/code>. We
also borrow the syntax for complex numbers: $1-2i$ is written &lt;code>#C(1 -2)&lt;/code>.&lt;/p>
&lt;p>An example program might be one to construct and subsequently measure
two qubits labeled &lt;code>2&lt;/code> and &lt;code>5&lt;/code> in a Bell state configuration:&lt;/p>
&lt;pre tabindex="0">&lt;code>(
(GATE #2A((0.70710677 0.70710677) (0.70710677 -0.70710677)) 2)
(GATE #2A((1 0 0 0) (0 1 0 0) (0 0 0 1) (0 0 1 0)) 2 5)
(MEASURE)
)
&lt;/code>&lt;/pre>&lt;p>We will model the semantics of $\mathscr{L}$ operationally, by way of an &lt;strong>abstract machine&lt;/strong>. The abstract machine for $\mathscr{L}$ is called $M_n$, where $n$ is a positive but fixed&lt;sup id="fnref:3">&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref">3&lt;/a>&lt;/sup> number of qubits. The state of the machine $M_n$ is the pair $(v, b)$ where $v$ is a quantum state, and $b$ is an $n$-bit measurement register.&lt;/p>
&lt;p>The quantum state is an element of the set&lt;/p>
&lt;p>$$\{v\in\mathbb{C}^{2^n}\mid \Vert v\Vert=1\}.$$&lt;/p>
&lt;p>In other words, $v$ is a unit vector of dimension $2^n$ over the
complex numbers. We will discuss this from first principles in the
&lt;a href="#the-quantum-state">next section&lt;/a>.&lt;/p>
&lt;p>The measurement register is an element of the set $\{0,1\}^n$, i.e.,
a sequence of $n$ bits, which we realize as a non-negative
integer. The $k$th least-significant bit of this integer represents
the last observation of the qubit numbered as $k$. We will &lt;a href="#measurement">discuss
this in detail&lt;/a> as well.&lt;/p>
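To make the register convention concrete, here is a small helper in Python (the post later notes the code ports easily to Python; this helper is an illustration of the bit layout, not part of the interpreter). Reading qubit $k$'s last observation is a shift and a mask.

```python
def register_bit(register, k):
    # The measurement register is a plain non-negative integer; the
    # k-th least-significant bit holds the last observation of qubit k.
    return (register >> k) & 1

# For example, register 0b101 records qubits 0 and 2 as last observed
# in state 1, and qubit 1 as last observed in state 0.
```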
&lt;p>In Common Lisp, it suffices to create a structure &lt;code>machine&lt;/code> which holds these two pieces of state.&lt;/p>
&lt;pre tabindex="0">&lt;code>(defstruct machine
quantum-state
measurement-register)
&lt;/code>&lt;/pre>&lt;p>Typically, the machine is initialized with each classical bit in the
measurement register $0$, and each qubit starting in the
zero-state. (However, for the purposes of algorithm study or
debugging, the machine may be initialized with any valid state.)&lt;/p>
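For readers porting as they go, a direct Python rendering of the same two-field machine might look like the following (a sketch under my own naming, not the post's code):

```python
from dataclasses import dataclass

@dataclass
class Machine:
    quantum_state: list        # 2^n complex amplitudes
    measurement_register: int  # n-bit register as a non-negative integer

def make_machine(n):
    # All classical bits start at 0, and all qubits start in the
    # zero-state: amplitude 1 on |0...0> and 0 everywhere else.
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    return Machine(quantum_state=state, measurement_register=0)
```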
&lt;p>The precise way in which the language $\mathscr{L}$ is interpreted on
$M_n$ is what we describe in this tutorial. Before that, however, we
find it most important to describe what &lt;em>exactly&lt;/em> a quantum state is,
and how to represent it on a computer.&lt;/p>
&lt;h2 id="the-quantum-state">The Quantum State&lt;/h2>
&lt;h3 id="where-does-one-qubit-live">Where does one qubit live?&lt;/h3>
&lt;p>Quantum computers are usually just a collection of interacting computational elements called &lt;strong>qubits&lt;/strong>. A single qubit has two distinguished states: $\ket{0}$ and $\ket{1}$. If the qubit has a name like $q$, then we label the states $\ket{0}_q$ and $\ket{1}_q$.&lt;/p>
&lt;p>The funny notation is called &lt;strong>Dirac notation&lt;/strong> or &lt;strong>braket notation&lt;/strong>. It happens to be a convenient notation for doing calculations in quantum mechanics, and we just use it for consistency with other texts. The &lt;strong>ket&lt;/strong> $\ket{\cdots}$, as a physicist would call it, doesn&amp;rsquo;t add any special significance, except to denote that the quantity is a vector. One can actually put &lt;em>anything&lt;/em> inside the brackets. In usual linear algebra, one often writes $\mathbf{e}_i$ to denote a basis vector, where in quantum mechanics, one just writes the subscript in a ket $\ket{i}$, dropping the $\mathbf{e}$ entirely. If the notation throws you off, and you&amp;rsquo;d like to think in more traditional written linear algebra notation, you can always replace $\ket{x}$ with $\vec x$, and you&amp;rsquo;ll be safe.&lt;/p>
&lt;p>These distinguished states $\ket{0}$ and $\ket{1}$ are understood to be orthonormal basis vectors in a vector space whose scalars are complex numbers $\mathbb{C}$. As such, a qubit can be $\ket{0}$, $\ket{1}$, or a &lt;strong>superposition&lt;/strong> $\alpha\ket 0 + \beta\ket 1$, where $\alpha$ and $\beta$ are complex numbers. The numbers $\alpha$ and $\beta$ are called &lt;strong>probability amplitudes&lt;/strong>, because $\vert\alpha\vert^2$ (resp. $\vert\beta\vert^2$) represent the probability of the qubit being observed in the $\ket 0$ (resp. $\ket 1$) state. Since they represent probabilities, there&amp;rsquo;s an additional constraint, namely that the probabilities add to one: $\vert\alpha\vert^2 + \vert\beta\vert^2=1$.&lt;/p>
&lt;p>To those unfamiliar, it may not be obvious why we&amp;rsquo;ve opted to use the
language of linear algebra. Why do we consider a qubit as being a
linear combination? Why do we suppose that the observable states are
orthonormal vectors? Why can&amp;rsquo;t we simply say that a qubit is just a
pair of complex numbers and move on?&lt;/p>
&lt;p>The reason for this is scientific, and not mathematical. It turns out that the best theory of quantum mechanics we have is one which describes transformations between states as linear. In fact, the evolution of a quantum mechanical system is described by an operation that is not only linear, but also reversible and length-preserving. These conditions&amp;mdash;linearity, reversibility, and length preservation&amp;mdash;give rise to a special class of transformations called &lt;strong>unitary operators&lt;/strong>, which naturally lead us to the discussion of vector spaces over complex numbers&lt;sup id="fnref:4">&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref">4&lt;/a>&lt;/sup>.&lt;/p>
&lt;p>We will discuss the nature of these operations in more depth when we consider how to implement gates &lt;a href="#gates">later on&lt;/a>. For now, however, it&amp;rsquo;s sufficient to think of a qubit named $q$ as something that lives in a complex, two-dimensional vector space, which we will call $$B_q := \operatorname{span}_{\mathbb{C}}\{\ket 0_q, \ket 1_q\}.$$ (We will use this $B_q$ notation a few times throughout this tutorial. Remember it!) We also understand that this space is equipped&lt;sup id="fnref:5">&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref">5&lt;/a>&lt;/sup> with a way to calculate lengths of vectors&amp;mdash;the usual norm&lt;/p>
&lt;p>$$
\left\Vert\alpha\ket{0}+\beta\ket{1}\right\Vert = \sqrt{\vert\alpha\vert^2+\vert\beta\vert^2}.
$$&lt;/p>
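As a quick numerical check (in Python, with the amplitudes of the equal superposition $\frac{1}{\sqrt 2}(\ket 0 + \ket 1)$ chosen purely for illustration), the norm constraint is easy to verify:

```python
import math

# Amplitudes of the equal superposition (1/sqrt(2))(|0> + |1>).
alpha = beta = 1 / math.sqrt(2)

# |alpha|^2 and |beta|^2 are the observation probabilities,
prob0 = abs(alpha) ** 2
prob1 = abs(beta) ** 2

# and the norm sqrt(|alpha|^2 + |beta|^2) comes out to 1.
norm = math.sqrt(prob0 + prob1)
```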
&lt;h3 id="many-qubits">Many qubits&lt;/h3>
&lt;p>Roughly speaking, a single qubit can be described by two
probability amplitudes. How do we deal with more qubits?&lt;/p>
&lt;p>Suppose we have two qubits named $X$ and $Y$. As a pair, quantum
mechanics tells us that they can &lt;em>interact&lt;/em>. Practically, what
that means is that their states can be correlated in some way. If
they&amp;rsquo;ve interacted, knowing information about $X$ might give us a clue
about what $Y$ might be. One well-known example of this is the
&lt;em>Bell state&lt;/em>, which can be summarized as follows:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:center">Qubit $X$&lt;/th>
&lt;th style="text-align:center">Qubit $Y$&lt;/th>
&lt;th style="text-align:center">Prob. Amp.&lt;/th>
&lt;th style="text-align:center">Probability&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:center">$\ket 0_X$&lt;/td>
&lt;td style="text-align:center">$\ket 0_Y$&lt;/td>
&lt;td style="text-align:center">$1/\sqrt{2}$&lt;/td>
&lt;td style="text-align:center">$50\%$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">$\ket 0_X$&lt;/td>
&lt;td style="text-align:center">$\ket 1_Y$&lt;/td>
&lt;td style="text-align:center">$0$&lt;/td>
&lt;td style="text-align:center">$0\%$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">$\ket 1_X$&lt;/td>
&lt;td style="text-align:center">$\ket 0_Y$&lt;/td>
&lt;td style="text-align:center">$0$&lt;/td>
&lt;td style="text-align:center">$0\%$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">$\ket 1_X$&lt;/td>
&lt;td style="text-align:center">$\ket 1_Y$&lt;/td>
&lt;td style="text-align:center">$1/\sqrt{2}$&lt;/td>
&lt;td style="text-align:center">$50\%$&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Here, we have an example of a &lt;strong>non-factorizable state&lt;/strong>; the states of qubits $X$ and $Y$ are correlated. If we know $X$ is in the $\ket 0_X$ state, then we &lt;em>necessarily&lt;/em> know that $Y$ is in the $\ket 0_Y$ state. Such a correlation means it&amp;rsquo;s not possible to express the probabilities independently. It might be tempting to simply think of $X$ as having a $50\%$ probability of being in either basis state, and $Y$ likewise&amp;mdash;facts which are certainly true&amp;mdash;but treating those probabilities as independent would give us a &lt;em>different&lt;/em> distribution of probabilities for the system:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:center">Qubit $X$&lt;/th>
&lt;th style="text-align:center">Qubit $Y$&lt;/th>
&lt;th style="text-align:center">Probability&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:center">$\ket 0_X$&lt;/td>
&lt;td style="text-align:center">$\ket 0_Y$&lt;/td>
&lt;td style="text-align:center">$P(X=\ket 0_X)P(Y=\ket 0_Y)=25\%$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">$\ket 0_X$&lt;/td>
&lt;td style="text-align:center">$\ket 1_Y$&lt;/td>
&lt;td style="text-align:center">$P(X=\ket 0_X)P(Y=\ket 1_Y)=25\%$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">$\ket 1_X$&lt;/td>
&lt;td style="text-align:center">$\ket 0_Y$&lt;/td>
&lt;td style="text-align:center">$P(X=\ket 1_X)P(Y=\ket 0_Y)=25\%$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">$\ket 1_X$&lt;/td>
&lt;td style="text-align:center">$\ket 1_Y$&lt;/td>
&lt;td style="text-align:center">$P(X=\ket 1_X)P(Y=\ket 1_Y)=25\%$&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>This state is called &lt;strong>factorizable&lt;/strong> because we can express each
probability as a product of probabilities pertaining to the original
qubits, i.e., each probability has a form that looks like
$P(X)P(Y)$. Note that here, knowing something about $X$ gives us &lt;em>no&lt;/em>
information about $Y$, since they&amp;rsquo;re completely independent. With that
said, it should be emphasized that factorizable states &lt;em>are&lt;/em> perfectly
valid states, but they don&amp;rsquo;t represent the entirety of possible
states.&lt;/p>
&lt;p>If qubits $X$ and $Y$ live in the linear spaces $B_X$ and $B_Y$ respectively, then the composite space is written $B_X\otimes B_Y$. This is called a &lt;strong>tensor product&lt;/strong>, which is a way to combine &lt;em>spaces&lt;/em> with the above structure. Formally, if we have an $m$-dimensional vector space $V:=\operatorname{span}\{v_1,\ldots,v_m\}$ and an $n$-dimensional vector space $W:=\operatorname{span}\{w_1,\ldots,w_n\}$, then their tensor product $T:=V\otimes W$ will be an $mn$-dimensional vector space $\operatorname{span}\{t_1,\ldots,t_{mn}\}$, where each $t_i$ is a formal combination of basis vectors from $V$ and $W$. (There are of course $mn$ different combinations of $v$&amp;rsquo;s and $w$&amp;rsquo;s.) To give an example without all the abstraction, consider $V$ with a basis $\{\vec x, \vec y, \vec z\}$ and $W$ with a basis $\{\vec p, \vec q\}$. Then $V\otimes W$ will have a basis&lt;/p>
&lt;p>$$
\left\{
\begin{array}{lll}
\vec x\otimes\vec p, &amp;amp; \vec y\otimes\vec p, &amp;amp; \vec z\otimes\vec p, \\
\vec x\otimes\vec q, &amp;amp; \vec y\otimes\vec q, &amp;amp; \vec z\otimes\vec q\hphantom{,}
\end{array}
\right\}.
$$&lt;/p>
&lt;p>An example vector in the space $V\otimes W$ might be&lt;/p>
&lt;p>$$
-i(\vec x\otimes\vec p) - 2(\vec y\otimes\vec p) + 3 (\vec z\otimes\vec p) +
\frac{1}{4}(\vec x\otimes\vec q) - \sqrt{5}(\vec y\otimes\vec q) + e^{6\pi}(\vec z\otimes\vec q),
$$&lt;/p>
&lt;p>assuming these vector spaces are over $\mathbb{C}$.&lt;/p>
&lt;p>Intuitively, a tensor product &amp;ldquo;just&amp;rdquo; gives us a way to associate a number with each possible combination of basis vectors. In our case, we need to associate a probability amplitude with each combination of distinguished qubit basis states. We need this ability since&amp;mdash;as we&amp;rsquo;ve established&amp;mdash;we need to consider every possible holistic outcome of a collection of qubits, as opposed to the outcomes of the qubits independently. (The former constitute both factorizable and non-factorizable states, while the latter only include factorizable states.)&lt;/p>
&lt;h3 id="bit-string-notation-and-a-general-quantum-state">Bit-String notation and a general quantum state&lt;/h3>
&lt;p>If we have qubits $X$, $Y$, and $Z$, then they&amp;rsquo;ll live in the space $B_X\otimes B_Y\otimes B_Z$, which we&amp;rsquo;ll call $Q_3$. It will be massively inconvenient to write the basis vectors as, for example, $\ket 0_X\otimes \ket 1_Y\otimes\ket 1_Z$, so we instead use the shorthand $\ket{011}$ when the space has been defined. This is called &lt;strong>bit-string notation&lt;/strong>. A general element $\ket\psi$ of $Q_3$ can be written $$\psi_0\ket{000}+\psi_1\ket{001}+\psi_2\ket{010}+\psi_3\ket{011}+\psi_4\ket{100}+\psi_5\ket{101}+\psi_6\ket{110}+\psi_7\ket{111}.$$ There are two substantial benefits from using bit-string notation. These benefits are much more thoroughly explained in &lt;a href="https://arxiv.org/abs/1711.02086">this paper&lt;/a>&amp;mdash;which was a precursor to this very blog post.&lt;/p>
&lt;p>The first benefit is that the names of the qubits&amp;mdash;$X$, $Y$, and $Z$&amp;mdash;have been abstracted away. They&amp;rsquo;re now just positions in a bit-string, and we can canonically name the qubits according to their position. We record positions &lt;em>from the right starting from zero&lt;/em>, so $X$ is in position $2$, $Y$ is in position $1$, and $Z$ is in position $0$.&lt;/p>
&lt;p>The second benefit is one relevant to how we implement quantum states on a computer. As written, the probability amplitude $\psi_i$ has an index $i$ whose binary expansion matches the bit-string of the basis vector whose scalar component is $\psi_i$. This is no accident. The main outcome of this is that we can use a non-negative integer as a way of specifying a bit-string, which also acts as an index into an array of probability amplitudes. So for instance, the above state can be written further compactly as $$\ket\psi=\sum_{i=0}^7\psi_i\ket i.$$ Here, $\ket i$ refers to the $i$th bit-string in lexicographic (&amp;ldquo;dictionary&amp;rdquo;) order, or equivalently, the binary expansion of $i$ as a bit-string.&lt;/p>
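The index-to-bit-string correspondence takes one line of Python (a hypothetical helper, named here for illustration):

```python
def basis_label(i, n):
    # The i-th lexicographic bit-string of length n, i.e. the binary
    # expansion of i: index 5 in a 3-qubit system names |101>.
    return format(i, "0{}b".format(n))
```

So `basis_label(5, 3)` gives `"101"`, and reading the positions from the right starting at zero recovers each qubit's basis state.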
&lt;p>Since qubits live in a two-dimensional space, then $n$ qubits will live in a $2^n$-dimensional space. With a great deal of work, we&amp;rsquo;ve come to our most general&lt;sup id="fnref:6">&lt;a href="#fn:6" class="footnote-ref" role="doc-noteref">6&lt;/a>&lt;/sup> representation of an $n$-qubit system: $$\sum_{i=0}^{2^n-1}\psi_i\ket i,$$ where $\vert\psi_i\vert^2$ gives us the probability of observing the bit-string $\ket i$, implying $$\sum_{i=0}^{2^n-1}\vert\psi_i\vert^2=1.$$&lt;/p>
&lt;p>On a computer, representing a quantum state for an $n$-qubit system is simple: It&amp;rsquo;s just an array of $2^n$ complex numbers. An index $i$ into the array represents the probability amplitude $\psi_i$, which is the scalar component of $\ket{i}$. So, for instance, the state $\ket{000}$ in a 3-qubit system is represented by an array whose first element is $1$ and the rest $0$. Here is a function to allocate a new quantum state of $n$ qubits, initialized to be in the $\ket{\ldots 000}$ state:&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun make-quantum-state (n)
(let ((s (make-array (expt 2 n) :initial-element 0.0d0)))
(setf (aref s 0) 1.0d0)
s))
&lt;/code>&lt;/pre>&lt;p>Sometimes, given a quantum state, or even an operator on a quantum
state, we will want to recover how many qubits the state represents,
or the operator acts on. In both cases, the question reduces to
determining the number of qubits that a dimension represents. Since
our dimensions are always powers of two, we need to compute the
equivalent of a binary logarithm. In Common Lisp, we can compute this
by counting the number of bits it takes to represent an integer, using
&lt;code>integer-length&lt;/code>. The number $2^n$ is always a &lt;code>1&lt;/code> followed by $n$
&lt;code>0&lt;/code>&amp;rsquo;s, so the length of $2^n$ in binary is $n+1$.&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun dimension-qubits (d)
(1- (integer-length d)))
&lt;/code>&lt;/pre>&lt;h3 id="evolving-the-quantum-state">Evolving the quantum state&lt;/h3>
&lt;p>Since the quantum state is a vector, the principal way we change it is
through linear operators represented as matrices. As our quantum
program executes, we say that the quantum state
&lt;em>evolves&lt;/em>. Matrix&amp;ndash;vector multiplication is accomplished with
&lt;code>apply-operator&lt;/code> and matrix&amp;ndash;matrix multiplication is accomplished
with &lt;code>compose-operators&lt;/code>. There is nothing special about these
functions; they are the standard textbook algorithms.&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun apply-operator (matrix column)
(let* ((matrix-size (array-dimension matrix 0))
(result (make-array matrix-size :initial-element 0.0d0)))
(dotimes (i matrix-size)
(let ((element 0))
(dotimes (j matrix-size)
(incf element (* (aref matrix i j) (aref column j))))
(setf (aref result i) element)))
(replace column result)))
(defun compose-operators (A B)
(destructuring-bind (m n) (array-dimensions A)
(let* ((l (array-dimension B 1))
(result (make-array (list m l) :initial-element 0)))
(dotimes (i m result)
(dotimes (k l)
(dotimes (j n)
(incf (aref result i k)
(* (aref A i j)
(aref B j k)))))))))
&lt;/code>&lt;/pre>&lt;p>These functions will sit at the heart of the interpreter, which will
be elaborated upon in &lt;a href="#gates">the section about gates&lt;/a>.&lt;/p>
&lt;h2 id="measurement">Measurement&lt;/h2>
&lt;p>Already, through the construction of our quantum state, we&amp;rsquo;ve
discussed the idea that the probability amplitudes imply a probability
of observing a state. Measurement then amounts to looking at a quantum
state as a discrete probability distribution and sampling from it.&lt;/p>
&lt;p>Measurement in quantum mechanics is side-effectful; observation of a
quantum state also simultaneously &lt;em>collapses&lt;/em> that state. This means
that when we measure a state to be a bit-string, then the state will
also &lt;em>become&lt;/em> that bit-string, zeroing out every other component in the
process.&lt;/p>
&lt;p>We thus implement the process of measurement in two steps: The
sampling of the state followed by its collapse.&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun observe (machine)
(let ((b (sample (machine-quantum-state machine))))
(collapse (machine-quantum-state machine) b)
(setf (machine-measurement-register machine) b)
machine))
&lt;/code>&lt;/pre>&lt;p>Note that we&amp;rsquo;ve recorded our observation into the measurement register. We now proceed to define what we mean by &lt;code>sample&lt;/code> and &lt;code>collapse&lt;/code>.&lt;/p>
&lt;p>How shall we sample? This is a classic problem in computer science. If we have $N$ events $\{0, 1,\ldots,N-1\}$, such that event $e$ has probability $P(e)$, then we can sample as follows. Consider the partial sums defined by the recurrence $S(0)=0$ and $S(k)=S(k-1) + P(k-1)$. If we draw a random number $r$ uniformly from $[0,1)$, then we wish to find the $k$ such that $S(k)\leq r &amp;lt; S(k+1)$. Such a $k$ will be a sampling of our events according to the imposed probability distribution.&lt;/p>
&lt;p>We can implement this simply by computing successive partial sums, until our condition is satisfied. In fact, we can be a little bit more resourceful. We can find when $r-S(k+1)&amp;lt;0$, which amounts to successive updates $r\leftarrow r-P(k)$.&lt;/p>
&lt;p>With a quantum system, we have $P(\ket i) = \vert\psi_i\vert^2$, and the sampled $k$ is the bit-string $\ket k$ we find.&lt;/p>
&lt;p>Let&amp;rsquo;s do an example. Suppose we have a quantum state&lt;/p>
&lt;p>$$
\sqrt{0.2}\ket{00} - \sqrt{0.07}\ket{01} + \sqrt{0.6}\ket{10} + \sqrt{0.13}\ket{11}.
$$&lt;/p>
&lt;p>Then our discrete probability distribution is:&lt;/p>
&lt;p>$$
P(\ket{00}) = 0.2\qquad P(\ket{01}) = 0.07\qquad P(\ket{10}) = 0.6\qquad P(\ket{11}) = 0.13
$$&lt;/p>
&lt;p>Next, suppose we draw a random number $r = 0.2436$. We first check if $r &amp;lt; 0.2$. It&amp;rsquo;s not, so $\ket{00}$ is not our sample. Subtract it from $r$ to get $r = 0.0436$. Next check if $r &amp;lt; 0.07$. Yes, so our sample is $\ket{01}$. Pictorially, this looks like the following:&lt;/p>
&lt;div style="text-align: center;">
&lt;img
src="images/sample.svg"
alt="A process of selecting a random sample."
decoding="async"
/>
&lt;/div>
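&lt;p>The same subtraction procedure can be sketched in a few lines of Python (an illustrative aside only; the interpreter itself stays in Lisp):&lt;/p>

```python
# Sample from a discrete distribution by successively subtracting
# each probability from r until r goes negative.
def sample_index(probabilities, r):
    for k, p in enumerate(probabilities):
        r -= p
        if r < 0:
            return k

# The worked example above: r = 0.2436 selects event 1, i.e. |01>.
print(sample_index([0.2, 0.07, 0.6, 0.13], 0.2436))  # 1
```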
&lt;p>The implementation is straightforward:&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun sample (state)
(let ((r (random 1.0d0)))
(dotimes (i (length state))
(decf r (expt (abs (aref state i)) 2))
(when (minusp r) (return i)))))
&lt;/code>&lt;/pre>&lt;p>Collapsing to $\ket k$ is simply zeroing out the array and setting $\psi_k$ to $1$.&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun collapse (state basis-element)
(fill state 0.0d0)
(setf (aref state basis-element) 1.0d0))
&lt;/code>&lt;/pre>&lt;h2 id="gates">Gates&lt;/h2>
&lt;h3 id="gates-as-matrices">Gates as matrices&lt;/h3>
&lt;p>Gates are the meat of most quantum algorithms. They represent the
&amp;ldquo;hard work&amp;rdquo; a quantum computer does. As previously described, a gate
$g$ is a transformation that is linear, invertible, and
length-preserving.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Linear&lt;/strong>: $g(a\ket\psi+b\ket\phi)=ag(\ket\psi)+bg(\ket\phi)$.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Invertible&lt;/strong>: There is always an operation $h$ that can cancel out the effect of $g$: $h(g(\ket\psi))=g(h(\ket\psi))=\ket\psi$.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Length-Preserving&lt;/strong>: $\Vert g(\ket\psi)\Vert = \Vert\ket\psi\Vert$.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>These ideas are captured by an overarching idea called a &lt;strong>linear isometry&lt;/strong>, which comes from the Greek word &lt;em>isometria&lt;/em>, with &lt;em>isos&lt;/em> meaning &amp;ldquo;equal&amp;rdquo; and &lt;em>metria&lt;/em> meaning &amp;ldquo;measuring&amp;rdquo;. As with all linear transformations, we can write them out as a matrix with respect to a particular basis. Matrices representing linear isometries are called &lt;strong>unitary matrices&lt;/strong>&lt;sup id="fnref:7">&lt;a href="#fn:7" class="footnote-ref" role="doc-noteref">7&lt;/a>&lt;/sup>.&lt;/p>
&lt;p>The simplest gate is the identity, a gate which does nothing.&lt;/p>
&lt;p>$$
\mathsf{I} := \begin{pmatrix}
1 &amp;amp; 0\\
0 &amp;amp; 1
\end{pmatrix}
$$&lt;/p>
&lt;p>In Common Lisp, this would be defined as&lt;/p>
&lt;pre tabindex="0">&lt;code>(defparameter +I+ #2A((1 0)
(0 1)))
&lt;/code>&lt;/pre>&lt;p>which we will make use of later. Just a notch higher in complexity
would be the quantum analog of a Boolean &amp;ldquo;NOT&amp;rdquo;. This is called the
$\mathsf{X}$ gate:&lt;/p>
&lt;p>$$
\mathsf{X} := \begin{pmatrix}
0 &amp;amp; 1\\
1 &amp;amp; 0
\end{pmatrix}.
$$&lt;/p>
&lt;p>This has the effect of mapping $\mathsf{X}\ket 0=\ket 1$, which means
directly that $\mathsf{X}\ket 1=\ket 0$ and therefore it is its own
inverse: $\mathsf{X}\mathsf{X} = \mathsf{I}$ so $\mathsf{X}=\mathsf{X}^{-1}$.&lt;/p>
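&lt;p>This is easy to verify numerically. Here is a throwaway check in Python (the helper &lt;code>matmul&lt;/code> is ours, written just for this aside):&lt;/p>

```python
# Textbook matrix-matrix product on list-of-list square matrices.
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][j] * b[j][k] for j in range(n))
             for k in range(n)]
            for i in range(n)]

X = [[0, 1], [1, 0]]
I = [[1, 0], [0, 1]]

# X applied twice is the identity, so X is its own inverse.
assert matmul(X, X) == I
```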
&lt;p>We suggest re-reviewing how one interprets a matrix as an explicit
mapping of each element of the basis, as it helps make sense of
gates. In this tutorial, gate matrices are always specified in terms
of the bit-string basis&lt;/p>
&lt;p>$$
\{\ket{\ldots000}, \ket{\ldots001}, \ket{\ldots010}, \ket{\ldots011}, \ldots\}.
$$&lt;/p>
&lt;p>We again refer the reader to &lt;a href="https://arxiv.org/abs/1711.02086">this
paper&lt;/a> for an in-depth discussion
about this basis.&lt;/p>
&lt;p>In the rest of this section, the whole goal is to be able to apply
gates to our quantum state. There are two cases of pedagogical and
operational interest: the one-qubit gate and the many-qubit gate. We
will write two functions to accomplish each of these, in order to
implement a general function called &lt;code>apply-gate&lt;/code> for applying any kind
of gate on any collection of qubits for any quantum state.&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun apply-gate (state U qubits)
(assert (= (length qubits) (dimension-qubits (array-dimension U 0))))
(if (= 1 (length qubits))
(%apply-1Q-gate state U (first qubits))
(%apply-nQ-gate state U qubits)))
&lt;/code>&lt;/pre>&lt;h3 id="gates-on-multi-qubit-machines">Gates on multi-qubit machines&lt;/h3>
&lt;p>If we are working with the machine $M_n$, then our space is $2^n$-dimensional, and as such, our matrices would be written out as $2^n\times 2^n$ arrays of numbers. If we can write out such a matrix, then applying it is as simple as a matrix&amp;ndash;vector multiplication. For instance, for a $4$-qubit machine, an $\mathsf{X}$ on qubit $0$ would be written&lt;/p>
&lt;p>$$
\begin{pmatrix}
0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0
\end{pmatrix},
$$&lt;/p>
&lt;p>which could be readily applied to a $16$-element quantum state vector. It is easy to verify that this will swap the components of $\ket{\ldots 0}$ with the corresponding components of $\ket{\ldots 1}$.&lt;/p>
&lt;p>But as should be plainly obvious from the obnoxious amount of paper wasted by writing out this matrix, it would be better if we could simply generate this matrix with just three pieces of information: the gate matrix $g=\left(\begin{smallmatrix}0 &amp;amp; 1\\1 &amp;amp; 0\end{smallmatrix}\right)$, the qubit index $i=0$, and the size of the machine $n=4$. This is a process we will call &lt;strong>lifting&lt;/strong>.&lt;/p>
&lt;p>Lifting requires a fundamental tool for constructing operators on spaces that were formed out of tensor products. If we have two finite-dimensional vector spaces $U$ and $V$, and operators $f$ and $g$ on the spaces respectively, then it seems reasonable to consider how $f$ and $g$ transform $U\otimes V$. In some sense, applying $f$ and $g$ &amp;ldquo;in parallel&amp;rdquo; on $U\otimes V$ corresponds to a new linear operator $h$. If $f$ and $g$ are matrices, then $h$ is defined by a &lt;em>block matrix&lt;/em>&lt;/p>
&lt;p>$$
\begin{equation}
h_{i,j} = f_{i,j} g.
\label{eq:kron}
\end{equation}
$$&lt;/p>
&lt;p>More specifically, let $0 \leq i,j &amp;lt; \dim U$. The matrix $h$ will be
an array of $\dim U \times \dim U$ copies of $g$, where the entries of
the $(i,j)$th block are multiplied by the single
scalar $f_{i,j}$. This will lead to a matrix with $(\dim U)(\dim V)$
rows and columns, which is exactly the dimension of $U\otimes
V$. Incidentally, we write $h$ as $f\otimes g$, and this combination
of operators is called the &lt;strong>Kronecker product&lt;/strong>&lt;sup id="fnref:8">&lt;a href="#fn:8" class="footnote-ref" role="doc-noteref">8&lt;/a>&lt;/sup>. As code:&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun kronecker-multiply (A B)
(destructuring-bind (m n) (array-dimensions A)
(destructuring-bind (p q) (array-dimensions B)
(let ((result (make-array (list (* m p) (* n q)))))
(dotimes (i m result)
(dotimes (j n)
(let ((Aij (aref A i j))
(y (* i p))
(x (* j q)))
(dotimes (u p)
(dotimes (v q)
(setf (aref result (+ y u) (+ x v))
(* Aij (aref B u v))))))))))))
&lt;/code>&lt;/pre>&lt;p>&lt;em>As a matter of terminology, remember that tensor products combine
vector spaces, and Kronecker products combine operator matrices.&lt;/em>&lt;/p>
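&lt;p>As a sanity check on the block-matrix definition (sketched in Python with a hand-rolled &lt;code>kron&lt;/code>, purely for illustration): the $(i,j)$th block of $f\otimes g$ should be a copy of $g$ scaled by $f_{i,j}$, and the result should have $(\dim U)(\dim V)$ rows and columns.&lt;/p>

```python
# Kronecker product of two list-of-list matrices.
def kron(a, b):
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

f = [[1, 2],
     [3, 4]]
g = [[0, 5],
     [6, 7]]
h = kron(f, g)

# Entry (2*i + u, 2*j + v) of h is f[i][j] * g[u][v]: block (i, j)
# is g scaled by the scalar f[i][j].
assert all(h[2*i + u][2*j + v] == f[i][j] * g[u][v]
           for i in range(2) for j in range(2)
           for u in range(2) for v in range(2))

# 2 * 2 = 4 rows and columns, the dimension of the product space.
assert len(h) == 4 and all(len(row) == 4 for row in h)
```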
&lt;h3 id="single-qubit-gates-and-gates-on-adjacent-qubits">Single-qubit gates and gates on adjacent qubits&lt;/h3>
&lt;p>From here, we can very easily lift one-qubit gates to machines with
any number of qubits. A gate $g$ on qubit $i$ in an $n$-qubit machine
is just $g$ applied to qubit $i$ and the identity $\mathsf{I}$ on all
other qubits. Writing this out as a Kronecker product, we have&lt;/p>
&lt;p>$$
\begin{equation}
\operatorname{lift}(g, i, n) :=
\underbrace{\mathsf{I} \otimes \mathsf{I} \otimes \cdots}_{n-i-1\text{ factors}}
\otimes g \otimes
\underbrace{\cdots \otimes \mathsf{I}}_{i\text{ factors}},
\label{eq:liftone}
\end{equation}
$$&lt;/p>
&lt;p>where there are a total of $n$ factors, and $g$ is positioned $i$ factors from the right.&lt;/p>
&lt;p>This concept generalizes to higher-dimensional operators which act on &lt;em>index-adjacent qubits&lt;/em>. In other words, if $g$ is a $k$-qubit operator &lt;em>specifically&lt;/em> acting on qubits&lt;/p>
&lt;p>$$
(i+k-1, i+k-2, \ldots, i+2, i+1, i),
$$&lt;/p>
&lt;p>then the lifting operator from \eqref{eq:liftone} is much the same:&lt;/p>
&lt;p>$$
\begin{equation}
\operatorname{lift}(g, i, n) := \underbrace{\mathsf{I} \otimes \mathsf{I} \otimes \cdots}_{n-i-k\text{ factors}}
\otimes g \otimes
\underbrace{\cdots \otimes \mathsf{I}}_{i\text{ factors}}.
\label{eq:liftmany}
\end{equation}
$$&lt;/p>
&lt;p>It must be emphasized one last time: &lt;em>This only works for multi-qubit operators that act on qubits that are index-adjacent.&lt;/em> We will get to how to work with non-adjacent qubits shortly, but first we will turn this into code.&lt;/p>
&lt;p>For simplicity, we create a way to iterate a Kronecker product
multiple times, that is, compute&lt;/p>
&lt;p>$$
\underbrace{g\otimes \cdots \otimes g}_{n\text{ factors}},
$$&lt;/p>
&lt;p>which is usually simply written $g^{\otimes n}$. We must use care when
handling the case when we are &amp;ldquo;Kronecker exponentiating&amp;rdquo; by a
non-positive number, so that $f\otimes g^{\otimes 0} = f$.&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun kronecker-expt (U n)
(cond
((&amp;lt; n 1) #2A((1)))
((= n 1) U)
(t (kronecker-multiply (kronecker-expt U (1- n)) U))))
&lt;/code>&lt;/pre>&lt;p>With &lt;code>kronecker-expt&lt;/code>, we can write &lt;code>lift&lt;/code> following \eqref{eq:liftmany}:&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun lift (U i n)
(let ((left (kronecker-expt +I+ (- n i (dimension-qubits
(array-dimension U 0)))))
(right (kronecker-expt +I+ i)))
(kronecker-multiply left (kronecker-multiply U right))))
&lt;/code>&lt;/pre>&lt;h3 id="multi-qubit-gates-on-non-adjacent-qubits">Multi-qubit gates on non-adjacent qubits&lt;/h3>
&lt;p>In this section, we assume we are working on a multi-qubit machine
$M_n$ with $n\ge 2$.&lt;/p>
&lt;h4 id="the-general-idea">The general idea&lt;/h4>
&lt;p>So far, we&amp;rsquo;ve managed to get away with lifting operators that act on
either a single qubit, or a collection of index-adjacent qubits. This
has been more-or-less trivial, because we can tack on a series of
identity operators by way of Kronecker products to simulate &amp;ldquo;doing
nothing&amp;rdquo; to the other qubits. However, if we want to apply a
multi-qubit gate to a collection of qubits that aren&amp;rsquo;t index-adjacent,
we have to be a little more clever.&lt;/p>
&lt;p>The way we accomplish this is by swapping qubits around so that we can
move in and out of index-adjacency. In fact, for a given gate acting
on a given collection of qubits, we aim to compute an operator $\Pi$
which moves these qubits into index-adjacency, so that we can compute&lt;/p>
&lt;p>$$
\begin{equation}
\Pi^{-1} \operatorname{lift}(g, 0, n) \Pi.
\label{eq:upq}
\end{equation}
$$&lt;/p>
&lt;p>This recipe requires many ingredients, each of which we describe in
detail.&lt;/p>
&lt;h4 id="swapping-two-qubits">Swapping two qubits&lt;/h4>
&lt;p>To start, we need some way to swap the state of two qubits. We can do
this with the $\mathsf{SWAP}$ operator:&lt;/p>
&lt;p>$$
\mathsf{SWAP} := \begin{pmatrix}
1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0\\
0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0\\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1
\end{pmatrix}.
$$&lt;/p>
&lt;p>In Common Lisp, we define this in the same way we defined &lt;code>+I+&lt;/code>.&lt;/p>
&lt;pre tabindex="0">&lt;code>(defparameter +SWAP+ #2A((1 0 0 0)
(0 0 1 0)
(0 1 0 0)
(0 0 0 1)))
&lt;/code>&lt;/pre>&lt;p>The $\mathsf{SWAP}$ operator takes two qubits and swaps their
state. What does this mean in a system of correlations, where qubit
state isn&amp;rsquo;t strictly compartmentalized (i.e., factorized)? Swapping is
equivalent to swapping the component of $\ket{01}$ with the component
of $\ket{10}$, which are the only two distinguishable
correlations&lt;sup id="fnref:9">&lt;a href="#fn:9" class="footnote-ref" role="doc-noteref">9&lt;/a>&lt;/sup>. Still, in a multi-qubit system, we can&amp;rsquo;t
immediately swap two arbitrary qubits with the tools we&amp;rsquo;ve
developed. What we can do is swap index-adjacent qubits. In
particular, we can define the transpositions&lt;/p>
&lt;p>$$
\tau_i := \operatorname{lift}(\mathsf{SWAP}, i, n),\qquad \text{with }0\leq i &amp;lt; n - 1.
$$&lt;/p>
&lt;p>The transposition $\tau_i$ swaps qubit $i$ with qubit $i+1$. This is
our first ingredient.&lt;/p>
&lt;h4 id="re-arranging-qubits-to-be-index-adjacent">Re-arranging qubits to be index-adjacent&lt;/h4>
&lt;p>The second ingredient is a way to re-arrange our qubits so that they
are index-adjacent. Suppose we have a three-qubit operator $g$ which
acts on qubits $(2, 4, 3)$ in a machine of $n=5$ qubits. The space in
which the quantum state of $M_5$ lives is&lt;/p>
&lt;p>$$
B_4 \otimes B_3 \otimes B_2 \otimes B_1 \otimes B_0,
$$&lt;/p>
&lt;p>but we need to re-arrange our state vector as if we&amp;rsquo;ve moved $B_2\to
B_0$, $B_4\to B_1$, and $B_3\to B_2$ so that our sub-state sits
index-adjacent. In combinatorics, this permutation is written in
two-line notation&lt;/p>
&lt;p>$$
\begin{pmatrix}
0 &amp;amp; 1 &amp;amp; 2 &amp;amp; 3 &amp;amp; 4\\
3 &amp;amp; 4 &amp;amp; 0 &amp;amp; 2 &amp;amp; 1
\end{pmatrix}.
$$&lt;/p>
&lt;p>Here, we&amp;rsquo;ve made a few arbitrary decisions. First, we&amp;rsquo;ve decided to
re-map a $k$-qubit operator to the $B_{k-1}\otimes\cdots\otimes
B_1\otimes B_0$ subspace. Any other index-adjacent subspace would
work, but this simplifies the code. Second, we see that $0\mapsto 3$
and $1\mapsto 4$, but it doesn&amp;rsquo;t matter so much where they map to, as
long as $2$, $4$, and $3$ are mapped correctly.&lt;/p>
&lt;p>There&amp;rsquo;s no sense in writing the first line in two-line notation, so we
just write the permutation compactly as $34021$. As a quantum
operator, we write this as $\Pi_{34021}$.&lt;/p>
&lt;p>The question is: How can we write $\Pi_{34021}$ as familiar operators?
It is a well-known fact in combinatorics that any permutation can be
decomposed into a composition of swaps, and every swap can be
decomposed into a series of adjacent transpositions. We leave this as
an exercise&lt;sup id="fnref:10">&lt;a href="#fn:10" class="footnote-ref" role="doc-noteref">10&lt;/a>&lt;/sup>, but we will show the code to our implementation.&lt;/p>
&lt;p>We start with a function which takes a permutation written as a list,
like &lt;code>(3 4 0 2 1)&lt;/code>, and converts it to a list of (possibly
non-adjacent) transpositions to be applied left-to-right, represented
as cons cells &lt;code>((0 . 3) (1 . 4) (2 . 3))&lt;/code>.&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun permutation-to-transpositions (permutation)
(let ((swaps nil))
(dotimes (dest (length permutation) (nreverse swaps))
(let ((src (elt permutation dest)))
(loop :while (&amp;lt; src dest) :do
(setf src (elt permutation src)))
(cond
((&amp;lt; src dest) (push (cons src dest) swaps))
((&amp;gt; src dest) (push (cons dest src) swaps)))))))
&lt;/code>&lt;/pre>&lt;p>Next, we convert these transpositions, represented as cons cells, into
adjacent transposition indexes. This is straightforward. If we are swapping
$(a,b)$ with $a&amp;lt;b$, then we transpose $(a, a+1)$, then $(a+1, a+2)$,
and so on until $(b-1, b)$, followed by a reversal of each except
$(b-1, b)$. We can simply write this chain of adjacent transpositions
as $(a, a+1, \ldots, b-1, \ldots, a+1, a)$. In this example, we&amp;rsquo;d have
the transposition indexes &lt;code>(0 1 2 1 0 1 2 3 2 1 2)&lt;/code>.&lt;/p>
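&lt;p>We can sanity-check that chain (a quick illustrative script in Python, separate from the interpreter): applying the adjacent swaps &lt;code>(0 1 2 1 0 1 2 3 2 1 2)&lt;/code> to the identity arrangement should reproduce the permutation &lt;code>(3 4 0 2 1)&lt;/code>.&lt;/p>

```python
# Apply a chain of adjacent transpositions, left to right, to the
# identity arrangement 0..n-1.  Index i swaps positions i and i+1.
def apply_adjacent_transpositions(indexes, n):
    xs = list(range(n))
    for i in indexes:
        xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs

chain = [0, 1, 2, 1, 0, 1, 2, 3, 2, 1, 2]
print(apply_adjacent_transpositions(chain, 5))  # [3, 4, 0, 2, 1]
```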
&lt;pre tabindex="0">&lt;code>(defun transpositions-to-adjacent-transpositions (transpositions)
(flet ((expand-cons (c)
(if (= 1 (- (cdr c) (car c)))
(list (car c))
(let ((trans (loop :for i :from (car c) :below (cdr c)
:collect i)))
(append trans (reverse (butlast trans)))))))
(mapcan #'expand-cons transpositions)))
&lt;/code>&lt;/pre>&lt;p>These are indexes $i_1, i_2, \ldots$ such that $\Pi = \cdots
\tau_{i_2}\tau_{i_1}$.&lt;/p>
&lt;p>The last ingredient we need is inverting $\Pi$. If we have $\Pi$
represented as a sequence of $\tau$, then we simply reverse the list
of $\tau$.&lt;/p>
&lt;h4 id="using-transpositions-to-implement-multi-qubit-gates">Using transpositions to implement multi-qubit gates&lt;/h4>
&lt;p>With all of these ingredients, we write what is perhaps the most important
function of our interpreter.&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun %apply-nQ-gate (state U qubits)
(let ((n (dimension-qubits (length state))))
(labels ((swap (i)
(lift +swap+ i n))
(transpositions-to-operator (trans)
(reduce #'compose-operators trans :key #'swap)))
(let* ((U01 (lift U 0 n))
(from-space (append (reverse qubits)
(loop :for i :below n
:when (not (member i qubits))
:collect i)))
(trans (transpositions-to-adjacent-transpositions
(permutation-to-transpositions
from-space)))
(to-&amp;gt;from (transpositions-to-operator trans))
(from-&amp;gt;to (transpositions-to-operator (reverse trans)))
(Upq (compose-operators to-&amp;gt;from
(compose-operators U01
from-&amp;gt;to))))
(apply-operator Upq state)))))
&lt;/code>&lt;/pre>&lt;p>A few quick notes for comprehension:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The value of &lt;code>(swap i)&lt;/code> is $\tau_i$ fully lifted.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The one-line zinger that defines &lt;code>transpositions-to-operator&lt;/code> takes
a list of transposition indexes and converts it into a unitary
operator. It does so by doing what&amp;rsquo;s known in functional programming
as a &lt;em>map-reduce&lt;/em>, by first mapping $i\mapsto\tau_i$ and reducing by
operator composition.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The variable &lt;code>from-space&lt;/code> contains the permutation $p$ that encodes
the space in which we&amp;rsquo;d like to act. This permutation is calculated
based on the &lt;code>qubits&lt;/code> argument.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The variables &lt;code>from-&amp;gt;to&lt;/code> and &lt;code>to-&amp;gt;from&lt;/code> represent $\Pi_p$ and
$\Pi^{-1}_p$ respectively.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The variable &lt;code>Upq&lt;/code> is our fully lifted operator, exactly by way of
\eqref{eq:upq}.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>The function &lt;code>%apply-nQ-gate&lt;/code> is what allows our interpreter to be so
general. Making the interpreter more efficient is ultimately an
exercise in making this function more efficient.&lt;/p>
&lt;p>The only thing left to do is integrate all of the topics discussed
hitherto into an interpreter!&lt;/p>
&lt;h2 id="an-interpreter">An interpreter&lt;/h2>
&lt;h3 id="the-driver-loop">The driver loop&lt;/h3>
&lt;p>The bulk of the interpreter has been written. We&amp;rsquo;ve described the
semantics of the two instructions of interest: &lt;code>MEASURE&lt;/code> and
&lt;code>GATE&lt;/code>. Now we create the interpreter itself, which is just a driver
loop to read and execute these instructions, causing state transitions
of our abstract machine. If we see a &lt;code>GATE&lt;/code>, we call &lt;code>apply-gate&lt;/code>. If
we see a &lt;code>MEASURE&lt;/code>, we call &lt;code>observe&lt;/code>.&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun run-quantum-program (qprog machine)
(loop :for (instruction . payload) :in qprog
:do (ecase instruction
((GATE)
(destructuring-bind (gate &amp;amp;rest qubits) payload
(apply-gate (machine-quantum-state machine) gate qubits)))
((MEASURE)
(observe machine)))
:finally (return machine)))
&lt;/code>&lt;/pre>&lt;h3 id="efficiency">Efficiency&lt;/h3>
&lt;p>Performance-focused individuals will have noticed that this
interpreter is pretty costly in many ways. The biggest cost is also
unavoidable: The fact that our state grows exponentially with the
number of qubits. Real, physical quantum computers avoid this cost,
which makes them alluring machines to both study and construct.&lt;/p>
&lt;p>However, even with this unavoidable cost, this interpreter has been
implemented for ease of understanding and not machine
efficiency. Writing a faster interpreter amounts to avoiding the
construction of the lifted operator matrices. This can be done with
very careful index wrangling and sensitivity to data types and
allocation. This is how the high-performance &lt;a href="https://github.com/quil-lang/qvm">Quantum Virtual
Machine&lt;/a> is implemented.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>What good is writing an interpreter if we don&amp;rsquo;t write any programs
worth interpreting? Here are a few examples of programs.&lt;/p>
&lt;h3 id="bell-state">Bell state&lt;/h3>
&lt;p>The &lt;strong>Bell state&lt;/strong> is one we explored earlier. It is a
two-qubit state $$\frac{1}{\sqrt{2}}(\ket {00} + \ket {11}).$$ Here&amp;rsquo;s
a program to generate one, using two new gates, the &lt;strong>controlled-not
gate&lt;/strong> $\mathsf{CNOT}$ and the &lt;strong>Hadamard gate&lt;/strong> $\mathsf{H}$.&lt;/p>
&lt;pre tabindex="0">&lt;code>(defparameter +H+ (make-array '(2 2) :initial-contents (let ((s (/ (sqrt 2))))
(list (list s s)
(list s (- s))))))
(defparameter +CNOT+ #2A((1 0 0 0)
                         (0 1 0 0)
                         (0 0 0 1)
                         (0 0 1 0)))
(defun bell (p q)
`((GATE ,+H+ ,p)
(GATE ,+CNOT+ ,p ,q)))
&lt;/code>&lt;/pre>&lt;h3 id="greenberger--horne--zeilinger-state">Greenberger&amp;ndash;Horne&amp;ndash;Zeilinger state&lt;/h3>
&lt;p>The &lt;strong>Greenberger&amp;ndash;Horne&amp;ndash;Zeilinger state&lt;/strong>, or &lt;strong>GHZ state&lt;/strong>, is a
generalization of the Bell state on more than two qubits, namely
$$\frac{1}{\sqrt{2}}(\ket{0\ldots 000} + \ket{1\ldots 111}).$$ This is
accomplished by executing a chain of controlled-not gates:&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun ghz (n)
(cons `(GATE ,+H+ 0)
(loop :for q :below (1- n)
:collect `(GATE ,+CNOT+ ,q ,(1+ q)))))
&lt;/code>&lt;/pre>&lt;h3 id="the-quantum-fourier-transform">The quantum Fourier transform&lt;/h3>
&lt;p>The ordinary discrete Fourier transform of a complex vector is a
unitary operator, and as such, it can be encoded as a quantum
program. We will write a program which computes the Fourier transform
of the probability amplitudes of an input quantum state (a time-domain
signal), producing a new quantum state whose amplitudes represent
components in the frequency domain. This is the central subroutine to
Shor&amp;rsquo;s algorithm, a quantum algorithm that factors integers
faster than any known classical method.&lt;/p>
&lt;p>First, we will need a gate called the &lt;strong>controlled-phase gate&lt;/strong> $\mathsf{CPHASE}(\theta)$:&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun cphase (angle)
(make-array '(4 4) :initial-contents `((1 0 0 0)
(0 1 0 0)
(0 0 1 0)
(0 0 0 ,(cis angle)))))
&lt;/code>&lt;/pre>&lt;p>Now, we can generate the quantum Fourier transform recursively.&lt;/p>
&lt;pre tabindex="0">&lt;code>(defun qft (qubits)
(labels ((bit-reversal (qubits)
(let ((n (length qubits)))
(if (&amp;lt; n 2)
nil
(loop :repeat (floor n 2)
:for qs :in qubits
:for qe :in (reverse qubits)
:collect `(GATE ,+swap+ ,qs ,qe)))))
(%qft (qubits)
(destructuring-bind (q . qs) qubits
(if (null qs)
(list `(GATE ,+H+ ,q))
(let ((cR (loop :with n := (1+ (length qs))
:for i :from 1
:for qi :in qs
:for angle := (/ pi (expt 2 (- n i)))
:collect `(GATE ,(cphase angle) ,q ,qi))))
(append
(qft qs)
cR
(list `(GATE ,+H+ ,q))))))))
(append (%qft qubits) (bit-reversal qubits))))
&lt;/code>&lt;/pre>&lt;p>The program for a three-qubit quantum Fourier transform &lt;code>(qft '(0 1 2))&lt;/code> looks like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>(
(GATE #2A((0.7071067811865475d0 0.7071067811865475d0) (0.7071067811865475d0 -0.7071067811865475d0)) 2)
(GATE #2A((1 0 0 0) (0 1 0 0) (0 0 1 0) (0 0 0 #C(0.0d0 1.0d0))) 1 2)
(GATE #2A((0.7071067811865475d0 0.7071067811865475d0) (0.7071067811865475d0 -0.7071067811865475d0)) 1)
(GATE #2A((1 0 0 0) (0 0 1 0) (0 1 0 0) (0 0 0 1)) 1 2)
(GATE #2A((1 0 0 0) (0 1 0 0) (0 0 1 0) (0 0 0 #C(0.7071067811865476d0 0.7071067811865475d0))) 0 1)
(GATE #2A((1 0 0 0) (0 1 0 0) (0 0 1 0) (0 0 0 #C(0.0d0 1.0d0))) 0 2)
(GATE #2A((0.7071067811865475d0 0.7071067811865475d0) (0.7071067811865475d0 -0.7071067811865475d0)) 0)
(GATE #2A((1 0 0 0) (0 0 1 0) (0 1 0 0) (0 0 0 1)) 0 2)
)
&lt;/code>&lt;/pre>&lt;p>(Recall that &lt;code>#C(0 1)&lt;/code> represents the complex number $i$.)&lt;/p>
&lt;p>We can see the quantum Fourier transform in action by computing the
Fourier transform of $\ket{000}$. Here is a transcript of this
calculation:&lt;/p>
&lt;pre tabindex="0">&lt;code>CL-USER&amp;gt; (run-quantum-program
(qft '(0 1 2))
(make-machine :quantum-state (make-quantum-state 3)
:measurement-register 0))
#S(MACHINE
:QUANTUM-STATE #(#C(0.3535533724408484d0 0.0d0)
#C(0.3535533724408484d0 0.0d0)
#C(0.3535533724408484d0 0.0d0)
#C(0.3535533724408484d0 0.0d0)
#C(0.3535533724408484d0 0.0d0)
#C(0.3535533724408484d0 0.0d0)
#C(0.3535533724408484d0 0.0d0)
#C(0.3535533724408484d0 0.0d0))
:MEASUREMENT-REGISTER 0)
&lt;/code>&lt;/pre>&lt;p>Indeed, one can verify that the classical Fourier transform of the
vector $[1,0,0,0,0,0,0,0]$ is a vector with eight components equal to
about $0.35355$.&lt;/p>
&lt;pre tabindex="0">&lt;code>$ python
Python 2.7.16 (default, May 23 2023, 14:13:27)
[GCC 8.3.0] on linux2
Type &amp;quot;help&amp;quot;, &amp;quot;copyright&amp;quot;, &amp;quot;credits&amp;quot; or &amp;quot;license&amp;quot; for more information.
&amp;gt;&amp;gt;&amp;gt; import numpy as np
&amp;gt;&amp;gt;&amp;gt; np.fft.fft([1,0,0,0,0,0,0,0], norm=&amp;quot;ortho&amp;quot;)
array([0.35355339+0.j, 0.35355339+0.j, 0.35355339+0.j, 0.35355339+0.j,
0.35355339+0.j, 0.35355339+0.j, 0.35355339+0.j, 0.35355339+0.j])
&lt;/code>&lt;/pre>&lt;h3 id="example-transcript">Example transcript&lt;/h3>
&lt;p>Here is an example transcript of downloading and using this software,
using &lt;a href="https://www.sbcl.org/">Steel Bank Common Lisp&lt;/a>.&lt;/p>
&lt;pre tabindex="0">&lt;code>$ git clone https://github.com/stylewarning/quantum-interpreter.git
Cloning into 'quantum-interpreter'...
remote: Counting objects: 10, done.
remote: Compressing objects: 100% (10/10), done.
Unpacking objects: 100% (10/10), done.
remote: Total 10 (delta 2), reused 5 (delta 0), pack-reused 0
$ cd quantum-interpreter/
$ sbcl --noinform
* (load &amp;quot;qsim.lisp&amp;quot;)
T
* (load &amp;quot;examples.lisp&amp;quot;)
T
* (run-quantum-program (bell 0 1)
(make-machine :quantum-state (make-quantum-state 2)
:measurement-register 0))
#S(MACHINE
:QUANTUM-STATE #(0.7071067690849304d0 0.0d0 0.0d0 0.7071067690849304d0)
:MEASUREMENT-REGISTER 0)
* (run-quantum-program (qft '(0 1 2))
(make-machine :quantum-state (make-quantum-state 3)
:measurement-register 0))
#S(MACHINE
:QUANTUM-STATE #(#C(0.3535533724408484d0 0.0d0) #C(0.3535533724408484d0 0.0d0)
#C(0.3535533724408484d0 0.0d0) #C(0.3535533724408484d0 0.0d0)
#C(0.3535533724408484d0 0.0d0) #C(0.3535533724408484d0 0.0d0)
#C(0.3535533724408484d0 0.0d0) #C(0.3535533724408484d0 0.0d0))
:MEASUREMENT-REGISTER 0)
* (defun flip-coin ()
(machine-measurement-register
(run-quantum-program
`((GATE ,+H+ 0) (MEASURE))
(make-machine :quantum-state (make-quantum-state 1)
:measurement-register 0))))
FLIP-COIN
* (loop :repeat 10 :collect (flip-coin))
(1 1 0 1 1 0 0 1 0 1)
* (quit)
&lt;/code>&lt;/pre>&lt;h2 id="source-code">Source code&lt;/h2>
&lt;p>The source code in this tutorial is published under the BSD 3-clause
license. The complete listing and most up-to-date source code can be
found on
&lt;a href="https://github.com/stylewarning/quantum-interpreter">GitHub&lt;/a>.&lt;/p>
&lt;h2 id="ports-in-other-languages">Ports in other languages&lt;/h2>
&lt;p>Others have written this quantum interpreter in other languages. Here&amp;rsquo;s a list
of ports that people have shared with me:&lt;/p>
&lt;ul>
&lt;li>Aistis Raulinaitis&amp;rsquo;s implementation in &lt;a href="https://github.com/sheganinans/QVM-ocaml-mini">OCaml&lt;/a>&lt;/li>
&lt;li>Graham Enos&amp;rsquo;s implementation in &lt;a href="https://github.com/genos/Workbench/tree/main/quantum-interpreter">Rust&lt;/a> with a &lt;a href="https://grahamenos.com/quantum-interpreter.html">write-up&lt;/a>&lt;/li>
&lt;li>Marco Rubin&amp;rsquo;s implementation in &lt;a href="https://gitlab.com/Rubo/qsim">Python&lt;/a>, no dependencies&lt;/li>
&lt;li>Francesco Morri&amp;rsquo;s implementation in &lt;a href="https://github.com/FrancescoMorri/quantum-interpreter">Python&lt;/a>&lt;/li>
&lt;/ul>
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>A controlled one-qubit gate is a kind of two-qubit gate.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2" role="doc-endnote">
&lt;p>It&amp;rsquo;s actually 124 SLOC, and it has &lt;em>not&lt;/em> been &amp;ldquo;code golfed&amp;rdquo;. If we wanted to make an even tinier quantum interpreter, we could&amp;mdash;but brevity for its own sake is not the point.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:3" role="doc-endnote">
&lt;p>With only a little bit of extra work, mostly bookkeeping, we could make $n$ finite but unbounded during the execution of a program by instead having a collection of so-called &lt;strong>quantum registers&lt;/strong>. These would be realized by a collection of $v$&amp;rsquo;s, which are opportunistically combined when entanglement occurs.&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:4" role="doc-endnote">
&lt;p>For that matter, why complex numbers, and not just real-valued probabilities? The reason is that a complex number of unit norm can be written as $e^{i\theta}$, where $\theta$ is called the &lt;strong>phase&lt;/strong>. Phases are a wave-like property, and allow the complex probability amplitudes to &lt;em>interfere&lt;/em>. Interference is a known and understood phenomenon of quantum mechanical systems, and in fact is critical to the function of a quantum computer.&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:5" role="doc-endnote">
&lt;p>Spaces with all of these properties, including a way to calculate distances, are called &lt;strong>Hilbert spaces&lt;/strong>.&amp;#160;&lt;a href="#fnref:5" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:6" role="doc-endnote">
&lt;p>The fact of the matter is that we can actually get &lt;em>more&lt;/em> general by having classical probability distributions of these states, which leads one to so-called &amp;ldquo;density operators&amp;rdquo;. This is extremely useful when studying imperfect quantum computers which have noisy operations.&amp;#160;&lt;a href="#fnref:6" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:7" role="doc-endnote">
&lt;p>While we won&amp;rsquo;t use this fact in our interpreter (though it would be useful for error checking), it is very easy to check whether a matrix $g$ is unitary. First, we compute another matrix $h$ which is the conjugate-transpose of $g$. The &lt;strong>conjugate-transpose&lt;/strong> of a matrix is just the transpose of a matrix with each complex entry conjugated. Once we have this matrix, we check that $hg$ is an identity matrix. The matrix $g$ is unitary if and only if $hg=gh=\mathsf{I}$.&amp;#160;&lt;a href="#fnref:7" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:8" role="doc-endnote">
&lt;p>Unfortunately, the definition \eqref{eq:kron} seems somewhat arbitrary and out of nowhere. Fortunately, there is a much more &amp;ldquo;first principles&amp;rdquo; approach to understanding the tensor product and the Kronecker product, starting with how we map a &lt;em>pair&lt;/em> of vectors $v\in V$ and $w\in W$ to a vector $v\otimes w\in V\otimes W$. Such an approach is much more satisfying to a mathematician, and even essential to understanding the &amp;ldquo;true nature&amp;rdquo; of the tensor product, but perhaps less so to a curious implementer.&amp;#160;&lt;a href="#fnref:8" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:9" role="doc-endnote">
&lt;p>There is no sense in moving $\ket{00}$ or $\ket{11}$ to accomplish a swap-like operation, since we identify each qubit&amp;rsquo;s respective $\ket 0$ identically, and each $\ket 1$ identically.&amp;#160;&lt;a href="#fnref:9" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:10" role="doc-endnote">
&lt;p>If you&amp;rsquo;re not particularly keen to figure out the math yourself, you might consult Lemma 14.1 of &lt;a href="https://www.sfu.ca/~mdevos/notes/geom-sym/14_transpositions.pdf">these lecture notes&lt;/a>. You&amp;rsquo;re also welcome to just take my word for it!&amp;#160;&lt;a href="#fnref:10" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>Can a Rubik's Cube be brute-forced?</title><link>http://www.stylewarning.com/posts/brute-force-rubiks-cube/</link><pubDate>Fri, 07 Jul 2023 00:00:00 +0000</pubDate><guid>http://www.stylewarning.com/posts/brute-force-rubiks-cube/</guid><description>&lt;p>&lt;em>By Robert Smith&lt;/em>&lt;/p>
&lt;div>
&lt;hr>
&lt;h2>Contents&lt;/h2>
&lt;nav id="TableOfContents">
&lt;ol>
&lt;li>&lt;a href="#introduction">Introduction&lt;/a>&lt;/li>
&lt;li>&lt;a href="#computer-puzzling-without-brute-force">Computer puzzling without brute-force&lt;/a>&lt;/li>
&lt;li>&lt;a href="#taking-a-step-back-puzzles-as-permutations">Taking a step back: puzzles as permutations&lt;/a>&lt;/li>
&lt;li>&lt;a href="#brute-force-still-ignorant-but-kinda-smart">Brute-force, still ignorant, but kinda smart?&lt;/a>
&lt;ol>
&lt;li>&lt;a href="#observation-1-decomposition-as-intersection">Observation #1: decomposition as intersection&lt;/a>&lt;/li>
&lt;li>&lt;a href="#observation-2-sorting-really-helps">Observation #2: sorting really helps!&lt;/a>&lt;/li>
&lt;li>&lt;a href="#what-is-a-move">What is a move?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#what-is-a-word">What is a word?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#observation-3-sorting-as-solving">Observation #3: sorting as solving&lt;/a>&lt;/li>
&lt;li>&lt;a href="#more-splitting">More splitting?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#iterating-through-products-with-schroeppel--shamir">Iterating through products with Schroeppel&amp;ndash;Shamir&lt;/a>&lt;/li>
&lt;li>&lt;a href="#permutation-tries">Permutation tries&lt;/a>&lt;/li>
&lt;li>&lt;a href="#the-4-list-algorithm-and-solving-the-rubiks-cube">The 4-List Algorithm and solving the Rubik&amp;rsquo;s Cube&lt;/a>&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>&lt;a href="#example-and-source-code">Example and source code&lt;/a>&lt;/li>
&lt;li>&lt;a href="#tips-for-optimizing-the-4-list-algorithm">Tips for optimizing the 4-List Algorithm&lt;/a>&lt;/li>
&lt;li>&lt;a href="#sample-benchmarks">Sample benchmarks&lt;/a>&lt;/li>
&lt;li>&lt;a href="#conclusion">Conclusion&lt;/a>&lt;/li>
&lt;li>&lt;a href="#references">References&lt;/a>&lt;/li>
&lt;/ol>
&lt;/nav>
&lt;hr>
&lt;/div>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>When I was about 13, while still a middle-schooler, I became
fascinated with the Rubik&amp;rsquo;s Cube&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>. I never got terribly good
at solving it, maybe eventually getting into the 30 to 40 seconds
range. While I didn&amp;rsquo;t have a penchant for memorizing move sequences, I
was drawn into how we &lt;em>find&lt;/em> these move sequences.&lt;/p>
&lt;p>The story about my interest and exploration in the Rubik&amp;rsquo;s Cube is for
another post. Long story short, I got interested in &amp;ldquo;computer
puzzling&amp;rdquo;&amp;mdash;using computers to manipulate combinatorial puzzles, like
the Rubik&amp;rsquo;s Cube, either to solve them quickly, to discover patterns,
or to find novel move sequences for use in speedcubing&amp;mdash;and ever
since, I&amp;rsquo;ve been working on different programs for solving Rubik-like
puzzles.&lt;/p>
&lt;p>Purely in principle, it shouldn&amp;rsquo;t be hard to solve a Rubik&amp;rsquo;s Cube with
a computer, right? Our program would have three parts:&lt;/p>
&lt;ol>
&lt;li>A model of the Rubik&amp;rsquo;s Cube, that is, some data structure that
represents a cube state.&lt;/li>
&lt;li>Some functions which can simulate turns of each side.&lt;/li>
&lt;li>A solving procedure which takes a scrambled cube, tries every
possible turn sequence, and stops when solved.&lt;/li>
&lt;/ol>
&lt;p>Truth be known, and details aside, this is a provably correct method
for solving a Rubik&amp;rsquo;s Cube. If you leave your computer on long enough,
it will return a solution.&lt;/p>
&lt;p>The problem is that it takes a long time. Probably longer than your
lifetime.&lt;/p>
&lt;h2 id="computer-puzzling-without-brute-force">Computer puzzling without brute-force&lt;/h2>
&lt;p>&amp;ldquo;Brute-force&amp;rdquo; generally means to try every possibility of something
without much of any strategy. Our method above is a brute-force
algorithm. Brute-force algorithms generally aren&amp;rsquo;t practical, because
if you have $N$ of something to explore, a brute-force algorithm will
take $O(N)$ time. For a Rubik&amp;rsquo;s Cube, $N$ is 43 quintillion&amp;mdash;a very
large number.&lt;/p>
&lt;p>It has been known, practically since the Rubik&amp;rsquo;s Cube&amp;rsquo;s inception,
that something else is needed to solve a Rubik&amp;rsquo;s Cube. Rubik&amp;rsquo;s Cube
solutions, obviously, take into account the specific structure and
properties of the cube so as to implicitly or explicitly avoid
mindless search. These methods have turned out to be:&lt;/p>
&lt;ol>
&lt;li>Solving methods for humans: memorize some sequences which let you
move only a few pieces around in isolation, and apply these
sequences mechanically until all pieces are in place. The more
sequences you memorize, the faster you&amp;rsquo;ll be.&lt;/li>
&lt;li>Heuristic tree search: do a tree search (with e.g.,
iterative-deepening depth-first search&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup>), but aggressively
prune off branches by way of clever heuristics&lt;sup id="fnref:3">&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref">3&lt;/a>&lt;/sup>.&lt;/li>
&lt;li>Phase-based solvers: a deeply mathematical way which involves
characterizing the Rubik&amp;rsquo;s Cube as a sequence of nested
(mathematical) subgroups so that each successive coset space is small
enough that it can be solved by computer.&lt;/li>
&lt;/ol>
&lt;p>Computer puzzling mostly deals with the latter two approaches, usually
in some combination. Both approaches lead to extraordinarily
high-performing solvers. For example:&lt;/p>
&lt;ul>
&lt;li>Korf&amp;rsquo;s algorithm (approach #2) finds optimal solutions&amp;mdash;solutions
of shortest length&amp;mdash;but can take hours to find one.&lt;/li>
&lt;li>Thistlethwaite&amp;rsquo;s algorithm (approach #3) solves a cube in four
phases almost instantaneously. The solutions are guaranteed to be no
longer than triple the optimal length.&lt;/li>
&lt;/ul>
&lt;p>The story may as well end here. We have slow but optimal ways of
solving the Rubik&amp;rsquo;s Cube, and fast but sub-optimal ways. Pick your
poison (sub-optimal or slow), depending on what you&amp;rsquo;re trying to
achieve.&lt;/p>
&lt;h2 id="taking-a-step-back-puzzles-as-permutations">Taking a step back: puzzles as permutations&lt;/h2>
&lt;p>It seems that any Rubik&amp;rsquo;s Cube solver &lt;em>has&lt;/em> to know &lt;em>something&lt;/em> about
the structure of the cube. It might be worth asking how little
structure we can get away with, so as to make whatever solving
algorithm we write generic over a broad class of puzzles.&lt;/p>
&lt;p>For a brute-force algorithm with tree search, we would need something
like the following:&lt;/p>
&lt;pre tabindex="0">&lt;code>interface GenericPuzzle:
type State
type Move
function isSolved(State) -&amp;gt; Boolean
function allMoves() -&amp;gt; List(Move)
function performMove(State, Move) -&amp;gt; State
&lt;/code>&lt;/pre>&lt;p>With this, we could write the following solver based off of
iterative-deepening depth-first search, which is totally generic on
the above interface.&lt;/p>
&lt;pre tabindex="0">&lt;code>function solve(State) -&amp;gt; List(Move)
function solve(p):
if isSolved(p):
return []
for maxDepth from 1 to infinity:
solved?, solution = dfs(0, maxDepth, p)
if solved?:
return solution
function dfs(Integer, Integer, State, List(Move)) -&amp;gt; (Boolean, List(Move))
function dfs(depth, maxDepth, p, s):
if isSolved(p):
return (True, s)
if depth == maxDepth:
return (False, [])
for m in allMoves():
p' = performMove(p, m)
(solved?, solution) = dfs(depth+1, maxDepth, p', append(s, [m]))
if solved?:
return (solved?, solution)
return (False, [])
&lt;/code>&lt;/pre>&lt;p>As discussed before, while this strategy is effective for problems
with small search spaces, it&amp;rsquo;s no help when the space is
large. Unfortunately, the &lt;code>GenericPuzzle&lt;/code> interface doesn&amp;rsquo;t give us
much room for improvement. Can we still remain generic, while giving
us at least a little more room for exploring other algorithms?&lt;/p>
&lt;p>The answer is yes, if we restrict ourselves to &lt;em>permutation
puzzles&lt;/em>. Roughly speaking, a permutation puzzle is one where pieces
shift around according to a fixed and always available set of shifting
moves. The Rubik&amp;rsquo;s Cube is a phenomenal and non-trivial example: We
can label each mobile&lt;sup id="fnref:4">&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref">4&lt;/a>&lt;/sup> sticker with a number 1 to 48, and
these stickers can always be shifted around with a twist of any of the
six sides. Since we can twist any of the six sides at any time, the
puzzle is a permutation puzzle. (Not all similar puzzles are
permutation puzzles. There are some puzzles which are &amp;ldquo;bandaged&amp;rdquo;, that
is, pieces of the puzzle are fused together, restricting some
available moves depending on the configuration.)&lt;/p>
&lt;p>In this view, we look at a solved configuration as a list of
numbers. For example, the solved Rubik&amp;rsquo;s Cube as a permutation would
be&lt;/p>
&lt;p>$$
(1, 2, \ldots, 47, 48).
$$&lt;/p>
&lt;p>When we turn a side, these numbers get permuted. For instance,
assuming a particular labeling of stickers with numbers, turning the
top face of a Rubik&amp;rsquo;s Cube might permute the first sticker in the list
to the third, the second sticker to the fifth, the third sticker to
the eighth, etc. We can use the same notation:&lt;/p>
&lt;p>$$
(3, 5, 8, 2, 7, 1, \ldots)
$$&lt;/p>
&lt;p>This notation has two interpretations:&lt;/p>
&lt;ol>
&lt;li>The literal position of numbered stickers on a physical cube (with
an agreed upon labeling).&lt;/li>
&lt;li>An instruction for how to relabel the stickers of a given cube.&lt;/li>
&lt;/ol>
&lt;p>If we look at the notation under the second interpretation, a
permutation actually represents a &lt;em>function&lt;/em> that&amp;rsquo;s applied to
&lt;em>individual stickers&lt;/em>. For instance, if&lt;/p>
&lt;p>$$
F := (3, 5, 8, 2, 7, 1, \ldots)
$$&lt;/p>
&lt;p>then $F(1) = 3$, $F(2) = 5$, etc. All of the clockwise face
turns&amp;mdash;Front, Right, Up, Back, Left, Down&amp;mdash;of a Rubik&amp;rsquo;s Cube can be
described like so:&lt;/p>
&lt;p>$$
\begin{align*}
F &amp;amp;:= (1, 2, 3, 4, 5, 25, \ldots)\\
R &amp;amp;:= (1, 2, 38, 4, 36, 6, \ldots)\\
U &amp;amp;:= (3, 5, 8, 2, 7, 1, \ldots)\\
B &amp;amp;:= (14, 12, 9, 4, 5, 6, \ldots)\\
L &amp;amp;:= (17, 2, 3, 20, 5, 22, \ldots)\\
D &amp;amp;:= (1, 2, 3, 4, 5, 6, \ldots, 48, 42, 47, 41, 44, 46)
\end{align*}
$$&lt;/p>
&lt;p>We wrote some of the last elements of $D$ because a &amp;ldquo;down&amp;rdquo; move doesn&amp;rsquo;t
change the first six stickers in this labeling scheme.&lt;/p>
&lt;p>This gives us a whole new interpretation of what it means to &amp;ldquo;solve&amp;rdquo; a
cube. Given a scrambled cube, we first write down the permutation that
describes how the stickers moved from a solved state to the scrambled
state. Let&amp;rsquo;s call it $s$. This is easy, because we can just read the
labeled stickers off of a cube one-by-one, in order. For example, $s$
might be:&lt;/p>
&lt;p>$$
s := (27, 42, 30, 15, 39, 6, \ldots).
$$&lt;/p>
&lt;p>&lt;em>This is a description of a function!&lt;/em> The value of $s(1)$ describes
how the first sticker of a cube will be shifted to its scrambled
position, in this case $27$. Next, solving a cube is finding a
sequence of $k$ moves $m_1, m_2, \ldots, m_k$ such that, for all $1\leq
i\leq 48$,&lt;/p>
&lt;p>$$
i = m_k(m_{k-1}(\cdots(m_2(m_1(s(i)))))).
$$&lt;/p>
&lt;p>Stated another way in function composition notation, the function&lt;/p>
&lt;p>$$
m_k \circ m_{k-1} \circ \cdots \circ m_2 \circ m_1\circ s
$$&lt;/p>
&lt;p>must be the identity function&amp;mdash;a permutation that doesn&amp;rsquo;t move
anything.&lt;/p>
&lt;p>In the permutation puzzle way of thinking, we can still implement our
&lt;code>GenericPuzzle&lt;/code> interface:&lt;/p>
&lt;ul>
&lt;li>&lt;code>State&lt;/code> would be a permutation;&lt;/li>
&lt;li>&lt;code>Move&lt;/code> would also be a permutation;&lt;/li>
&lt;li>&lt;code>isSolved&lt;/code> would check if a permutation is $(1, 2, 3, \ldots)$;&lt;/li>
&lt;li>&lt;code>allMoves&lt;/code> would be a hard-coded list of the possible moves, like
$F$, $R$, $U$, $B$, $L$, and $D$ for the Rubik&amp;rsquo;s cube; and&lt;/li>
&lt;li>&lt;code>performMove&lt;/code> would take the input move permutation, and apply it as
a function to each element of the state permutation.&lt;/li>
&lt;/ul>
&lt;p>This might even be &lt;em>more&lt;/em> efficient than another choice of
representation, since permutations can be represented very efficiently
on a computer as packed arrays of bytes!&lt;/p>
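&lt;p>To make this concrete, here is a minimal Python sketch of the permutation-based &lt;code>GenericPuzzle&lt;/code> operations. The permutations are 0-indexed tuples, and the two moves are hypothetical toy moves on a 4-element puzzle, not actual cube turns:&lt;/p>

```python
# Minimal sketch of the GenericPuzzle interface for permutation
# puzzles. Permutations are 0-indexed tuples: perm[i] is where
# element i is sent. The two moves below define a toy 4-element
# puzzle, not a real Rubik's Cube.

def identity(n):
    return tuple(range(n))

def is_solved(state):
    return state == identity(len(state))

def apply_perm(move, state):
    # Apply `move` as a function to each element of `state`.
    return tuple(move[i] for i in state)

ALL_MOVES = {
    "A": (1, 0, 2, 3),   # swap the first two elements
    "B": (0, 2, 3, 1),   # 3-cycle of the last three elements
}

s = apply_perm(ALL_MOVES["A"], identity(4))
s = apply_perm(ALL_MOVES["A"], s)  # A is its own inverse
print(is_solved(s))  # True
```

&lt;p>Because tuples are compact and hashable, this representation also makes states cheap to store and compare.&lt;/p>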
&lt;p>But we didn&amp;rsquo;t do all this mathematical groundwork just to goof around;
there&amp;rsquo;s something amazing lurking in these permutations.&lt;/p>
&lt;h2 id="brute-force-still-ignorant-but-kinda-smart">Brute-force, still ignorant, but kinda smart?&lt;/h2>
&lt;p>In the late 1980s, Adi Shamir&lt;sup id="fnref:5">&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref">5&lt;/a>&lt;/sup> and his students made a
brilliant series of observations that came together to make for a
beautiful result. Unfortunately, to my knowledge, only two writings
exist on the topic.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Shamir and his colleagues wrote a paper about it [1], sort of in
the style of a brief conference proceeding, but it&amp;rsquo;s very light on
details and skips implementation considerations. It&amp;rsquo;s the kind of
paper where you follow it, but you have to fill in a great number
of blanks to make anything from it work.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Shamir gave a talk sometime in the &amp;rsquo;80s about his result, and
somebody (none other than Alan Bawden) wrote a brief email [2] to a
mailing list about his recollection of it.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>An amazing result, buried in history, without any good exposition that
I could find.&lt;/p>
&lt;p>What&amp;rsquo;s the result? The essence of the result is this. Reminiscent of a
&amp;ldquo;meet in the middle&amp;rdquo; algorithm, if we want to brute-force a problem
that ordinarily requires visiting $N$ states to find an answer, we can
instead cleverly split the work into two searches that require visits
to around $\sqrt{N}$ states. For a Rubik&amp;rsquo;s Cube, that cuts work
associated with 43 quintillion states, down to work associated with 6
billion states. The best part is, this is &lt;em>still brute-force&lt;/em>;
virtually no knowledge of the structure of the problem is required to
make it work.&lt;/p>
&lt;p>Let&amp;rsquo;s walk through the requisite steps and build up to the
result. I&amp;rsquo;ll attempt to write in a general framework (since it&amp;rsquo;s a
general algorithm), but make frequent appeals to the Rubik&amp;rsquo;s Cube
specifically.&lt;/p>
&lt;h3 id="observation-1-decomposition-as-intersection">Observation #1: decomposition as intersection&lt;/h3>
&lt;p>Suppose the following:&lt;/p>
&lt;ul>
&lt;li>We have a mysterious permutation $s$, say, a scrambled puzzle;&lt;/li>
&lt;li>We have two sets of permutations $X$ and $Y$; and&lt;/li>
&lt;li>We assume there&amp;rsquo;s an $\hat x\in X$ and $\hat y\in Y$ such that $s =
\hat y\circ \hat x$.&lt;/li>
&lt;/ul>
&lt;p>The goal is to find precisely what $\hat x$ and $\hat y$ are. The simplest
way to do this is to check every combination of elements in $X$ and
$Y$.&lt;/p>
&lt;pre tabindex="0">&lt;code>for x in X:
for y in Y:
when s = compose(y, x):
return (x, y)
&lt;/code>&lt;/pre>&lt;p>This will take time proportional to the product of the set sizes:
$O(\vert X\vert\cdot\vert Y\vert)$. Shamir noticed the following: If
$s=\hat y\circ\hat x$, then $\hat y^{-1}\circ s = \hat x$. With this, we
preprocess our $Y$ set to be instead&lt;/p>
&lt;p>$$
Y' := \{y^{-1}\circ s : y\in Y\}.
$$&lt;/p>
&lt;p>By doing this, there must be an element in common between $X$ and
$Y'$, since $\hat x\in X$ and $\hat y^{-1}\circ s\in Y'$ and those are
equal. So we&amp;rsquo;ve reduced the problem to determining what the
intersection between $X$ and $Y'$ is.&lt;/p>
&lt;p>Once we find our $z$ which is in common between $X$ and $Y'$, then our
recovered permutation will be $\hat x = z$ and $\hat y = (z\circ
s^{-1})^{-1}$.&lt;/p>
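&lt;p>Observation #1 translates almost directly into code. Here is a hedged sketch on tiny 3-element permutations (the sets $X$, $Y$, and the scramble $s$ are arbitrary examples), with permutations as 0-indexed tuples and &lt;code>compose(f, g)&lt;/code> meaning $f\circ g$:&lt;/p>

```python
# Sketch of Observation #1: decomposing s = y∘x over sets X and Y
# reduces to intersecting X with Y' = {invert(y)∘s : y in Y}.
from itertools import permutations

def compose(f, g):
    return tuple(f[g[i]] for i in range(len(g)))

def invert(f):
    inv = [0] * len(f)
    for i, fi in enumerate(f):
        inv[fi] = i
    return tuple(inv)

X = set(permutations(range(3)))      # all 3-element permutations
Y = {(1, 0, 2), (0, 2, 1)}
s = compose((1, 0, 2), (2, 0, 1))    # a scramble with a known split

Yprime = {compose(invert(y), s) for y in Y}
for z in X & Yprime:                 # each z is a candidate x-hat
    x_hat = z
    y_hat = invert(compose(z, invert(s)))
    assert compose(y_hat, x_hat) == s
```

&lt;p>Every element of the intersection yields a valid decomposition, which is why the assertion holds inside the loop.&lt;/p>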
&lt;p>We&amp;rsquo;ve just established that the problem of decomposing an element like
$s$ is identical to the problem of calculating a set
intersection. Still, if we want to do the intersection, our intuition
tells us we still need a quadratic algorithm, which brings us to the
second observation.&lt;/p>
&lt;h3 id="observation-2-sorting-really-helps">Observation #2: sorting really helps!&lt;/h3>
&lt;p>Permutations have a natural ordering, called &lt;em>lexicographic
ordering&lt;/em>. If you have two permutations, and you read their elements
left-to-right, you can compare them like ordinary numbers. Just
as $123 &amp;lt; 213$, we can say that&lt;/p>
&lt;p>$$
(1,2,3) &amp;lt; (2,1,3).
$$&lt;/p>
&lt;p>A nice property of this is that the identity permutation $(1, 2, 3,
\ldots)$ is the smallest permutation of a given size.&lt;/p>
&lt;p>How does this help us? Well, suppose we sort our sets $X$ and $Y'$
into lists $L_X$ and $L_{Y'}$, so the permutations are in order. If
$L_X$ and $L_{Y'}$ have an element in common, we can find it in linear
time: $O(\min\{\vert X\vert, \vert Y'\vert\})$. How? Something like
the following:&lt;/p>
&lt;pre tabindex="0">&lt;code>function findCommon(Lx, Ly):
x = pop(Lx)
y = pop(Ly)
loop:
if x == y:
return x
if empty(Lx) or empty(Ly):
error(&amp;quot;No common elements found.&amp;quot;)
if x &amp;lt; y:
x = pop(Lx)
else if x &amp;gt; y:
y = pop(Ly)
&lt;/code>&lt;/pre>&lt;p>This works because we are essentially looking at all of the elements
of $L_X$ and $L_{Y'}$ together in sorted order. It&amp;rsquo;s like a merge
sort, without the merge part.&lt;/p>
&lt;p>As written, &lt;code>findCommon&lt;/code> computes just one element of the intersection.
Instead of returning, the loop could continue to enumerate all elements.
This is useful to know for the purpose of solving permutation puzzles:
Do we want just some solution, or do we want all solutions? That answer,
of course, depends on the application.&lt;/p>
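&lt;p>As a sketch of this linear-time walk: Python tuples already compare lexicographically, so sorted lists of tuple permutations can be intersected with a single merge-style pass (the two lists below are arbitrary examples):&lt;/p>

```python
# Sketch of Observation #2: a merge-style walk over two sorted
# lists of permutations finds a common element in linear time.

def find_common(lx, ly):
    i, j = 0, 0
    while i < len(lx) and j < len(ly):
        if lx[i] == ly[j]:
            return lx[i]
        if lx[i] < ly[j]:      # lexicographic tuple comparison
            i += 1
        else:
            j += 1
    raise ValueError("No common elements found.")

lx = sorted([(0, 1, 2), (1, 2, 0), (2, 0, 1)])
ly = sorted([(0, 2, 1), (1, 2, 0), (2, 1, 0)])
print(find_common(lx, ly))  # (1, 2, 0)
```

&lt;p>Replacing the &lt;code>return&lt;/code> with a &lt;code>yield&lt;/code> would enumerate the whole intersection instead of just one element.&lt;/p>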
&lt;p>Before continuing, we should take a little scenic tour through the more
formal meaning of &amp;ldquo;moves&amp;rdquo; and &amp;ldquo;move sequences&amp;rdquo;, since ultimately any
permutation puzzle solving algorithm must produce them as output.&lt;/p>
&lt;h3 id="what-is-a-move">What is a move?&lt;/h3>
&lt;p>A quick bit about notation. If we have a permutation $f$, then its
inverse is written $f^{-1}$, and it&amp;rsquo;s $k$-fold repetition $f\circ
f\circ\cdots\circ f$ is written $f^k$. If we have a collection of
permutations $S := \{f_1, f_2, \ldots\}$, then we write the
following shorthands:&lt;/p>
&lt;p>$$
\begin{align*}
S^{-1} &amp;amp;:= \{f^{-1} : f \in S\}\\
S^{\times k} &amp;amp;:= \{f^k : f \in S\}.
\end{align*}
$$&lt;/p>
&lt;p>If $g$ is some permutation, we also write these shorthands:&lt;/p>
&lt;p>$$
\begin{align*}
g\circ S &amp;amp;:= \{g\circ f : f \in S\}\\
S\circ g &amp;amp;:= \{f\circ g : f \in S\}.
\end{align*}
$$&lt;/p>
&lt;p>Similarly, if $T := \{g_1, g_2, \ldots\}$, then we can write&lt;/p>
&lt;p>$$
\begin{align*}
S\circ T &amp;amp;:= \{f\circ g : f\in S, g\in T\}\\
&amp;amp;= \{f_1\circ g_1, f_2\circ g_1, \ldots, f_1\circ g_2, \ldots\}.
\end{align*}
$$&lt;/p>
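&lt;p>These shorthands translate directly into set comprehensions. Here is an illustrative sketch (0-indexed tuple permutations; the sample set $S$ containing a single 3-cycle is arbitrary):&lt;/p>

```python
# Sketch of the move-set shorthands: S⁻¹, S^×k, g∘S, and S∘T,
# written as set comprehensions over tuple permutations.

def compose(f, g):
    return tuple(f[g[i]] for i in range(len(g)))

def invert(f):
    inv = [0] * len(f)
    for i, fi in enumerate(f):
        inv[fi] = i
    return tuple(inv)

def power(f, k):
    result = tuple(range(len(f)))  # identity
    for _ in range(k):
        result = compose(f, result)
    return result

def set_inverse(S):                 # S⁻¹
    return {invert(f) for f in S}

def set_power(S, k):                # S^×k
    return {power(f, k) for f in S}

def left_compose(g, S):             # g∘S
    return {compose(g, f) for f in S}

def set_product(S, T):              # S∘T
    return {compose(f, g) for f in S for g in T}

S = {(1, 2, 0)}                          # a single 3-cycle
assert set_power(S, 3) == {(0, 1, 2)}    # a 3-cycle cubed is identity
assert set_inverse(S) == set_power(S, 2)
```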
&lt;p>With that out of the way, let&amp;rsquo;s talk about the concept of a single
&amp;ldquo;move&amp;rdquo;. What counts as a &amp;ldquo;move&amp;rdquo; in a permutation puzzle?&lt;/p>
&lt;p>Really, we can choose any set of moves we please, so long as every
state of the puzzle is reachable through some combination of the
moves. For example, let&lt;/p>
&lt;p>$$
C := \{F, R, U, B, L, D\},
$$&lt;/p>
&lt;p>the basic and well understood ninety-degree clockwise moves of the
Rubik&amp;rsquo;s Cube. Indeed, $C$ itself is a fine definition of available
moves. All of the following are also valid definitions of moves:&lt;/p>
&lt;p>$$
C\cup C^{-1},\quad C\cup C^{\times 2},\quad C^{-1},\quad C\cup C^{\times 2}\cup C^{-1},
$$&lt;/p>
&lt;p>and so on. Perhaps surprisingly, we can take any element of $C$ and
remove it, and it would still be a valid set of moves for the Rubik&amp;rsquo;s
Cube&lt;sup id="fnref:6">&lt;a href="#fn:6" class="footnote-ref" role="doc-noteref">6&lt;/a>&lt;/sup>!&lt;/p>
&lt;p>Which set of moves we select usually has little relevance
mathematically (they are all expressible as one another), but has
great relevance when we are synthesizing efficient move sequences, or when
we want to talk about &amp;ldquo;optimality&amp;rdquo;. For instance, consider a
counterclockwise move: $F^{-1}$. It&amp;rsquo;s natural to consider this a
single move, but if we consider our set to be $C$, then we&amp;rsquo;d have to
count it as three moves, since $F^{-1} = F\circ F\circ F = F^3$. What
about $F^2$? Is that one move or two? Speedcubers generally consider
$F^2$ to be one motion, so counting that as one move is natural, but
many computer puzzlers like the simplicity of $C\cup C^{-1}$, i.e.,
only ninety-degree turns&lt;sup id="fnref:7">&lt;a href="#fn:7" class="footnote-ref" role="doc-noteref">7&lt;/a>&lt;/sup>.&lt;/p>
&lt;p>For the rest of this note, we&amp;rsquo;ll be in the former camp, where half-turns count as one, and we&amp;rsquo;ll denote this set of moves as:&lt;/p>
&lt;p>$$
\bar C := C \cup C^{-1} \cup C^{\times 2}.
$$&lt;/p>
&lt;h3 id="what-is-a-word">What is a word?&lt;/h3>
&lt;p>After we agree on what we consider a move, we can be more specific as
to what we mean about move sequences. A &lt;em>move sequence&lt;/em> is a possibly
empty list of moves. A move sequence can be &lt;em>composed&lt;/em> to form the
permutation it represents. This composition operator is called
$\kappa$, and is easily defined. Let $M$ be a move set, and let $s =
[s_1, s_2, \ldots, s_n]$ be a sequence of $n$ moves with each
$s_{\bullet}$ a move from $M$. The &lt;em>length&lt;/em> of $s$ is naturally $n$,
and its composition is defined as:&lt;/p>
&lt;p>$$
\begin{align*}
\kappa([\,]) &amp;amp;:= (1, 2, 3, \ldots)\\
\kappa([s_1, s_2, \ldots, s_{n-1}, s_n]) &amp;amp;:= \kappa([s_1, s_2, \ldots, s_{n-1}])\circ s_n.
\end{align*}
$$&lt;/p>
&lt;p>If $M$ is a move set, then the set of all move sequences (including
the empty sequence) is denoted $M^{*}$, a notation kindly borrowed
from formal language theory.&lt;/p>
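&lt;p>The recursive definition of $\kappa$ unrolls into a left fold over the sequence. Here is a small sketch, using two hypothetical toy moves on a 3-element puzzle rather than real cube turns:&lt;/p>

```python
# Sketch of the composition operator κ: the empty sequence maps to
# the identity, and κ([s1,...,sn]) = κ([s1,...,s(n-1)]) ∘ sn,
# which is exactly a left fold with compose.
from functools import reduce

def compose(f, g):
    return tuple(f[g[i]] for i in range(len(g)))

def kappa(moves, n):
    return reduce(compose, moves, tuple(range(n)))

A = (1, 0, 2)   # hypothetical move: swap the first two elements
B = (0, 2, 1)   # hypothetical move: swap the last two elements

assert kappa([], 3) == (0, 1, 2)          # empty sequence is identity
assert kappa([A, B], 3) == compose(A, B)  # κ([A, B]) = A ∘ B
assert kappa([A, A], 3) == (0, 1, 2)      # A is an involution
```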
&lt;p>If we identify the elements of $M$ with symbols, then a move sequence
is called a &lt;em>word&lt;/em>. We&amp;rsquo;ll always type symbols in $\texttt{typewriter}$
font. The moves $\{F, R, U, B, L, D\}$ have the symbols
$\{\texttt{F}, \texttt{R}, \texttt{U}, \texttt{B}, \texttt{L},
\texttt{D}\}$, an inverse $F^{-1}$ has the symbol $\texttt{F'}$, and
a square $F^2$ has the symbol $\texttt{F2}$. And we type words as
symbols joined together in &lt;em>reverse&lt;/em> order&lt;sup id="fnref:8">&lt;a href="#fn:8" class="footnote-ref" role="doc-noteref">8&lt;/a>&lt;/sup>, so $[R^{-1},
U^2, L]$ can be represented by the word $\texttt{L U2 R'}$.&lt;/p>
&lt;p>The distinction is subtle but important. In a computer program, a move
sequence is a list of permutations, while a word is a list of
symbols. A Rubik&amp;rsquo;s Cube solving program should take as input a
permutation, and output a word which when composed as a move sequence,
brings that permutation to identity.&lt;/p>
&lt;p>When doing math, we often mix up all of these concepts since they have
little bearing on the correctness of an argument. Whether it&amp;rsquo;s the
permutation $F\circ R^{-1}$ or the move sequence $[F, R^{-1}]$ or the
word $\texttt{R' F}$ or otherwise, they all represent roughly
the same thing, but computers need to be explicit about which
representation is being manipulated.&lt;/p>
&lt;p>So, in summary:&lt;/p>
&lt;ul>
&lt;li>A &lt;strong>move set&lt;/strong> is a set of permutations that &amp;ldquo;count&amp;rdquo; as one move.&lt;/li>
&lt;li>A &lt;strong>move sequence&lt;/strong> is a list of moves from a move set.&lt;/li>
&lt;li>The &lt;strong>composition&lt;/strong> of a move sequence is the permutation that move sequence represents.&lt;/li>
&lt;li>A &lt;strong>symbol&lt;/strong> is a designator for a move in a move set.&lt;/li>
&lt;li>A &lt;strong>word&lt;/strong> is a sequence of symbols.&lt;/li>
&lt;/ul>
&lt;p>Back to this brute-force thing&amp;hellip;&lt;/p>
&lt;h3 id="observation-3-sorting-as-solving">Observation #3: sorting as solving&lt;/h3>
&lt;p>As silly as the example is, let&amp;rsquo;s suppose we know, for a fact, that a
Rubik&amp;rsquo;s Cube was mixed up using only six moves from $\bar C$. Since
$\bar C$ has 18 elements, without any optimization, we might have to
try $18^6$ move sequences to find a solution.&lt;/p>
&lt;p>Instead of brute-forcing in that way, we can do another trick. Let &lt;code>s&lt;/code>
be our scrambled permutation.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Write out every combination of 3 moves into a table. The key would
be the permutation, and the value would be the word associated with
that permutation. Call this table &lt;code>A&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Sort &lt;code>A&lt;/code> in ascending lexicographic order on the permutation.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Make a copy of &lt;code>A&lt;/code>, call it &lt;code>B&lt;/code>. For all &lt;code>(perm, word)&lt;/code> in &lt;code>B&lt;/code>,
reassign &lt;code>perm := compose(invert(perm), s)&lt;/code>. We do this because of
Observation #1.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Sort &lt;code>B&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Call &lt;code>x := findCommon(A, B)&lt;/code>. We do this via Observation #2.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Reconstruct a word equal to &lt;code>s&lt;/code> by &lt;code>A[x].word ++ reverse(B[x].word)&lt;/code>. We do this to recover a final result via Observation
#1.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>Since we have a word that brings us &lt;em>from solved to &lt;code>s&lt;/code>&lt;/em>, we can
invert the word to bring us &lt;em>from &lt;code>s&lt;/code> to solved&lt;/em>.&lt;/p>
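&lt;p>The six steps translate almost directly into code. Here is an illustrative Python sketch that meets in the middle with a toy &amp;ldquo;move set&amp;rdquo; of integers under addition (so composition is a sum and inversion is negation); the table-building, sorting, and merging follow the steps above:&lt;/p>

```python
from itertools import product

# Toy stand-in: "moves" are integers under addition, so composition is +
# and inversion is negation. The bookkeeping mirrors the six steps above.
MOVES = {"+1": 1, "+2": 2, "-3": -3}

def three_move_table():
    # Steps 1 and 2: tabulate (composition, word) for every 3-move word,
    # then sort on the composition.
    rows = []
    for word in product(MOVES, repeat=3):
        rows.append((sum(MOVES[m] for m in word), list(word)))
    rows.sort()
    return rows

def solve(s):
    A = three_move_table()
    # Steps 3 and 4: B's keys become invert(perm) "composed" with s.
    B = sorted((s - comp, word) for comp, word in A)
    # Step 5: findCommon via a linear merge of the two sorted tables.
    i = j = 0
    while i < len(A) and j < len(B):
        if A[i][0] == B[j][0]:
            # Step 6: stitch the two half-words into a 6-move solution.
            return A[i][1] + B[j][1]
        if A[i][0] < B[j][0]:
            i += 1
        else:
            j += 1
    return None
```

&lt;p>(Since addition commutes, the toy version can skip the word reversal that the permutation version needs.)&lt;/p>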
&lt;p>By this method, we avoided visiting all $18^6$ move sequences by
instead pre-calculating two groups of $18^3$ sequences and exploring
them for an intersection. We have cut the amount of work down to its
square root.&lt;/p>
&lt;p>If we generalize to length $n+m$ (for some splitting of $n$ and $m$),
then we can replace the work of visiting $18^{n+m}$ states with
$18^m + 18^n$ states, which is much better.&lt;/p>
&lt;p>So we&amp;rsquo;re done? We now know that the Rubik&amp;rsquo;s Cube requires no more than
20 moves, so if we make two tables enumerating 10 moves, we should be
good?&lt;/p>
&lt;p>Well, err, $18^{10} = 3,570,467,226,624$. Unless we have trillions of
resources to spare, be it time or space, it&amp;rsquo;s still not going to work.&lt;/p>
&lt;h3 id="more-splitting">More splitting?&lt;/h3>
&lt;p>An enterprising computer science student, at this point, might smell
recursion. If we split once, can we split again? If we know a Rubik&amp;rsquo;s
Cube can be solved in 20 moves, can we split it into two 10 move
problems, and each of those into two 5 move problems?&lt;/p>
&lt;p>The problem with this is that at the top layer of recursion, it&amp;rsquo;s
clear what we are solving. At lower layers, it&amp;rsquo;s no longer clear. What
&lt;em>actually&lt;/em> is the recursive structure at play? And if we could do this
trick, couldn&amp;rsquo;t we decimate any brute-force problem of exponential
complexity (e.g., in number of moves) into one of linear?&lt;/p>
&lt;p>That isn&amp;rsquo;t going to work, but we can be inspired by it. Let $L$ be the
set of 5-or-fewer-move combinations from $\bar C$, that is,&lt;/p>
&lt;p>$$
L := \bigcup_{i=0}^5 \bar C^i.
$$&lt;/p>
&lt;p>The size of $L$ is going to be $621,649$ if we don&amp;rsquo;t store redundant
permutations. This is definitely possible to compute. Then our goal is
to find a decomposition of $s$ in terms of an element in $L\circ
L\circ L\circ L$. Using the same trick from Observation #1, suppose
there is a decomposition $$s = l_4\circ l_3\circ l_2\circ l_1.$$ Then
$$l_3^{-1}\circ l_4^{-1} \circ s = l_2\circ l_1.$$ So we create four
tables:&lt;/p>
&lt;ul>
&lt;li>$L_1 = L$,&lt;/li>
&lt;li>$L_2 = L_1$,&lt;/li>
&lt;li>$L_4 = L_1^{-1}$, and&lt;/li>
&lt;li>$L_3 = L_4\circ s$.&lt;/li>
&lt;/ul>
&lt;p>No, the $4$ before $3$ is not a typo! We define the tables in this
order to save computation and avoid redundant work. Now our goal is to
find an element in common between the two sets&lt;/p>
&lt;p>$$
\begin{align*}
X &amp;amp;= L_2 \circ L_1\\
Y &amp;amp;= L_4 \circ L_3.
\end{align*}
$$&lt;/p>
&lt;p>Somehow, we must do this without actually calculating all elements of
$L_i\circ L_j$. And, to add insult to injury, for &lt;code>findCommon&lt;/code> to
work, we need to be able to go through these sets in sorted order.&lt;/p>
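&lt;p>As a reminder of the target: &lt;code>findCommon&lt;/code> itself only needs its two inputs to arrive one element at a time in sorted order. A Python sketch (an illustrative stand-in, not the real implementation) is a single merge pass:&lt;/p>

```python
def find_common(xs, ys):
    """Return the first key produced by both ascending generators,
    or None if they never agree. Runs in a single linear merge pass."""
    x = next(xs, None)
    y = next(ys, None)
    while x is not None and y is not None:
        if x == y:
            return x
        if x < y:
            x = next(xs, None)
        else:
            y = next(ys, None)
    return None
```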
&lt;h3 id="iterating-through-products-with-schroeppel--shamir">Iterating through products with Schroeppel&amp;ndash;Shamir&lt;/h3>
&lt;p>Suppose we have two lists of positive numbers $A$ and $B$. How can we
print the elements of $\{a+b : a\in A, b\in B\}$ in numerical order
without explicitly constructing and sorting this set? Shamir and his
collaborator Schroeppel did so with the following algorithm.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Sort $A$ in ascending order. Pop off the first (and therefore
smallest) element $a_1$.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Create a priority queue $Q$ and initialize it with $(a_1,b)$ with
priority $a_1 + b$ for all $b\in B$.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Repeat the following until $Q$ is empty:&lt;/p>
&lt;ol>
&lt;li>Pop $(a,b)$ off $Q$. This will form the next smallest sum, so print $a+b$.&lt;/li>
&lt;li>Find $a'$ which immediately succeeds $a$ in our sorted list $A$.&lt;/li>
&lt;li>Push $(a',b)$ with priority $a'+b$ onto $Q$.&lt;/li>
&lt;/ol>
&lt;/li>
&lt;/ol>
&lt;p>This algorithm will terminate, having printed each sum in order using
at most $O(\vert A\vert + \vert B\vert)$ space and time nearly linear
in the number of sums printed. (The sorting and priority-queue
maintenance contribute some logarithmic factors.)&lt;/p>
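&lt;p>As an illustrative Python sketch, with a binary heap standing in for the priority queue, the algorithm looks like this:&lt;/p>

```python
import heapq

def sorted_sums(A, B):
    """Yield every a+b for (a, b) in A x B in ascending order, holding
    only O(|A| + |B|) state: the sorted copy of A plus one heap entry
    per element of B."""
    A = sorted(A)
    # Step 2: seed the queue by pairing the smallest a with every b.
    heap = [(A[0] + b, 0, b) for b in B]
    heapq.heapify(heap)
    while heap:
        total, i, b = heapq.heappop(heap)   # step 3.1: next smallest sum
        yield total
        if i + 1 < len(A):                  # step 3.2: successor of a in A
            heapq.heappush(heap, (A[i + 1] + b, i + 1, b))  # step 3.3
```

&lt;p>For example, &lt;code>list(sorted_sums([3, 1], [10, 20]))&lt;/code> produces &lt;code>[11, 13, 21, 23]&lt;/code>.&lt;/p>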
&lt;p>With a little work, one can see why this works. In a sense it&amp;rsquo;s a
two-dimensional sorting problem that depends on one crucial fact: if
$x \le y$ then $x+z \le y+z$. (This is to say that addition is
&lt;em>monotonic&lt;/em>.) Given how the priority queue is constructed, it will
&lt;em>always&lt;/em> contain the smallest not-yet-printed sum.&lt;/p>
&lt;p>Could we do this with permutations? If we have two lists of
permutations $A$ and $B$, and $a_1$ is the &amp;ldquo;smallest&amp;rdquo; (i.e.,
lexicographically least) permutation of $A$, and $b_1$ is the
&amp;ldquo;smallest&amp;rdquo; permutation of $B$, then it is &lt;strong>patently not true&lt;/strong> that
$a_1\circ b_1$ is the smallest element of $A\circ B$. In symbols,&lt;/p>
&lt;p>$$
(\min A) \circ (\min B) \neq \min (A\circ B).
$$&lt;/p>
&lt;p>Similarly, if two permutations satisfy $a &amp;lt; b$, then it is &lt;strong>patently
not true&lt;/strong> that&lt;/p>
&lt;p>$$
a\circ z &amp;lt; b\circ z
$$&lt;/p>
&lt;p>for a permutation $z$.&lt;/p>
&lt;p>The monotonicity of addition is what allows us to do steps 3.2 and 3.3
so easily. If we did the same with permutations, we would no longer
have the guarantee that the minimum composition exists within the
queue.&lt;/p>
&lt;p>This was the next hurdle Shamir cleared. In time that doesn&amp;rsquo;t grow
with the sizes of $A$ or $B$, Shamir found a way to solve the
following problem: Given a permutation $a\in A$ and $b\in B$, find the
element $b'\in B$ such that $a\circ b'$ immediately succeeds $a\circ
b$. In other words, we can generate, one-by-one, the sequence of
$b$&amp;rsquo;s needed for steps 3.2 and 3.3. With this algorithm (which
we&amp;rsquo;ll describe in the next section), our Shamir&amp;ndash;Schroeppel
algorithm for permutations becomes the following:&lt;/p>
&lt;p>&lt;strong>Algorithm (Walk Products)&lt;/strong>:&lt;/p>
&lt;ol>
&lt;li>Initialize an empty priority queue $Q$ whose elements are pairs of
permutations with priority determined by another permutation in
lexicographic ordering.&lt;/li>
&lt;li>For each permutation $a\in A$:
&lt;ol>
&lt;li>With Shamir&amp;rsquo;s trick, find the $b\in B$ such that $a\circ b = \min (a\circ B)$.&lt;/li>
&lt;li>Push $(a, b)$ onto $Q$ with priority $a\circ b$.&lt;/li>
&lt;/ol>
&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>(Invariant: At this point, we will certainly have $\min (A\circ B)$ in the queue.)&lt;/li>
&lt;/ul>
&lt;ol start="3">
&lt;li>Repeat the following until $Q$ is empty:
&lt;ol>
&lt;li>Pop $(a,b)$ off $Q$. This will form the next smallest $a\circ b$, so print it&lt;sup id="fnref:9">&lt;a href="#fn:9" class="footnote-ref" role="doc-noteref">9&lt;/a>&lt;/sup>.&lt;/li>
&lt;li>With Shamir&amp;rsquo;s trick, find $b'$ such that $a\circ b'$ immediately succeeds $a\circ b$.&lt;/li>
&lt;li>Push $(a,b')$ with priority $a\circ b'$ onto $Q$.&lt;/li>
&lt;/ol>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;/ol>
&lt;p>This algorithm will produce the elements of $A\circ B$, one-by-one in
lexicographic order.&lt;/p>
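&lt;p>Here is an illustrative Python sketch of &lt;strong>Walk Products&lt;/strong>. It cheats in one place: where the real algorithm uses Shamir&amp;rsquo;s trick to stream each $a\circ B$ lazily out of a trie, this sketch pre-sorts each stream, which keeps the queue discipline visible without the trie machinery.&lt;/p>

```python
import heapq

def compose(a, b):
    """(a o b)(i) = a(b(i)); permutations are tuples of 1-based images."""
    return tuple(a[x - 1] for x in b)

def walk_products(A, B):
    """Yield a o b for every (a, b) in A x B, in lexicographic order of
    the product, keeping one pending (a, b) pair per element of A."""
    streams, heap = {}, []
    for i, a in enumerate(A):
        # Stand-in for Shamir's trick: the b's ordered by a o b. The
        # permuted trie walk produces this same sequence lazily.
        it = iter(sorted(B, key=lambda b, a=a: compose(a, b)))
        streams[i] = (a, it)
        heapq.heappush(heap, (compose(a, next(it)), i))
    while heap:
        prod, i = heapq.heappop(heap)       # next smallest a o b
        yield prod
        a, it = streams[i]
        b = next(it, None)                  # successor b' so a o b' is next
        if b is not None:
            heapq.heappush(heap, (compose(a, b), i))
```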
&lt;p>What is Shamir&amp;rsquo;s trick? We need a data structure and a clever observation.&lt;/p>
&lt;h3 id="permutation-tries">Permutation tries&lt;/h3>
&lt;p>In order to handle sets of ordered permutations better, Shamir created
a data structure. I call it a permutation trie. A &lt;em>permutation trie&lt;/em>
of size-$k$ permutations is a $k$-deep, $k$-ary tree, such that a
path from root-to-leaf follows the elements of a permutation. The leaf
contains data which we want to associate with the permutation.&lt;/p>
&lt;p>For example, consider permutations of size $5$. Suppose we wanted to
associate the symbol $\texttt{p6}$ with the permutation
$(2,4,1,3,5)$. Then we would have a $5$-layer tree with a root node
$R$, such that $R[2][4][1][3][5] = \texttt{p6}$.&lt;/p>
&lt;p>More generally, let&amp;rsquo;s associate the following symbols with the
following permutations in a permutation trie $R$:&lt;/p>
&lt;p>$$
\begin{align*}
\texttt{p1} &amp;amp;\leftarrow (1,2,3,4,5) &amp;amp; \texttt{p2} &amp;amp;\leftarrow (1,2,3,5,4) &amp;amp; \texttt{p3} &amp;amp;\leftarrow (1,2,4,3,5)\\
\texttt{p4} &amp;amp;\leftarrow (1,2,5,3,4) &amp;amp; \texttt{p5} &amp;amp;\leftarrow (1,3,4,5,2) &amp;amp; \texttt{p6} &amp;amp;\leftarrow (2,4,1,3,5)\\
\texttt{p7} &amp;amp;\leftarrow (4,1,3,2,5) &amp;amp; \texttt{p8} &amp;amp;\leftarrow (4,1,3,5,2) &amp;amp; \texttt{p9} &amp;amp;\leftarrow (5,1,2,3,4)\\
\end{align*}
$$&lt;/p>
&lt;p>The trie would be a data structure that looks like this:&lt;/p>
&lt;div style="text-align: center;">
&lt;img
src="images/perm-trie.svg"
alt="An example permutation trie."
decoding="async"
/>
&lt;/div>
&lt;p>Even though we don&amp;rsquo;t show them, conceptually, each node in the trie
has a full length-$5$ array, with some elements empty (i.e., there are
no children).&lt;/p>
&lt;p>What&amp;rsquo;s good about this data structure? First and foremost, pre-order
traversal will visit the permutations in lexicographic order. We can
use this data structure to store two things at the leaves (i.e.,
$\texttt{p}n$):&lt;/p>
&lt;ol>
&lt;li>The actual permutation data structure representing that path, and&lt;/li>
&lt;li>The word we used to construct that permutation.&lt;/li>
&lt;/ol>
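&lt;p>A sparse Python sketch of the structure (dicts standing in for the per-node arrays; the real thing is CL-PERMUTATION&amp;rsquo;s &lt;code>perm-tree&lt;/code>):&lt;/p>

```python
def trie_insert(trie, perm, value):
    """Store `value` at the root-to-leaf path spelled by `perm`. Dicts
    stand in for each node's sparse length-k array."""
    node = trie
    for x in perm[:-1]:
        node = node.setdefault(x, {})
    node[perm[-1]] = value

def trie_walk(trie, path=()):
    """Pre-order traversal in index order: yields (perm, value) pairs in
    lexicographic order of the stored permutations."""
    for x in sorted(trie):
        child = trie[x]
        if isinstance(child, dict):
            yield from trie_walk(child, path + (x,))
        else:
            yield path + (x,), child
```

&lt;p>With the nine assignments above, &lt;code>trie_walk&lt;/code> yields $\texttt{p1}$ through $\texttt{p9}$ in exactly the listed order.&lt;/p>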
&lt;p>This is the data structure, and now we get to Shamir&amp;rsquo;s
insight. Suppose we have a permutation $s$ and a permutation trie $R$
(which represents a set of permutations), and we want to traverse
$s\circ R$ in lexicographic order. The naive way is to construct a new
trie, but we wish to avoid that. To explain the idea, we&amp;rsquo;ll choose a
concrete example.&lt;/p>
&lt;p>Let&amp;rsquo;s use $R$ from above. Let $s := (3,1,4,2,5)$. (Note that $s\not\in
R$, but that&amp;rsquo;s not important.) We wish to find an $r'\in R$ such that
$s\circ r' = \min (s\circ R)$. Well, the smallest permutation would be
one such that $r'(1) = 2$, because then $s(r'(1)) = s(2) = 1$. Looking
at our trie $R$, we can see the only candidate is that associated with
$\texttt{p6}$: $(2,4,1,3,5)$, which is the minimum.&lt;/p>
&lt;p>What about the next smallest $s\circ r''$? For ease, let&amp;rsquo;s call this
product $m$. We would want a permutation such that $r''(1) = 4$,
because $m(1) = s(r''(1)) = s(4) = 2$. This time, there are two
candidates:&lt;/p>
&lt;p>$$
(4,1,3,2,5)\qquad (4,1,3,5,2)
$$&lt;/p>
&lt;p>So at least we know $m = (2, \ldots)$. To disambiguate, we need to
look at $r''(2)$. These are the same, likewise $r''(3)$, so we have no
degree of freedom at $2$ or $3$ to minimize the product. Thus $m = (2,
3, 4, \ldots)$. We have a choice at $r''(4)$, however. The best choice
is $r''(4) = 2$, because $m(4) = s(r''(4)) = s(2) = 1$, the smallest
possible choice. This disambiguates our choice of $r''$ to be
$(4,1,3,2,5)$ so that $m = (2,3,4,1,5)$.&lt;/p>
&lt;p>We could repeat the procedure to find the next smallest product
$s\circ r'''$. What exactly is the procedure here? Well, we walked
down the tree $R$, but instead of walking down it straight, we did so
in a permuted order based on $s$&amp;mdash;specifically
$s^{-1}$. Consider our normal algorithm for walking the tree&lt;sup id="fnref:10">&lt;a href="#fn:10" class="footnote-ref" role="doc-noteref">10&lt;/a>&lt;/sup> in
lexicographic order:&lt;/p>
&lt;pre tabindex="0">&lt;code>function walkLex(R):
if notTree(R):
print R
else:
for i from 1 to length(R):
if R[i] exists:
walkLex(R[i])
&lt;/code>&lt;/pre>&lt;p>We can instead walk in &lt;em>permuted&lt;/em> order, producing a sequence
$[r', r'', r''', \ldots]$ such that&lt;/p>
&lt;p>$$
s\circ r' &amp;lt; s \circ r'' &amp;lt; s \circ r''' &amp;lt; \cdots.
$$&lt;/p>
&lt;p>To do so, we modify our walking algorithm like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>function walkProductLex(R, s):
walk'(R, inverse(s))
function walk'(R, s):
if notTree(R):
print R
else:
for i from 1 to length(R):
j = s(i)
if R[j] exists:
walk'(R[j], s)
&lt;/code>&lt;/pre>&lt;p>Note that $s$ is inverted once, before the recursion, so that each node&amp;rsquo;s children can be visited in the correct permuted order quickly.&lt;/p>
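&lt;p>Here is the permuted walk as an illustrative Python sketch over a dict-based trie, with $s$ given as a tuple of 1-based images:&lt;/p>

```python
def trie_insert(trie, perm):
    """Nodes are dicts; the leaf of a full permutation is an empty dict."""
    node = trie
    for x in perm:
        node = node.setdefault(x, {})

def walk_product_lex(trie, s):
    """Yield the stored permutations r ordered lexicographically by s o r."""
    k = len(s)
    s_inv = [0] * k
    for i, x in enumerate(s):          # invert s once, up front
        s_inv[x - 1] = i + 1
    def walk(node, path):
        if len(path) == k:
            yield tuple(path)
            return
        for i in range(1, k + 1):
            j = s_inv[i - 1]           # child s^{-1}(i): makes s(r(d)) ascend
            if j in node:
                yield from walk(node[j], path + [j])
    yield from walk(trie, [])
```

&lt;p>Run on the example $R$ and $s = (3,1,4,2,5)$ from above, the first three permutations produced are $(2,4,1,3,5)$, $(4,1,3,2,5)$, and $(4,1,3,5,2)$, matching the worked example.&lt;/p>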
&lt;p>With this, we have the remarkable ability to iterate through products
in lexicographic order, without having to enumerate them all and sort
them. This was the last and critical ingredient.&lt;/p>
&lt;h3 id="the-4-list-algorithm-and-solving-the-rubiks-cube">The 4-List Algorithm and solving the Rubik&amp;rsquo;s Cube&lt;/h3>
&lt;p>Now we want to put this all together to create the &lt;em>4-List
Algorithm&lt;/em>. Let&amp;rsquo;s restate the problem in clear terms.&lt;/p>
&lt;p>&lt;strong>Problem (4-List)&lt;/strong>: Let $s$ be a permutation. Let $L_1$, $L_2$,
$L_3$, and $L_4$ be sets of permutations such that we know $s\in
L_4\circ L_3\circ L_2\circ L_1$. Find $l_1\in L_1$, $l_2\in L_2$,
$l_3\in L_3$, and $l_4\in L_4$ such that $s = l_4\circ l_3\circ
l_2\circ l_1$.&lt;/p>
&lt;p>Piecing together the elements above, we arrive at the 4-List Algorithm.&lt;/p>
&lt;p>&lt;strong>Algorithm (4-List)&lt;/strong>:&lt;/p>
&lt;ol>
&lt;li>Construct $L'_3 := L_3^{-1}$ and $L'_4 := L_4^{-1}\circ s$.&lt;/li>
&lt;li>Create two generators&lt;sup id="fnref:11">&lt;a href="#fn:11" class="footnote-ref" role="doc-noteref">11&lt;/a>&lt;/sup>: $X_1$ that walks $L_2\circ L_1$ in
lexicographic order, and $X_2$ that walks $L'_3\circ L'_4$ in
lexicographic order. Do this by using the &lt;strong>Walk Products&lt;/strong>
algorithm, which itself is implemented by constructing permutation
tries and using &lt;code>walkProductLex&lt;/code>.&lt;/li>
&lt;li>Call &lt;code>findCommon&lt;/code> on $X_2$ and $X_1$. This is guaranteed to find a
solution $(l_3^{-1},l_4^{-1}\circ s,l_2,l_1)$. Process the solution
to return $(l_4, l_3, l_2, l_1)$.&lt;/li>
&lt;/ol>
&lt;p>The main difficulty of this algorithm, aside from implementing each
subroutine correctly, is plumbing the right data around.&lt;/p>
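&lt;p>To see that plumbing without the permutation machinery, here is a toy 4-list solver in Python over integers under addition, where inversion is negation. It is only an analogue: the permutation version replaces the sum-walking generator with &lt;strong>Walk Products&lt;/strong> over permutation tries.&lt;/p>

```python
import heapq

def sorted_sums(A, B):
    """Yield (a+b, a, b) over A x B in ascending order of a+b."""
    A = sorted(A)
    heap = [(A[0] + b, 0, b) for b in B]
    heapq.heapify(heap)
    while heap:
        t, i, b = heapq.heappop(heap)
        yield t, A[i], b
        if i + 1 < len(A):
            heapq.heappush(heap, (A[i + 1] + b, i + 1, b))

def four_list(s, L1, L2, L3, L4):
    """Find (l1, l2, l3, l4) with s = l4+l3+l2+l1, streaming both sides
    in sorted order and never materializing the pairwise sums."""
    X = sorted_sums(L2, L1)                       # walks L2 "o" L1
    Y = sorted_sums([-l for l in L3],             # walks the inverted,
                    [s - l for l in L4])          # s-processed other half
    x, y = next(X, None), next(Y, None)
    while x is not None and y is not None:
        if x[0] == y[0]:                          # findCommon hit
            _, l2, l1 = x
            _, nl3, sl4 = y
            return (l1, l2, -nl3, s - sl4)        # undo the processing
        if x[0] < y[0]:
            x = next(X, None)
        else:
            y = next(Y, None)
    return None
```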
&lt;p>Now, we can use this to solve a scrambled Rubik&amp;rsquo;s Cube $s$.&lt;/p>
&lt;p>&lt;strong>Algorithm (Solve Cube)&lt;/strong>:&lt;/p>
&lt;ol>
&lt;li>Let $L := \bigcup_{i=0}^5\bar C^i$, keeping a record of the words used to construct
each element of $L$. (We recommend immediately making a permutation
trie, where the leaves store the words.)&lt;/li>
&lt;li>Apply the &lt;strong>4-List Algorithm&lt;/strong> to the problem $s \in L\circ L\circ
L\circ L$ to emit $(l_4, l_3, l_2, l_1)$.&lt;/li>
&lt;li>Return words $(w_4, w_3, w_2, w_1)$ associated with the
permutations $(l_4, l_3, l_2, l_1)$.&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>(Invariant: The length of the solutions will be at most $20$.)&lt;/li>
&lt;/ul>
&lt;p>The algorithm may terminate upon finding just the first solution, or
it may be run continuously to find all solutions. If we do the latter,
we are guaranteed to find all optimal solutions.&lt;/p>
&lt;p>Amazingly, this algorithm really works, and answers our blog post
question in the affirmative: &lt;em>yes, the Rubik&amp;rsquo;s Cube can be
brute-forced&lt;/em>.&lt;/p>
&lt;h2 id="example-and-source-code">Example and source code&lt;/h2>
&lt;p>This algorithm is implemented in Common Lisp, in my computational
group theory package
&lt;a href="https://github.com/stylewarning/cl-permutation">CL-PERMUTATION&lt;/a>. CL-PERMUTATION
already has built in support for Rubik&amp;rsquo;s Cubes as permutation
groups. Starting a new Common Lisp session, we have the following:&lt;/p>
&lt;pre tabindex="0">&lt;code>&amp;gt; (ql:quickload '(:cl-permutation :cl-permutation-examples))
&amp;gt; (in-package :cl-permutation)
&amp;gt; (group-order (perm-examples:make-rubik-3x3))
43252003274489856000
&amp;gt; (format t &amp;quot;~R&amp;quot; *)
forty-three quintillion two hundred fifty-two quadrillion three trillion
two hundred seventy-four billion four hundred eighty-nine million
eight hundred fifty-six thousand
NIL
&lt;/code>&lt;/pre>&lt;p>The built-in Rubik&amp;rsquo;s Cube model only uses $\{F, R, U, B, L, D\}$, so
we make new generators corresponding to $\bar C$.&lt;/p>
&lt;pre tabindex="0">&lt;code>&amp;gt; (defvar *c (loop :with cube := (perm-examples:make-rubik-3x3)
:for g :in (perm-group.generators cube)
:collect (perm-expt g 1)
:collect (perm-expt g 2)
:collect (perm-expt g 3)))
*C
&amp;gt; (length *c)
18
&lt;/code>&lt;/pre>&lt;p>Now we construct $\bar C^5 \cup \bar C^4 \cup \cdots \cup \bar C^0$.&lt;/p>
&lt;pre tabindex="0">&lt;code>&amp;gt; (defvar *c5 (generate-words-of-bounded-length *c 5))
*C5
&amp;gt; (perm-tree-num-elements *c5)
621649
&lt;/code>&lt;/pre>&lt;p>Note that this constructs a &lt;code>perm-tree&lt;/code> object, which automatically
stores the words associated with each permutation generated.&lt;/p>
&lt;p>Now let&amp;rsquo;s generate a random element of the cube group.&lt;/p>
&lt;pre tabindex="0">&lt;code>&amp;gt; (defvar *s (random-group-element (perm-examples:make-rubik-3x3)))
*S
&amp;gt; *s
#&amp;lt;PERM 43 44 41 20 47 11 28 9 24 13 17 42 36 40 37 25 6 21 1 29 7 19 10 3 35 39 22 18 34 33 31 48 16 15 30 2 23 32 26 46 8 4 27 12 45 14 5 38&amp;gt;
&lt;/code>&lt;/pre>&lt;p>Lastly, we run the 4-list algorithm and wait.&lt;/p>
&lt;pre tabindex="0">&lt;code>&amp;gt; (decompose-by-4-list *s *c5 *c5 *c5 *c5 :verbose t)
10,000,000: 52 sec @ 192,553 perms/sec; .0013% complete, eta 1114 hours 58 minutes
20,000,000: 48 sec @ 206,858 perms/sec; .0026% complete, eta 1037 hours 51 minutes
Evaluation took:
145.094 seconds of real time
145.097120 seconds of total run time (144.961382 user, 0.135738 system)
[ Run times consist of 2.405 seconds GC time, and 142.693 seconds non-GC time. ]
100.00% CPU
421,375,385,955 processor cycles
11,681,934,352 bytes consed
((8 11 14 2 4)
(1 16 9 15 1)
(7 6 18 8 15)
(9 13 16 15 8))
&lt;/code>&lt;/pre>&lt;p>We are pretty lucky this one ended in a mere 2 minutes 25 seconds! It
usually isn&amp;rsquo;t so prompt with an answer.&lt;/p>
&lt;p>The results are printed as four words: our $l_4$, $l_3$, $l_2$, and
$l_1$. Each integer $n$ represents the 1-indexed $n$th permutation of
$\bar C$ (ordered by how it was constructed). We can create a more
traditional notation:&lt;/p>
&lt;pre tabindex="0">&lt;code>&amp;gt; (defvar *solution (reduce #'append *))
*SOLUTION
&amp;gt; (defun notation (ws)
(dolist (w (reverse ws))
(multiple-value-bind (move order)
(floor (1- w) 3)
(format t &amp;quot;~C~[~;2~;'~] &amp;quot;
(aref &amp;quot;FRUBLD&amp;quot; move)
order))))
NOTATION
&amp;gt; (notation *solution)
U2 L' D L U' L' U2 D' R' U F L' U' D F R F2 L2 B2 U2
&lt;/code>&lt;/pre>&lt;p>How do we know if this is correct? We need to check that the
composition of this word equals our random element, which we do by
composing the word (using something CL-PERMUTATION calls a &amp;ldquo;free-group
homomorphism&amp;rdquo;), inverting the permutation, and composing it with our
scramble to see that it brings us to an identity permutation.&lt;/p>
&lt;pre tabindex="0">&lt;code>&amp;gt; (defvar *hom (free-group-&amp;gt;perm-group-homomorphism
(make-free-group 18)
(generate-perm-group *c)))
*HOM
&amp;gt; (perm-compose (perm-inverse (funcall *hom *solution)) *s)
#&amp;lt;PERM 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48&amp;gt;
&lt;/code>&lt;/pre>&lt;p>Indeed, we found a reconstruction of our cube.&lt;/p>
&lt;h2 id="tips-for-optimizing-the-4-list-algorithm">Tips for optimizing the 4-List Algorithm&lt;/h2>
&lt;p>One of the most troubling aspects of implementing this algorithm is
making it fast enough. My initial implementation worked at a whopping
200 permutations per second. That&amp;rsquo;s incredibly slow, and meant that it
would take well over a century (in the worst case) for my program to
finish. Now, it works at about 190,000 permutations per second, with
an estimated worst-case search time of 2 months. (I haven&amp;rsquo;t
encountered a scrambled cube position which has taken more than 10
hours.)&lt;/p>
&lt;p>Here are some ways I sped things up.&lt;/p>
&lt;ol>
&lt;li>Be economical with memory. When doing exploratory programming, it&amp;rsquo;s
desirable to tag and store everything, but each of those stores
and accesses takes time.&lt;/li>
&lt;li>&lt;em>Don&amp;rsquo;t&lt;/em> use actual arrays in the permutation trie. When I did that,
I ran out of memory. I instead opted for a sparse representation
using an &amp;ldquo;a-list&amp;rdquo; (that is, a linked list of &lt;code>(index, value)&lt;/code>
pairs).&lt;/li>
&lt;li>Make the permutation handling fast, like composition, equality
testing, and lexicographic ordering. I was originally using generic
arithmetic and 64-bits to represent each permutation element, and
it degraded speed.&lt;/li>
&lt;li>Use a good priority queue implementation. You&amp;rsquo;ll be pushing and
popping hundreds of millions of elements.&lt;/li>
&lt;li>Do some analysis and compress the permutation trie
representation. Most nodes of the trie will only contain one
value. If that&amp;rsquo;s the case, just store the permutation (and
whatever value is associated with it) at the shallowest depth
instead. This will save a lot of time by avoiding a lot of needless
(permuted) recursion.&lt;/li>
&lt;/ol>
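&lt;p>The last tip can be sketched briefly. In this illustrative Python fragment (dict nodes, all stored permutations assumed distinct), a subtree that would hold a single permutation is collapsed into a leaf at the shallowest depth that distinguishes it:&lt;/p>

```python
def compressed_insert(node, perm, value, depth=0):
    """Path-compressed insert: a subtree that would hold exactly one
    permutation is stored as a (perm, value) leaf at the shallowest
    distinguishing depth. Assumes all inserted perms are distinct."""
    key = perm[depth]
    if key not in node:
        node[key] = (perm, value)               # singleton: stop early
    elif isinstance(node[key], dict):
        compressed_insert(node[key], perm, value, depth + 1)
    else:                                       # split an existing leaf
        other_perm, other_value = node[key]
        node[key] = {}
        compressed_insert(node[key], other_perm, other_value, depth + 1)
        compressed_insert(node[key], perm, value, depth + 1)
```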
&lt;p>If you have other tips for speeding up the algorithm, please email me!&lt;/p>
&lt;h2 id="sample-benchmarks">Sample benchmarks&lt;/h2>
&lt;p>In the following, we only consider the problem of solving the Rubik&amp;rsquo;s
Cube using the 4-list algorithm, assuming a solution length of 20
moves.&lt;/p>
&lt;p>My computer is a ThinkPad 25th Anniversary Edition. It has an Intel
Core i7-7500U processor at 2.70 GHz, but boosting to 3.50 GHz. It has
32 GiB of RAM, but comfortably runs the solver with around 3&amp;ndash;4 GiB.&lt;/p>
&lt;p>The algorithm as implemented is able to check around 190,000 elements
per second.&lt;/p>
&lt;p>Generating the move lists and pre-processing is a relatively fixed
cost. The lists can be generated once, but the preprocessing (i.e.,
composing the scramble with one of the lists) needs to happen for each
solve. In my implementation, the initialization cost is consistently 9
seconds.&lt;/p>
&lt;p>After initialization, the search is conducted. The run time varies
wildly, anywhere from seconds to hours.&lt;/p>
&lt;ul>
&lt;li>64 s, 188 billion CPU cycles, 4 GiB of allocation&lt;/li>
&lt;li>165 s, 480 billion CPU cycles, 12 GiB of allocation&lt;/li>
&lt;li>2210 s, 6 trillion CPU cycles, 162 GiB of allocation&lt;/li>
&lt;li>4613 s, 13 trillion CPU cycles, 356 GiB of allocation&lt;/li>
&lt;li>24010 s, 70 trillion CPU cycles, 2 TiB of allocation&lt;/li>
&lt;/ul>
&lt;p>These are randomly sampled Rubik&amp;rsquo;s Cube scrambles, sorted by time.&lt;/p>
&lt;p>In principle, with the current level of optimization, the algorithm
can take as much as 2 months to finish. I&amp;rsquo;m confident that my
implementation&amp;rsquo;s run time can be brought down by a factor of 2; I&amp;rsquo;m
less confident it can easily be brought down by a factor of
50&amp;mdash;but it wouldn&amp;rsquo;t surprise me either way.&lt;/p>
&lt;p>One interesting thing about this algorithm is that it seems to return
very, very quickly if the solution is 10 or fewer moves. Why? I
haven&amp;rsquo;t done a careful analysis, but I believe it is essentially
because the solution will be in $L_2\circ L_1$. The permutations $l_3$
and $l_4$ will be identity, which reduces to the problem of just
finding $s\in L_2\circ L_1$.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>&amp;ldquo;Meet in the middle&amp;rdquo; algorithms are old and well understood. When we
can&amp;rsquo;t brute-force an entire space, we can try splitting it in two and
combining the halves. That&amp;rsquo;s of course the spirit of the 4-List
Algorithm, but the devil is always in the details, and I hope this
blog post showed how many disparate facts needed to come together to
realize the algorithm.&lt;/p>
&lt;p>I think the algorithm communicated by Shamir and his colleagues is
remarkable but largely forgotten. While better algorithms exist for the
specific task of solving the Rubik&amp;rsquo;s Cube, the generality of the
4-List Algorithm ought not to be understated.&lt;/p>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>A. Fiat, S. Moses, A. Shamir, I. Shimshoni and G. Tardos, &amp;ldquo;Planning and learning in permutation groups,&amp;rdquo; 30th Annual Symposium on Foundations of Computer Science, Research Triangle Park, NC, USA, 1989, pp. 274&amp;ndash;279, doi: 10.1109/SFCS.1989.63490. (&lt;a href="https://ieeexplore.ieee.org/document/63490">Link&lt;/a>)&lt;/li>
&lt;li>A. Bawden. &amp;ldquo;Shamir&amp;rsquo;s talk really was about how to solve the cube!&amp;rdquo;. Alan Bawden. From the &lt;em>Cube Lovers&lt;/em> mailing list. 27 May 1987. (&lt;a href="http://www.math.rwth-aachen.de/~Martin.Schoenert/Cube-Lovers/Alan_Bawden__Shamir%27s_talk_really_was_about_how_to_solve_the_cube!.html">Link&lt;/a>)&lt;/li>
&lt;/ol>
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>&lt;em>The&lt;/em> Rubik&amp;rsquo;s Cube? Why not just &amp;ldquo;Rubik&amp;rsquo;s Cube&amp;rdquo;?!&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2" role="doc-endnote">
&lt;p>&lt;em>Iterative-deepening depth-first search&lt;/em> (IDDFS) is an interesting hybrid between breadth-first and depth-first search. Breadth-first search (BFS) can find an optimal path to a target, but requires lots of memory to keep track of nodes that have been seen. Depth-first search (DFS) uses almost no memory, but can&amp;rsquo;t guarantee finding the shortest path. IDDFS is an algorithm which tries DFS up to a maximum depth of 1, then of 2, then of 3, etc. until a path to the target is found. While we re-visit nodes in each successive increase in the maximum depth, the savings in memory and the guarantee of finding the shortest path usually make it worth it.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:3" role="doc-endnote">
&lt;p>A heuristic might be something like this. First, suppose we&amp;rsquo;ve built a table which maps every &lt;em>corner&lt;/em> configuration (ignoring edges) to the number of moves needed to solve it. This problem can be brute-forced, as there are &amp;ldquo;only&amp;rdquo; $8!\cdot 3^7=88,179,840$ corner configurations. Suppose we are doing IDDFS to solve a whole Rubik&amp;rsquo;s Cube, and the algorithm is currently at a depth limit of 10. During our DFS (with a limited depth), we arrive at a position at depth 7, and want to decide if we shall continue with it. We can consult our corner configuration table: If we would require more than 3 moves to solve just the corners, then there&amp;rsquo;s no hope in continuing, since we&amp;rsquo;ll exceed our depth limit of 10. So we drop the line of search on this configuration entirely by returning from the depth-7 recursive call empty-handed.&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:4" role="doc-endnote">
&lt;p>The centers are typically seen as immobile, and hence aren&amp;rsquo;t numbered.&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:5" role="doc-endnote">
&lt;p>Shamir, isn&amp;rsquo;t that name familiar? Yes, he&amp;rsquo;s the &amp;lsquo;S&amp;rsquo; from &amp;ldquo;RSA&amp;rdquo;, the encryption algorithm for which he and colleagues &amp;lsquo;R&amp;rsquo; Rivest and &amp;lsquo;A&amp;rsquo; Adleman won a Turing award.&amp;#160;&lt;a href="#fnref:5" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:6" role="doc-endnote">
&lt;p>Formally, any subset of five elements of $C$ generates the Rubik&amp;rsquo;s Cube group.&amp;#160;&lt;a href="#fnref:6" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:7" role="doc-endnote">
&lt;p>Rubik&amp;rsquo;s Cube enthusiasts have names for these concepts. If we measure the length of a move sequence by the number of quarter turns, we say we are measuring in the &lt;em>quarter-turn metric&lt;/em> or &lt;em>QTM&lt;/em>. If instead we are measuring the length of a move sequence by the number of face turns of any degree, we say we are measuring in the &lt;em>half-turn metric&lt;/em> or &lt;em>HTM&lt;/em>.&amp;#160;&lt;a href="#fnref:7" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:8" role="doc-endnote">
&lt;p>Speedsolvers like to write words in last-to-first order, so they can read off the moves as they&amp;rsquo;re applied.&amp;#160;&lt;a href="#fnref:8" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:9" role="doc-endnote">
&lt;p>A note on the phrase &amp;ldquo;print it&amp;rdquo;. We use the term &amp;ldquo;print it&amp;rdquo; to signify that the permutation has been constructed and it may be consumed. We might not literally &lt;em>print it&lt;/em>, and instead &lt;em>emit it&lt;/em> for use. What this means precisely depends on the programming language you&amp;rsquo;re using. In our final algorithm, we&amp;rsquo;ll actually need to explicitly construct generators, so keep that in mind.&amp;#160;&lt;a href="#fnref:9" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:10" role="doc-endnote">
&lt;p>Again, as in the other footnote, we can see &amp;ldquo;walking&amp;rdquo; or &amp;ldquo;printing&amp;rdquo; or &amp;hellip; as again a manifestation of a process of generating something one-by-one.&amp;#160;&lt;a href="#fnref:10" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:11" role="doc-endnote">
&lt;p>Construct generators?! This is the third footnote dedicated to walking/printing/generating, because it&amp;rsquo;s important and sometimes difficult. Making a generator may be utterly trivial in your language (Scheme with &lt;code>call/cc&lt;/code> or Python with &lt;code>yield&lt;/code>), cumbersome (Common Lisp with &lt;code>cl-cont&lt;/code>), or downright annoying. One trick we used when implementing the algorithm in Common Lisp is to keep track of where we are in the permutation trie by a permutation itself. We can always go to the next one if we can find the current one.&amp;#160;&lt;a href="#fnref:11" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>A software engineer's circuitous journey to calculate eigenvalues</title><link>http://www.stylewarning.com/posts/eigenvalues/</link><pubDate>Wed, 10 Aug 2022 00:00:00 +0000</pubDate><guid>http://www.stylewarning.com/posts/eigenvalues/</guid><description>&lt;p>&lt;em>Or, how to calculate the eigenvalues and eigenvectors of a complex matrix using a routine that only works on real matrices.&lt;/em>&lt;/p>
&lt;p>&lt;em>By Robert Smith&lt;/em>&lt;/p>
&lt;div>
&lt;hr>
&lt;h2>Contents&lt;/h2>
&lt;nav id="TableOfContents">
&lt;ol>
&lt;li>&lt;a href="#why">Why?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#complex-numbers-as-matrices">Complex numbers as matrices&lt;/a>&lt;/li>
&lt;li>&lt;a href="#complex-matrices-as-real-matrices">Complex matrices as real matrices&lt;/a>&lt;/li>
&lt;li>&lt;a href="#some-experimentation">Some experimentation&lt;/a>&lt;/li>
&lt;li>&lt;a href="#proving-the-conjecture">Proving the conjecture&lt;/a>&lt;/li>
&lt;li>&lt;a href="#revisiting-the-computation">Revisiting the computation&lt;/a>&lt;/li>
&lt;li>&lt;a href="#the-final-pseudocode">The final pseudocode&lt;/a>&lt;/li>
&lt;li>&lt;a href="#much-ado-about-nothing">Much ado about nothing?&lt;/a>&lt;/li>
&lt;/ol>
&lt;/nav>
&lt;hr>
&lt;/div>
&lt;p>If we have a complex matrix, how do we calculate its eigenvalues and
eigenvectors using a procedure that can only work on real matrices?
This post recounts my journey toward solving that problem, and where
the problem came from in the first place.&lt;/p>
&lt;p>&lt;strong>TL;DR:&lt;/strong> If you came here just looking for an algorithm, scroll to
the bottom for a pseudocode listing.&lt;/p>
&lt;h2 id="why">Why?&lt;/h2>
&lt;p>&lt;a href="https://github.com/quil-lang/magicl">MAGICL&lt;/a> is a Common Lisp library
for doing matrix arithmetic. To make a long story short, there&amp;rsquo;s a
desire to reduce MAGICL&amp;rsquo;s dependence on foreign libraries (e.g.,
LAPACK), and instead use pure Common Lisp routines. Except,
implementing numerical linear algebra is difficult, and the MAGICL
maintainers usually have more important things to work on. So, instead
of writing routines from scratch via textbooks, we sometimes resort to
mechanically translating an old distribution of LAPACK, written in
FORTRAN 77, into Common Lisp. Due to the age of the routines, I
personally think it&amp;rsquo;s prudent to minimize their usage.&lt;/p>
&lt;p>One routine that&amp;rsquo;s generally reliable&amp;mdash;as both FORTRAN 77 code as
well as its mechanically translated Common Lisp counterpart&amp;mdash;is
&lt;code>DGEEV&lt;/code>, a LAPACK routine to compute eigenvalues and eigenvectors of a
general real matrix of double-precision floats. (We&amp;rsquo;ll call a set of
eigenvectors and the corresponding eigenvalues the &lt;strong>eigensystem&lt;/strong>.)
This routine is made nice in MAGICL and exposed as the Lisp function
&lt;code>MAGICL:EIG&lt;/code>.&lt;/p>
&lt;p>The &lt;code>MAGICL:EIG&lt;/code> function, however, is required to be able to work
with both real and complex matrices, yet &lt;code>DGEEV&lt;/code> only works with
reals. So, we are left with two reasonable options:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Get mechanical translation of complex-matrix BLAS and LAPACK
functions working so that we can call the complex-matrix counterpart
of &lt;code>DGEEV&lt;/code> called &lt;code>ZGEEV&lt;/code>, or&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Figure out how to only use &lt;code>DGEEV&lt;/code> to somehow compute the
eigensystem of a complex matrix.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Because I like puzzles, and because I&amp;rsquo;m loath to add to the existing
virtually unmaintainable pile of 50,000 lines of mechanically
translated code, I opted for the latter option.&lt;/p>
&lt;h2 id="complex-numbers-as-matrices">Complex numbers as matrices&lt;/h2>
&lt;p>Complex numbers look like pairs of real numbers that have funny rules
for addition and multiplication. More precisely, complex numbers form
a two-dimensional real vector space, and multiplication is in fact an
$\mathbb{R}$-linear map. As such, if we specify a basis, we can write
a matrix. Consider the $\mathbb{R}$-basis $\hat e_0:=1$ and $\hat
e_1:=i$ as well as the multiplication map $z\mapsto (a+bi)z$. Applying
this map to our basis gives&lt;/p>
&lt;p>$$
\begin{align*}
(a+bi)\hat e_0 &amp;amp;= (a+bi)\cdot 1 &amp;amp; (a+bi)\hat e_1 &amp;amp;= (a+bi)\cdot i \\
&amp;amp;= a+bi &amp;amp; &amp;amp;= -b+ai \\
&amp;amp;= a\hat e_0+b\hat e_1 &amp;amp; &amp;amp;= -b\hat e_0 + a\hat e_1
\end{align*}
$$&lt;/p>
&lt;p>We thus immediately conclude&lt;/p>
&lt;p>$$
(a+bi)
\begin{pmatrix}
\hat e_0 \\
\hat e_1
\end{pmatrix}
=
\begin{pmatrix}
a &amp;amp; b \\
-b &amp;amp; a
\end{pmatrix}
\begin{pmatrix}
\hat e_0 \\
\hat e_1
\end{pmatrix}.
$$&lt;/p>
&lt;p>This is to say that the multiplication map $z\mapsto (a+bi)z$ can be
represented by the matrix&lt;/p>
&lt;p>$$
\begin{pmatrix}
a &amp;amp; b \\
-b &amp;amp; a
\end{pmatrix}.
$$&lt;/p>
&lt;p>(This representation is not unique. We could also exchange the $b$ and
$-b$ for a different representation, as many texts do. This is because
complex conjugation&amp;mdash;which this exchange of off-diagonal elements
represents&amp;mdash;is an isomorphism on $\mathbb{C}$.)&lt;/p>
&lt;p>One can verify that this matrix can be both added and multiplied, and
it works as expected.&lt;/p>
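&lt;p>A quick way to convince yourself (a small Python sketch of my own, not part of MAGICL): represent $a+bi$ as the $2\times 2$ matrix above and check that matrix addition and multiplication agree with complex addition and multiplication.&lt;/p>

```python
def to_mat(z):
    """Represent a complex number a+bi as the real matrix [[a, b], [-b, a]]."""
    a, b = z.real, z.imag
    return [[a, b], [-b, a]]

def mat_mul(m, n):
    # Plain 2x2 matrix product.
    return [[sum(m[r][k] * n[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def mat_add(m, n):
    return [[m[r][c] + n[r][c] for c in range(2)] for r in range(2)]

z, w = 1 + 2j, 3 - 4j
# The representation respects both operations.
assert mat_mul(to_mat(z), to_mat(w)) == to_mat(z * w)
assert mat_add(to_mat(z), to_mat(w)) == to_mat(z + w)
```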
&lt;h2 id="complex-matrices-as-real-matrices">Complex matrices as real matrices&lt;/h2>
&lt;p>We have given a way to identify elements of $\mathbb{C}$ as elements
of $\mathbb{R}^{2\times 2}$, the set of $2\times 2$ real
matrices. This readily gives us a representation for
$\mathbb{C}^{n\times n}$ matrices: If we have a matrix
$U\in\mathbb{C}^{n\times n}$, produce a matrix
$V\in\mathbb{R}^{2n\times 2n}$ by replacing each $U_{r,c}$ with our
real-matrix representation:&lt;/p>
&lt;p>$$
\begin{pmatrix}
V_{2r,2c} &amp;amp; V_{2r,2c+1} \\
V_{2r+1,2c} &amp;amp; V_{2r+1,2c+1}
\end{pmatrix} :=
\begin{pmatrix}
\Re U_{r,c} &amp;amp; \Im U_{r,c} \\
-\Im U_{r,c} &amp;amp; \Re U_{r,c}
\end{pmatrix}.
$$&lt;/p>
&lt;p>For example, this is a transformation from an element of
$\mathbb{C}^{2\times 2}$ to an element of $\mathbb{R}^{4\times 4}$.&lt;/p>
&lt;p>$$
\begin{pmatrix}
1+2i &amp;amp; 3-4i \\
5-6i &amp;amp; -7+8i
\end{pmatrix} \mapsto
\begin{pmatrix}
1 &amp;amp; 2 &amp;amp; 3 &amp;amp; -4 \\
-2 &amp;amp; 1 &amp;amp; 4 &amp;amp; 3 \\
5 &amp;amp; -6 &amp;amp; -7 &amp;amp; 8 \\
6 &amp;amp; 5 &amp;amp; -8 &amp;amp; -7
\end{pmatrix}.
$$&lt;/p>
&lt;p>Due to how matrix arithmetic works with block matrices (which these
real matrices essentially are), we at least get ordinary addition and
multiplication in this representation.&lt;/p>
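&lt;p>As a sketch (hypothetical Python with hand-rolled helpers, not MAGICL&amp;rsquo;s code), here is the embedding, along with a check that it reproduces the worked $4\times 4$ example above and that it commutes with matrix multiplication:&lt;/p>

```python
def embed(U):
    """Embed a complex n x n matrix into R^(2n x 2n) by replacing each
    entry a+bi with the 2x2 block [[a, b], [-b, a]]."""
    n = len(U)
    V = [[0.0] * (2 * n) for _ in range(2 * n)]
    for r in range(n):
        for c in range(n):
            a, b = U[r][c].real, U[r][c].imag
            V[2*r][2*c],   V[2*r][2*c+1]   = a, b
            V[2*r+1][2*c], V[2*r+1][2*c+1] = -b, a
    return V

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

U = [[1+2j, 3-4j],
     [5-6j, -7+8j]]

# Matches the worked example in the text.
assert embed(U) == [[ 1,  2,  3, -4],
                    [-2,  1,  4,  3],
                    [ 5, -6, -7,  8],
                    [ 6,  5, -8, -7]]

# The embedding respects multiplication (and, entrywise, addition).
assert matmul(embed(U), embed(U)) == embed(matmul(U, U))
```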
&lt;p>Does this mean we can now just apply any linear algebra routine to
these real matrices and get the same answer as we would for complex
matrices? Unfortunately not, and eigensystems are a perfect
example. How many eigenvalues (at most) does an operator on a
$d$-dimensional vector space have (over a field of characteristic
zero)? Well, $d$, since the characteristic polynomial has degree $d$
and hence at most $d$ roots. So just by counting, the $2n\times 2n$
real matrix won&amp;rsquo;t have the same number of eigenvalues as the
$n\times n$ complex matrix, and thus the results of linear algebra
routines&amp;mdash;at least one that computes eigensystems&amp;mdash;can&amp;rsquo;t be
the same.&lt;/p>
&lt;h2 id="some-experimentation">Some experimentation&lt;/h2>
&lt;p>I was experimenting with computing eigensystems of complex matrices
and their corresponding real variants, and noticed a pattern. If the
complex matrix had a real eigenvalue, it would also show up as an
eigenvalue (of double multiplicity) of the real matrix. If the complex
matrix had a complex eigenvalue, then it would show up as an
eigenvalue of the real matrix, along with its complex conjugate.&lt;/p>
&lt;p>This led me to the following conjecture:&lt;/p>
&lt;p>&lt;strong>Conjecture&lt;/strong>: Let $U\in\mathbb{C}^{n\times n}$, and let
$V\in\mathbb{R}^{2n\times 2n}$ be the real matrix corresponding to $U$
according to the aforementioned transformation. If $a+bi$ is an
eigenvalue of $U$, then $a\pm bi$ are two eigenvalues of $V$.&lt;/p>
&lt;p>With this conjecture and my chin up, I could implement a routine to
compute eigenvalues of $U$ using just a real eigenvalue algorithm. The
way I did it was to write a procedure that finds the true conjugates
amongst the complete set, roughly as follows.&lt;/p>
&lt;p>First, let $E$ be our multiset of eigenvalues of $V$ (the &lt;em>real&lt;/em>
matrix), but delete duplicate real values (they&amp;rsquo;ll show up in pairs),
and delete complex eigenvalues that have a negative imaginary part
(there will always be a corresponding conjugate).&lt;/p>
&lt;p>Second, recall that $\operatorname{Tr} U$ is the sum of the
eigenvalues of $U$. This will be a complex number whose real part is
simply recovered by summing the real parts of the eigenvalues:&lt;/p>
&lt;p>$$
\Re (\operatorname{Tr} U) = \sum_{e\in E} \Re e.
$$&lt;/p>
&lt;p>This fact isn&amp;rsquo;t computationally useless: if the conjecture is true,
there is no ambiguity in the real parts, so it can be verified
immediately in code as a sanity check.&lt;/p>
&lt;p>The imaginary part is a little trickier, since there is ambiguity
stemming from uncertainty around which conjugate is actually an
eigenvalue of $U$. As such, there must be a sequence of $\vert E\vert$
signs $s_{\bullet}\in\{-1,+1\}$ such that&lt;/p>
&lt;p>$$
\Im (\operatorname{Tr} U) = \sum_{k=0}^{\vert E\vert-1} s_k \Im e_k.
$$&lt;/p>
&lt;p>(The ordering of $e_{\bullet}$ doesn&amp;rsquo;t matter; any will do.)&lt;/p>
&lt;p>Though asymptotically inefficient, the values for $s_k$ can be solved
for by brute force: keep trying until you find the set that works. As
it turns out, there&amp;rsquo;s not a &lt;em>lot&lt;/em> better you can do, since the
subset-sum problem can be reduced to this sequence-of-signs
problem.&lt;/p>
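&lt;p>In code, the brute-force search is only a few lines (a hypothetical Python sketch; the names are mine). It tries every sign assignment until the imaginary parts sum to $\Im(\operatorname{Tr} U)$:&lt;/p>

```python
from itertools import product

def find_signs(imag_parts, target, tol=1e-9):
    """Brute-force the signs s_k in {-1, +1} so that
    sum(s_k * imag_parts[k]) hits `target`. Exponential time, as expected
    for a problem that subset-sum reduces to."""
    for signs in product((+1, -1), repeat=len(imag_parts)):
        if abs(sum(s * im for s, im in zip(signs, imag_parts)) - target) <= tol:
            return signs
    return None  # no assignment found: conjecture violated or numerics off

# If Im(Tr U) = -3 and the candidate imaginary parts are 2 and 5,
# the only consistent choice of signs is +2 - 5.
assert find_signs([2.0, 5.0], -3.0) == (1, -1)
```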
&lt;p>While perhaps a neat conjecture, this wasn&amp;rsquo;t very satisfying to me. I
simultaneously hadn&amp;rsquo;t fully solved what I set out to solve (finding
the eigen&lt;em>system&lt;/em>, not just the eigen&lt;em>values&lt;/em>), and I had a nagging
conjecture that exposed my lack of knowledge about the problem.&lt;/p>
&lt;h2 id="proving-the-conjecture">Proving the conjecture&lt;/h2>
&lt;p>Ultimately, I had to go back to basics, ask my friends and family for
help, and try to break through. I figured that while the conjecture
wasn&amp;rsquo;t ultimately very computationally useful on its own, if I could
prove it, maybe it would give me enough insight mathematically and
computationally to come up with something better.&lt;/p>
&lt;p>After some trial and error, I settled on trying to use the following
fact.&lt;/p>
&lt;p>&lt;strong>Fact&lt;/strong>: Let $A'$ and $A''$ be square matrices, and let $A :=
A'\oplus A''$. Then the eigenvalues of $A$ will be the union of the
eigenvalues of each $A'$ and $A''$.&lt;/p>
&lt;p>This is readily seen by computing the characteristic polynomial. Let
$\mathbb{C}[\lambda]$ be our polynomial ring. Then the characteristic
polynomial of $A$ equals the product of the polynomials of $A'$ and
$A''$:&lt;/p>
&lt;p>$$
\det (A - \lambda I_{\dim A}) =
\det (A' - \lambda I_{\dim A'}) \det (A'' - \lambda I_{\dim A''})
$$&lt;/p>
&lt;p>This ended up being a crucial insight, as we&amp;rsquo;ll see.&lt;/p>
&lt;p>Another fact I needed was the following. As a matter of notation, let
$\bar A$ denote the complex conjugate of each entry of $A$.&lt;/p>
&lt;p>&lt;strong>Fact&lt;/strong>: Let $A$ be a complex matrix. If $a+bi$ is an eigenvalue of
$A$, then its conjugate $a-bi$ is an eigenvalue of $\bar A$.&lt;/p>
&lt;p>This is seen by, again, looking at the characteristic polynomial and
using the properties of complex conjugation:&lt;/p>
&lt;p>$$
\begin{align*}
\det (\bar A - \lambda I_{\dim A})
&amp;amp;= \det (\bar A - \overline{\bar\lambda I_{\dim A}}) \\
&amp;amp;= \det (\overline{A - \bar\lambda I_{\dim A}}) \\
&amp;amp;= \overline{\det (A - \bar\lambda I_{\dim A})}.
\end{align*}
$$&lt;/p>
&lt;p>These facts were enough for me to refine the conjecture:&lt;/p>
&lt;p>&lt;strong>Conjecture (redux)&lt;/strong>: Let $U\in\mathbb{C}^{n\times n}$, and let
$V\in\mathbb{R}^{2n\times 2n}$ be the real matrix corresponding to $U$
according to the aforementioned transformation. Then $V$ is similar to
$U\oplus \bar U$ when $V$ is trivially interpreted as a real matrix in
$\mathbb{C}^{2n\times 2n}$.&lt;/p>
&lt;p>This new conjecture is equivalent to the old one by way of those two
facts.&lt;/p>
&lt;p>Now, things started to look good. If I could find a similarity
transform in $\mathbb{C}^{2n\times 2n}$ that block-diagonalizes $V$,
and show that such a diagonalization is exactly $U\oplus \bar U$, then
I&amp;rsquo;d be golden.&lt;/p>
&lt;p>Since we are &amp;ldquo;allowed&amp;rdquo; to work over $\mathbb{C}$, the first step was
to actually &lt;em>undo&lt;/em> the complex-to-real transformation. However, since
we are building a similarity transform, we need it to be invertible.&lt;/p>
&lt;p>Again, after trial and error, I found that&lt;/p>
&lt;p>$$
\left[
\frac{1}{\sqrt{2}}
\begin{pmatrix}
1 &amp;amp; -i \\
1 &amp;amp; i \\
\end{pmatrix}
\right]
\begin{pmatrix}
a &amp;amp; b \\
-b &amp;amp; a
\end{pmatrix}
\left[
\frac{1}{\sqrt{2}}
\begin{pmatrix}
1 &amp;amp; -i \\
1 &amp;amp; i \\
\end{pmatrix}
\right]^{-1} =
\begin{pmatrix}
a+bi &amp;amp; 0 \\
0 &amp;amp; a-bi
\end{pmatrix}.
$$&lt;/p>
&lt;p>Not until I constructed this matrix, let&amp;rsquo;s call it
&lt;p>$$
K :=
\frac{1}{\sqrt{2}}
\begin{pmatrix}
1 &amp;amp; -i \\
1 &amp;amp; i \\
\end{pmatrix},
$$&lt;/p>
&lt;p>did I have a big &amp;ldquo;aha&amp;rdquo; moment. I was hitherto so focused on the
(wrong) idea that our complex-to-real transformation was unique or
canonical. I &amp;ldquo;knew&amp;rdquo; that we could choose either position of $b$ or
$-b$ to represent either a number or its conjugate, but I didn&amp;rsquo;t think
deeply enough about the repercussions of that fact. With $K$, it was
apparent that our real matrix actually, in some sense, holds &lt;em>both&lt;/em> a
complex number &lt;em>and&lt;/em> its conjugate.&lt;/p>
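&lt;p>Numerically, the conjugation identity above is easy to spot-check (a small Python sketch with my own helper names):&lt;/p>

```python
import math

S = 1 / math.sqrt(2)
K    = [[S,    -S*1j], [S,     S*1j]]   # K
Kinv = [[S,     S   ], [S*1j, -S*1j]]   # K^{-1}

def mul2(M, N):
    # Plain 2x2 matrix product over complex numbers.
    return [[sum(M[r][k] * N[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

a, b = 3.0, -5.0
M = [[a, b], [-b, a]]          # real representation of a+bi
R = mul2(mul2(K, M), Kinv)     # K M K^{-1}

assert abs(R[0][0] - (a + b*1j)) < 1e-12   # a+bi in the top-left
assert abs(R[1][1] - (a - b*1j)) < 1e-12   # its conjugate in the bottom-right
assert abs(R[0][1]) < 1e-12 and abs(R[1][0]) < 1e-12
```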
&lt;p>&lt;em>At this point, it was obvious what to do. This was the critical
insight.&lt;/em>&lt;/p>
&lt;p>Our matrix $K$ only works for $2\times 2$ matrices in
$\mathbb{R}^{2\times 2}\subset \mathbb{C}^{2\times 2}$. We can
extend it by using this little rule of linear algebra. If&lt;/p>
&lt;p>$$
X := \begin{pmatrix}
X_{0,0} &amp;amp; X_{0,1} &amp;amp; \cdots &amp;amp; X_{0,c-1} \\
X_{1,0} &amp;amp; X_{1,1} &amp;amp; &amp;amp; \\
\vdots &amp;amp; &amp;amp; \ddots &amp;amp; \vdots \\
X_{r-1,0} &amp;amp; &amp;amp; \cdots &amp;amp; X_{r-1,c-1}
\end{pmatrix}
$$&lt;/p>
&lt;p>is a &lt;em>block&lt;/em> matrix and $D := \Delta\oplus\cdots\oplus \Delta$
is a block diagonal matrix with $X_{\bullet}$ and $\Delta$ square and
having the same shape, then&lt;/p>
&lt;p>$$
(DX)_{r,c} = \Delta X_{r,c} \qquad\text{and}\qquad (XD)_{r,c} = X_{r,c}\Delta,
$$&lt;/p>
&lt;p>i.e., multiplication of these matrices results in $\Delta$ getting
&amp;ldquo;applied&amp;rdquo; to each block. As such,&lt;/p>
&lt;p>$$
K^{\oplus n} V (K^{\oplus n})^{-1}
$$&lt;/p>
&lt;p>will be a block matrix equivalent to substituting each disjoint
$2\times 2$ sub-matrix of $V$ with the matrix
$\operatorname{diag}(z,\bar z)$, where $z$ is calculated as described.&lt;/p>
&lt;p>We&amp;rsquo;re still not done. This transformed matrix will be a
checkerboard pattern of $U$-likes on even-even- and odd-odd-indexed
entries, and zeros on even-odd- and odd-even-indexed positions. The
last necessary bit then to finish our similarity transform is to
permute this matrix in such a way that all positive-signed conjugates
are in the top-left $n\times n$ sub-matrix, and all negative-signed
conjugates are in the bottom-right $n\times n$ sub-matrix. If we take
for granted that permutations are invertible, then we&amp;rsquo;re done proving
it. If we want to construct something, then we observe that all
positive-signed conjugates have even indexes, and all negative-signed
conjugates have odd indexes. Conjugation by the desired permutation
matrix $\Pi$ then acts entrywise as&lt;/p>
&lt;p>$$
(\Pi X\Pi^{-1})_{r,c} :=
\begin{cases}
X_{2r,2c} &amp;amp; \text{if }0\leq r,c &amp;lt; n\\
X_{2(r-n)+1, 2(c-n)+1} &amp;amp; \text{if }n\le r,c &amp;lt; 2n\\
X_{2r, 2(c-n)+1} &amp;amp; \text{if }0\leq r &amp;lt; n\land n \leq c &amp;lt; 2n\\
X_{2(r-n)+1, 2c} &amp;amp; \text{if }n\leq r &amp;lt; 2n\land 0 \leq c &amp;lt; n
\end{cases}
$$&lt;/p>
&lt;p>We can recover the matrix for $\Pi$ by applying the underlying row
permutation (even indexes first, then odd) to the rows of the identity
matrix.&lt;/p>
&lt;p>And with that, we have a similarity transform:&lt;/p>
&lt;p>$$
(\Pi K^{\oplus n}) V (\Pi K^{\oplus n})^{-1} = U\oplus \bar U.
$$&lt;/p>
&lt;p>Since eigenvalues are preserved under similarity, we&amp;rsquo;ve proved the
conjecture.&lt;/p>
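&lt;p>The whole similarity can also be spot-checked numerically. The following Python sketch (helper names are my own invention, not MAGICL&amp;rsquo;s) builds the permutation and the block-diagonal copies of $K$ for $n=2$, and verifies that conjugating $V$ yields $U\oplus\bar U$:&lt;/p>

```python
import math

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def embed(U):
    """Complex n x n -> real 2n x 2n, each a+bi becoming [[a, b], [-b, a]]."""
    n = len(U)
    V = [[0.0] * (2 * n) for _ in range(2 * n)]
    for r in range(n):
        for c in range(n):
            a, b = U[r][c].real, U[r][c].imag
            V[2*r][2*c],   V[2*r][2*c+1]   = a, b
            V[2*r+1][2*c], V[2*r+1][2*c+1] = -b, a
    return V

n = 2
S = 1 / math.sqrt(2)
# Block-diagonal copies of K and K^{-1}.
KD  = [[0] * (2 * n) for _ in range(2 * n)]
KDi = [[0] * (2 * n) for _ in range(2 * n)]
for k in range(n):
    KD[2*k][2*k],   KD[2*k][2*k+1]   = S, -S*1j
    KD[2*k+1][2*k], KD[2*k+1][2*k+1] = S,  S*1j
    KDi[2*k][2*k],   KDi[2*k][2*k+1]   = S,     S
    KDi[2*k+1][2*k], KDi[2*k+1][2*k+1] = S*1j, -S*1j
# Permutation: even row indexes first, then odd row indexes.
sigma = [2*k for k in range(n)] + [2*k+1 for k in range(n)]
Pi  = [[1 if c == sigma[r] else 0 for c in range(2*n)] for r in range(2*n)]
PiT = [[Pi[c][r] for c in range(2*n)] for r in range(2*n)]

U = [[1+2j, 3-4j], [5-6j, -7+8j]]
T  = matmul(Pi, KD)       # Pi K^{(+)n}
Ti = matmul(KDi, PiT)     # its inverse (Pi is orthogonal)
W  = matmul(matmul(T, embed(U)), Ti)

# W should be U in the top-left block, conj(U) in the bottom-right, 0 elsewhere.
for r in range(n):
    for c in range(n):
        assert abs(W[r][c] - U[r][c]) < 1e-12
        assert abs(W[n+r][n+c] - U[r][c].conjugate()) < 1e-12
        assert abs(W[r][n+c]) < 1e-12 and abs(W[n+r][c]) < 1e-12
```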
&lt;h2 id="revisiting-the-computation">Revisiting the computation&lt;/h2>
&lt;p>We have proved the conjecture, but does that actually get us any
further in our quest to compute the eigensystem of a complex matrix
using an algorithm for real matrices?&lt;/p>
&lt;p>It does; we now know that the eigenvalues of $U$ are completely
contained in the set of eigenvalues of $V$. One thing we haven&amp;rsquo;t
addressed, however, is the eigenvectors.&lt;/p>
&lt;p>If we return to thinking about direct sums, then the eigenvectors of a
matrix $A := A'\oplus A''$ are going to be eigenvectors of $A'$ and
$A''$ &amp;ldquo;lifted&amp;rdquo; to the larger sum of spaces. In other words, if $x$ is
an eigenvector of $A'$, then $x\oplus \vec 0_{\dim A''}$ is an
eigenvector of $A$, where $\vec 0$ denotes a vector of zeros (i.e.,
$x$ is padded with zeros).&lt;/p>
&lt;p>As such, in our block-diagonal basis, the eigenvectors of $V$ are
related to the eigenvectors of $U$ in the following way. Suppose
$(\lambda, x)$ is an eigenvalue-eigenvector pair of $U$. Then $Ux =
\lambda x$. This directly implies that $\bar U \bar x =
\bar\lambda\bar x$. Since $V\sim U\oplus\bar U$, the vectors $x\oplus\vec 0_{n}$
and $\vec 0_n\oplus\bar x$ are eigenvectors of $U\oplus\bar U$, and
their images under the inverse similarity transform are eigenvectors
of $V$.&lt;/p>
&lt;p>All that&amp;rsquo;s left to determine is: Which eigenvector is the right one
without doing a costly similarity transform?&lt;/p>
&lt;p>To do this, we &amp;ldquo;disembed&amp;rdquo; the eigenvector from the vector space of
$V\sim U\oplus\bar U$ into the vector space of $U$ in such a way that
the $\bar U$ subspace collapses to zero. We can do this easily. Our
eigenvectors of $V$ without transformation are going to look like&lt;/p>
&lt;p>$$
\begin{pmatrix}
a+bi \\
-b+ai \\
c+di \\
-d+ci \\
\vdots
\end{pmatrix}
\qquad
\text{and}
\qquad
\begin{pmatrix}
a-bi \\
-b-ai \\
c-di \\
-d-ci \\
\vdots
\end{pmatrix},
$$&lt;/p>
&lt;p>where these correspond to eigenvalues $\lambda$ and $\bar\lambda$
respectively. One can see this by way of two facts:&lt;/p>
&lt;ul>
&lt;li>The second vector is the entry-wise conjugate of the first vector,
directly suggesting they&amp;rsquo;re each drawn from either of the
eigenvector sets of $U$ or $\bar U$, and&lt;/li>
&lt;li>each $2\times 1$ pair of entries in each vector corresponds to our
basis vectors $(\hat e_{2k}, \hat e_{2k+1}) = (1,i)$ of our ambient
vector space.&lt;/li>
&lt;/ul>
&lt;p>Also, notice the resemblance between pairs of entries in our first (&amp;ldquo;true&amp;rdquo;)
eigenvector, and our complex number representation:&lt;/p>
&lt;p>$$
\begin{pmatrix}
a &amp;amp; b\\
-b &amp;amp; a
\end{pmatrix}
\qquad
\text{and}
\qquad
\begin{pmatrix}
a+bi \\
-b+ai
\end{pmatrix}.
$$&lt;/p>
&lt;p>Taking either vector, we wish to annihilate the &amp;ldquo;wrong&amp;rdquo; one and send
the &amp;ldquo;right&amp;rdquo; one to the space of $U$. Call either eigenvector
$x\in\mathbb{C}^{2n}$ and the resulting vector
$y\in\mathbb{C}^n$. Consider the map&lt;/p>
&lt;p>$$
y_k = \frac{x_{2k} - ix_{2k+1}}{2}
$$&lt;/p>
&lt;p>for integers $0\le k &amp;lt; n$. (It is not actually necessary to divide by
$2$, since if $y$ is an eigenvector, then so is $2y$.) With this map,
the eigenvector $y$ of the conjugate ($\bar U$) space will vanish, or
it will map to, for example, $a+bi$ in the ordinary ($U$) space, as
desired. In the latter case, $y$ will be an eigenvector of $V$.&lt;/p>
&lt;p>If $y = 0$, then it is discarded along with its corresponding
eigenvalue, otherwise, the eigenvector and eigenvalue are kept, and we
are done.&lt;/p>
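&lt;p>Here is the disembedding map in (hypothetical) Python, applied to a conjugate pair of the shape shown above; the factor of $2$ is dropped, as noted:&lt;/p>

```python
def disembed(x):
    """Map x in C^{2n} to y in C^n via y_k = x_{2k} - i*x_{2k+1}
    (the division by 2 is omitted, since scaling preserves eigenvectors)."""
    n = len(x) // 2
    return [x[2*k] - 1j * x[2*k+1] for k in range(n)]

a, b = 1.0, 2.0
x_true = [a + b*1j, -b + a*1j]   # pattern (a+bi, -b+ai) from the U space
x_conj = [a - b*1j, -b - a*1j]   # its entry-wise conjugate, from the conjugate space

assert disembed(x_true) == [2 * (a + b*1j)]   # recovers (twice) a+bi
assert disembed(x_conj) == [0j]               # annihilated
```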
&lt;h2 id="the-final-pseudocode">The final pseudocode&lt;/h2>
&lt;p>In summary, our algorithm to compute the eigensystem of a complex
matrix is as follows.&lt;/p>
&lt;pre tabindex="0">&lt;code>INPUT:
n : an integer, the dimension of the problem
U : an n x n matrix of complex numbers
OUTPUT:
Lambda : a list of complex numbers, eigenvalues of U
Y : a list of complex n-vectors, eigenvectors of U
Step 1:
V : a 2n x 2n matrix of real numbers
Let V = a block matrix constructed by
expanding each element a+bi of U
into a matrix [a, b; -b, a]
Step 2:
Mu : a list of complex numbers
X : a list of complex vectors of dimension n
Let Mu, X = eigenvalues and eigenvectors of V
using a program to compute eigenvalues
of real numbers
Step 3:
Initialize Lambda = [] and Y = []
For mu, x in Mu, X:
y : a complex n-vector
For k from 0 to n-1:
Let y[k] = x[2*k] - i*x[2*k+1]
If y is a non-zero vector:
Push mu onto Mu
Push y onto Y
&lt;/code>&lt;/pre>&lt;h2 id="much-ado-about-nothing">Much ado about nothing?&lt;/h2>
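&lt;p>Before weighing that question: the algorithm above does check out end to end, at least on a toy input. Here is a Python sketch for the $1\times 1$ case, using a closed-form $2\times 2$ real eigensolver as a stand-in for &lt;code>DGEEV&lt;/code> (all names and tolerances here are my own, not MAGICL&amp;rsquo;s):&lt;/p>

```python
import cmath

def eig2x2_real(V):
    """Closed-form eigensystem of a real 2x2 matrix; a toy stand-in for
    LAPACK's DGEEV. Assumes the (0,1) entry is nonzero, which holds for
    the embedding of any non-real u."""
    (p, q), (r, s) = V
    tr, det = p + s, p * s - q * r
    disc = cmath.sqrt(tr * tr - 4 * det)
    mus = [(tr + disc) / 2, (tr - disc) / 2]
    # (V - mu*I) v = 0 is solved by v = (q, mu - p) when q != 0.
    return mus, [[q, mu - p] for mu in mus]

def eig_complex_1x1(u):
    a, b = u.real, u.imag
    V = [[a, b], [-b, a]]          # Step 1: embed u as a real 2x2 matrix
    Mu, X = eig2x2_real(V)         # Step 2: real eigensystem
    Lam, Y = [], []
    for mu, x in zip(Mu, X):       # Step 3: disembed; drop vanishing vectors
        y = [x[0] - 1j * x[1]]
        if abs(y[0]) > 1e-9:
            Lam.append(mu)
            Y.append(y)
    return Lam, Y

Lam, Y = eig_complex_1x1(3 + 4j)
assert len(Lam) == 1 and abs(Lam[0] - (3 + 4j)) < 1e-9
```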
&lt;p>While this was an interesting puzzle, and &lt;a href="https://github.com/quil-lang/magicl/blob/f5462dd60ec6b513616f4d2a64cf4c978a92cc19/src/high-level/matrix-functions/eig/eig.lisp#L133">leads to working
code&lt;/a>,
was it all worth it? Honestly, I&amp;rsquo;m not sure it&amp;rsquo;s the best engineering
decision. What happens when we need singular-value decomposition, or
some other goofy matrix algorithm? It&amp;rsquo;s difficult to imagine the
complex embedding trick will work well.&lt;/p>
&lt;p>On the other hand, it saves us from using more antiquated FORTRAN 77
code than we need to. :)&lt;/p>
&lt;hr>
&lt;p>&lt;em>Thanks to Juan Bello-Rivas, Erik Davis, Bryan Fong, Brendan
Pawlowski, and Eric Peterson for insightful discussions.&lt;/em>&lt;/p>
&lt;hr></description></item><item><title>Le blog est mort, vive le blog!</title><link>http://www.stylewarning.com/posts/first/</link><pubDate>Thu, 04 Aug 2022 22:16:21 -0700</pubDate><guid>http://www.stylewarning.com/posts/first/</guid><description>&lt;p>&lt;em>By Robert Smith&lt;/em>&lt;/p>
&lt;p>Between 2010 and 2014, I ran a &lt;a href="http://web.archive.org/web/20140711171817/http://symbo1ics.com/blog/?p=8">WordPress
blog&lt;/a>. One
day, the database was accidentally deleted, and I&amp;rsquo;ve since been too
lazy to set something new up. But, as of late, two things have been
happening:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>I&amp;rsquo;ve been spending more time writing longer comments on r/lisp,
Hacker News, etc. While I might feel proud of having written a good
quality comment, I know that it&amp;rsquo;ll disappear into Internet history
just a few days later, never to be read again, even by me.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>I&amp;rsquo;ve been writing articles in LaTeX that feel an awful lot like
long-form blog posts. Except I never really had an avenue to publish
them, and even if I did, very few people want to read informal PDFs.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Additionally, I&amp;rsquo;ve wanted to write about my piano journey. I&amp;rsquo;ve kept a
personal and private video diary, but I find that the video diary
entries are not very thoughtful and always quite rushed.&lt;/p>
&lt;p>So, while I know personal blogs are no longer in fashion, I hope this
marks a new beginning to my own personal writing journey!&lt;/p></description></item></channel></rss>