A tutorial quantum interpreter in 150 lines of Lisp

July 16, 2023

By Robert Smith

Simulating a universal, gate-based quantum computer on a classical computer has many uses and benefits. The top benefit is the ability to inspect the amplitudes of the system’s state directly. However, while the mathematics is very well understood, implementing a general-purpose simulator has largely been folk knowledge. In this tutorial, we show how to build an interpreter for a general-purpose quantum programming language called $\mathscr{L}$, capable of executing most kinds of quantum circuits found in literature. It is presented economically, allowing its implementation to take fewer than 150 lines of self-contained Common Lisp code. The language $\mathscr{L}$ is very simple to extend, making the interpreter ripe for testing different kinds of behavior, such as noise models.

Introduction

Simulating the workings of an ideal quantum computer has many important applications, such as algorithms research and quantum program debugging. A variety of quantum computer simulators exist, both free and commercial. However, while the concept of the simulation of quantum computers is generally well understood at a high level, the devil is in the details when it comes to implementation.

Quantum computer simulators found in the wild often have many limitations. The most prevalent limitation is the number of qubits an operator can act on. Usually, one-qubit gates and controlled one-qubit¹ gates are allowed, but nothing more. While these together are sufficient for universal quantum computation, it leaves much to be desired when studying quantum algorithms.

In this post, we present an implementation of a fully general quantum programming language interpreter, allowing measurement as well as arbitrary unitary operators on an arbitrary number of arbitrarily indexed qubits. The implementation weighs in at under 150 lines² of code in Common Lisp, though the ideas make implementation simple in other languages as well. All of the code from this tutorial can be found on GitHub.

This tutorial is aimed at a quantum computing beginner who has some familiarity with the fundamentals of linear algebra and computer programming. Beyond those subjects, this tutorial is relatively self-contained. We also aim this tutorial at practitioners of quantum computing, who are interested in the brass tacks of simulation, with all of the details filled out. To such practitioners, the bulk of this document will be easy to skim, since we recapitulate topics such as qubits and unitary operators.

A note about Common Lisp

We use Common Lisp, because it is an excellent platform for both exploratory and high-performance computing. One of the fastest and most flexible quantum simulators out there, the Quantum Virtual Machine, is written entirely in Common Lisp.

We wrote this article so that it would be easy to follow along with a Common Lisp implementation. The code has no dependencies, and should work in any ANSI-compliant implementation (I hope).

With that said, this article was also written with portability in mind. Since no especially Lisp-like features are used, the code should be easy to port to Python or even C. At minimum, your language should support complex numbers and arrays.

A note to experienced quantum computing practitioners

This section is written for experienced practitioners of quantum computing who happened upon this post, and can be skipped.

In this post, we opt to simulate a quantum circuit the “Schrodinger” way, that is, by evolving a wavefunction explicitly. For a circuit of width $n$, we walk through the mathematics of how to interpret a $k$-qubit gate $g \in \mathsf{SU}(2^k)$ for $k\le n$, specified to act on a $k$-tuple of numbered qubits—corresponding to each qubit’s position in the tensor product which forms the Hilbert space of the system—as a full operator $g'\in\mathsf{SU}(2^n)$. We do this by providing an explicit construction of the matrix in the computational basis of the system.

An alternative approach would have been to describe the action of a $g$ on an $n$-qubit wavefunction by way of careful manipulation of indexes, i.e., to effectively permute and partition our wavefunction into $2^{n-k}$ groups of $2^k$-dimensional vectors corresponding to the subsystem of qubits being operated on. The major benefit of this approach is efficiency.

As a first introduction to a computer science graduate, I find this explanation lacking in two ways:

It under-emphasizes that a gate like $\mathsf{CNOT}$, typically written as a $4\times 4$ matrix $\mathsf{I}\oplus\mathsf{X}$, in a quantum circuit truly is a linear operator on the Hilbert space of the entire system. “It’s just linear algebra; here’s the matrix and here’s the vector” is a point I want to drive home.
It requires significant labor to both explain and prove the correctness of the method, without having significant experience with tensor algebra, contractions, Einstein notation, and so on.

The approach of this post can be used as a basis to follow up with more efficient techniques, without relinquishing a strong mathematical foundation. We are very careful to not be hand-wavy, and to not conflate the different vector spaces at play. We hope that you’ll find this approach agreeable, even if it sacrifices some efficiency.

The Language $\mathscr{L}$

We wish to construct an interpreter for a small quantum programming language named $\mathscr{L}$. This language supports both of the fundamental operations of a quantum computer: gates and measurements.

A gate is an operation that modifies a quantum state. (What a quantum state is exactly we will delve into later.) Because quantum states are large compared to the physical resources used to construct them, gates represent the “powerful” operations of a quantum computer.

A measurement is an observation and collapse of the quantum state, producing one bit (i.e., $0$ or $1$) of classical information per qubit. Measurements represent the only way in which one can extract information from our simulated quantum computer, and indeed, in most programming models for real quantum computers.

In some sense, one might think of the language $\mathscr{L}$ as the simplest non-trivial quantum programming language. A program in $\mathscr{L}$ is just a sequence of gates and measurements. The syntax is as follows:

Non-Terminal		Defintion
program	:=	`(` instruction* `)`
instruction	:=	`(` `GATE` matrix qubit+ `)`
	\|	`(` `MEASURE` `)`
matrix	:=	a complex matrix `#2A(` … `)`
qubit	:=	a non-negative integer

Spaces and newlines are ignored, except to delimit the tokens of our language.

We borrow Common Lisp’s two-dimensional array syntax for the syntax of matrices. In Common Lisp, the matrix $\left(\begin{smallmatrix}1 & 2\\3 & 4\end{smallmatrix}\right)$ is written #2A((1 2) (3 4)). We also borrow the syntax for complex numbers: $1-2i$ is written #C(1 -2).

An example program might be one to construct and subsequently measure two qubits labeled 2 and 5 in a Bell state configuration:

(
 (GATE #2A((0.70710677 0.70710677) (0.70710677 -0.70710677)) 2)
 (GATE #2A((1 0 0 0) (0 1 0 0) (0 0 0 1) (0 0 1 0))          2 5)
 (MEASURE)
)

We will model the semantics of $\mathscr{L}$ operationally, by way of an abstract machine. The abstract machine for $\mathscr{L}$ is called $M_n$, where $n$ is a positive but fixed³ number of qubits. The state of the machine $M_n$ is the pair $(v, b)$ where $v$ is a quantum state, and $b$ is an $n$-bit measurement register.

The quantum state is an element of the set

$$\{\Vert v\Vert=1\mid v\in\mathbb{C}^{2^n}\}.$$

In other words, $v$ is a unit vector of dimension $2^n$ over the complex numbers. We will discuss this from first principles in the next section.

The measurement register is an element of the set $\{0,1\}^n$, i.e., a sequence of $n$ bits, which we realize as a non-negative integer. The $k$th least-significant bit of this integer represents the last observation of the qubit numbered as $k$. We will discuss this in detail as well.

In Common Lisp, it suffices to create a structure machine which holds these two pieces of state.

(defstruct machine
  quantum-state
  measurement-register)

Typically, the machine is initialized with each classical bit in the measurement register $0$, and each qubit starting in the zero-state. (However, for the purposes of algorithm study or debugging, the machine may be initialized with any valid state.)

The precise way in which the language $\mathscr{L}$ is interpreted on $M_n$ is what we describe in this tutorial. Before that, however, we find it most important to describe what exactly a quantum state is, and how to represent it on a computer.

The Quantum State

Where does one qubit live?

Quantum computers are usually just a collection of interacting computational elements called qubits. A single qubit has two distinguished states: $\ket{0}$ and $\ket{1}$. If the qubit has a name like $q$, then we label the states $\ket{0}_q$ and $\ket{1}_q$.

The funny notation is called Dirac notation or braket notation. It happens to be a convenient notation for doing calculations in quantum mechanics, and we just use it for consistency with other texts. The ket $\ket{\cdots}$, as a physicist would call it, doesn’t add any special significance, except to denote that the quantity is a vector. One can actually put anything inside the brackets. In usual linear algebra, one often writes $\mathbf{e}_i$ to denote a basis vector, where in quantum mechanics, one just writes the subscript in a ket $\ket{i}$, dropping the $\mathbf{e}$ entirely. If the notation throws you off, and you’d like to think in more traditional written linear algebra notation, you can always replace $\ket{x}$ with $\vec x$, and you’ll be safe.

These distinguished states $\ket{0}$ and $\ket{1}$ are understood to be orthonormal basis vectors in a vector space whose scalars are complex numbers $\mathbb{C}$. As such, a qubit can be $\ket{0}$, $\ket{1}$, or a superposition $\alpha\ket 0 + \beta\ket 1$, where $\alpha$ and $\beta$ are complex numbers. The numbers $\alpha$ and $\beta$ are called probability amplitudes, because $\vert\alpha\vert^2$ (resp. $\vert\beta\vert^2$) represent the probability of the qubit being observed in the $\ket 0$ (resp. $\ket 1$) state. Since they represent probabilities, there’s an additional constraint, namely that the probabilities add to one: $\vert\alpha\vert^2 + \vert\beta\vert^2=1$.

To those unfamiliar, it may not be obvious why we’ve opted to use the language of linear algebra. Why do we consider a qubit as being a linear combination? Why do we suppose that the observable states are orthonormal vectors? Why can’t we simply say that a qubit is just a pair of complex numbers and move on?

The reason for this is scientific, and not mathematical. It turns out that the best theory of quantum mechanics we have is one which describes transformations between states as being linear. In fact, the evolution of a quantum mechanical system is not only described by an operation that is just linear, but also reversible. These conditions—linear, reversible, and length-preserving—give rise to a special class of transformations called unitary operators, which naturally lead us to the discussion of vector spaces over complex numbers⁴.

We will discuss the nature of these operations in more depth when we consider how to implement gates later on. For now, however, it’s sufficient to think of a qubit named $q$ as something that lives in a complex, two-dimensional vector space, which we will call $$B_q := \operatorname{span}_{\mathbb{C}}\{\ket 0_q, \ket 1_q\}.$$ (We will use this $B_q$ notation a few times throughout this tutorial. Remember it!) We also understand that this space is equipped⁵ with a way to calculate lengths of vectors—the usual norm

$$ \left\Vert\alpha\ket{0}+\beta\ket{1}\right\Vert = \sqrt{\vert\alpha\vert^2+\vert\beta\vert^2}. $$

Many qubits

Roughly speaking, a single qubit can be described by two probabilities. How do we deal with more?

Suppose we have two qubits named $X$ and $Y$. As a pair, quantum mechanics tells us that they can interact. Practically, what that means is that their states can be correlated in some way. If they’ve interacted, knowing information about $X$ might give us a clue about what $Y$ might be. One well-known example of this is the Bell state, which can be summarized as follows:

Qubit $X$	Qubit $Y$	Prob. Amp.	Probability
$\ket 0_X$	$\ket 0_Y$	$1/\sqrt{2}$	$50\%$
$\ket 0_X$	$\ket 1_Y$	$0$	$0\%$
$\ket 1_X$	$\ket 0_Y$	$0$	$0\%$
$\ket 1_X$	$\ket 1_Y$	$1/\sqrt{2}$	$50\%$

Here, we have an example of a non-factorizable state; qubits $X$ and $Y$ are correlated to each other dependently. If we know $X$ is in the $\ket 0_X$ state, then we necessarily know that $Y$ is in the $\ket 0_Y$ state. Such a correlation means it’s not possible to express the probabilities independently. It might be tempting to think that one can simply think of $X$ having a $50\%$ probability of being in either basis state, and $Y$ having a $50\%$ probability of being in either state—facts which are certainly true—but considering those independently would give us a different distribution of probabilities of the system:

Qubit $X$	Qubit $Y$	Probability
$\ket 0_X$	$\ket 0_Y$	$P(X=\ket 0_X)P(Y=\ket 0_Y)=25\%$
$\ket 0_X$	$\ket 1_Y$	$P(X=\ket 0_X)P(Y=\ket 1_Y)=25\%$
$\ket 1_X$	$\ket 0_Y$	$P(X=\ket 1_X)P(Y=\ket 0_Y)=25\%$
$\ket 1_X$	$\ket 1_Y$	$P(X=\ket 1_X)P(Y=\ket 1_Y)=25\%$

This state is called factorizable because we can express each probability as a product of probabilities pertaining to the original qubits, i.e., each probability has a form that looks like $P(X)P(Y)$. Note that here, knowing something about $X$ gives us no information about $Y$, since they’re completely independent. With that said, it should be emphasized that factorizable states are perfectly valid states, but they don’t represent the entirety of possible states.

If qubits $X$ and $Y$ live in the linear spaces $B_X$ and $B_Y$ respectively, then the composite space is written $B_X\otimes B_Y$. This is called a tensor product, which is a way to combine spaces with the above structure. Formally, if we have an $m$-dimensional vector spaces $V:=\operatorname{span}\{v_1,\ldots,v_m\}$ and an $n$-dimensional vector space $W:=\operatorname{span}\{w_1,\ldots,w_n\}$, then their tensor product $T:=V\otimes W$ will be an $mn$-dimensional vector space $\operatorname{span}\{t_1,\ldots,t_{mn}\}$, where each $t_i$ is a formal combination of basis vectors from $V$ and $W$. (There are of course $mn$ different combinations of $v$’s and $w$’s.) To give an example without all the abstraction, consider $V$ with a basis $\{\vec x, \vec y, \vec z\}$ and $W$ with a basis $\{\vec p, \vec q\}$. Then $V\otimes W$ will have a basis

$$ \left\{ \begin{array}{lll} \vec x\otimes\vec p, & \vec y\otimes\vec p, & \vec z\otimes\vec p, \\ \vec x\otimes\vec q, & \vec y\otimes\vec q, & \vec z\otimes\vec q\hphantom{,} \end{array} \right\}. $$

An example vector in the space $V\otimes W$ might be

$$ -i(\vec x\otimes\vec p) - 2(\vec y\otimes\vec p) + 3 (\vec z\otimes\vec p) + \frac{1}{4}(\vec x\otimes\vec q) - \sqrt{5}(\vec y\otimes\vec q) + e^{6\pi}(\vec z\otimes\vec q), $$

assuming these vector spaces are over $\mathbb{C}$.

Intuitively, a tensor product “just” gives us a way to associate a number with each possible combination of basis vector. In our case, we need to associate a probability amplitude with each combination of distinguished qubit basis states. We need this ability since—as we’ve established—we need to consider every possible holistic outcome of a collection of qubits, as opposed to the outcomes of the qubits independently. (The former constitute both factorizable and non-factorizable states, while the latter only include factorizable states.)

Bit-String notation and a general quantum state

If we have qubits $X$, $Y$, and $Z$, then they’ll live in the space $B_X\otimes B_Y\otimes B_Z$, which we’ll call $Q_3$. It will be massively inconvenient to write the basis vectors as, for example, $\ket 0_X\otimes \ket 1_Y\otimes\ket 1_Z$, so we instead use the shorthand $\ket{011}$ when the space has been defined. This is called bit-string notation. A general element $\ket\psi$ of $Q_3$ can be written $$\psi_0\ket{000}+\psi_1\ket{001}+\psi_2\ket{010}+\psi_3\ket{011}+\psi_4\ket{100}+\psi_5\ket{101}+\psi_6\ket{110}+\psi_7\ket{111}.$$ There are two substantial benefits from using bit-string notation. These benefits are much more thoroughly explained in this paper—which was a precursor to this very blog post.

The first benefit is that the names of the qubits—$X$, $Y$, and $Z$—have been abstracted away. They’re now just positions in a bit-string, and we can canonically name the qubits according to their position. We record positions from the right starting from zero, so $X$ is in position $2$, $Y$ is in position $1$, and $Z$ is in position $0$.

The second benefit is one relevant to how we implement quantum states on a computer. As written, the probability amplitude $\psi_i$ has an index $i$ whose binary expansion matches the bit-string of the basis vector whose scalar component is $\psi_i$. This is no accident. The main outcome of this is that we can use a non-negative integer as a way of specifying a bit-string, which also acts as an index into an array of probability amplitudes. So for instance, the above state can be written further compactly as $$\ket\psi=\sum_{i=0}^7\psi_i\ket i.$$ Here, $\ket i$ refers to the $i$th bit-string in lexicographic (“dictionary”) order, or equivalently, the binary expansion of $i$ as a bit-string.

Since qubits live in a two-dimensional space, then $n$ qubits will live in a $2^n$-dimensional space. With a great deal of work, we’ve come to our most general⁶ representation of an $n$-qubit system: $$\sum_{i=0}^{2^n-1}\psi_i\ket i,$$ where $\vert\psi_i\vert^2$ gives us the probability of observing the bit-string $\ket i$, implying $$\sum_{i=0}^{2^n-1}\vert\psi_i\vert^2=1.$$

On a computer, representing a quantum state for an $n$-qubit system is simple: It’s just an array of $2^n$ complex numbers. An index $i$ into the array represents the probability amplitude $\psi_i$, which is the scalar component of $\ket{i}$. So, for instance, the state $\ket{000}$ in a 3-qubit system is represented by an array whose first element is $1$ and the rest $0$. Here is a function to allocate a new quantum state of $n$ qubits, initialized to be in the $\ket{\ldots 000}$ state:

(defun make-quantum-state (n)
  (let ((s (make-array (expt 2 n) :initial-element 0.0d0)))
    (setf (aref s 0) 1.0d0)
    s))

Sometimes, given a quantum state, or even an operator on a quantum state, we will want to recover how many qubits the state represents, or the operator acts on. In both cases, the question reduces to determining the number of qubits that a dimension represents. Since our dimensions are always powers of two, we need to compute the equivalent of a binary logarithm. In Common Lisp, we can compute this by computing the number of bits an integer takes to represent using integer-length. The number $2^n$ is always a 1 followed by $n$ 0’s, so the length of $2^n$ in binary is $n+1$.

(defun dimension-qubits (d)
  (1- (integer-length d)))

Evolving the quantum state

Since the quantum state is a vector, the principal way we change it is through linear operators represented as matrices. As our quantum program executes, we say that the quantum state evolves. Matrix–vector multiplication is accomplished with apply-operator and matrix–matrix multiplication is accomplished with compose-operators. There is nothing special about these functions; they are the standard textbook algorithms.

(defun apply-operator (matrix column)
  (let* ((matrix-size (array-dimension matrix 0))
         (result (make-array matrix-size :initial-element 0.0d0)))
    (dotimes (i matrix-size)
      (let ((element 0))
        (dotimes (j matrix-size)
          (incf element (* (aref matrix i j) (aref column j))))
        (setf (aref result i) element)))
    (replace column result)))

(defun compose-operators (A B)
  (destructuring-bind (m n) (array-dimensions A)
    (let* ((l (array-dimension B 1))
           (result (make-array (list m l) :initial-element 0)))
      (dotimes (i m result)
        (dotimes (k l)
          (dotimes (j n)
            (incf (aref result i k)
                  (* (aref A i j)
                     (aref B j k)))))))))

These functions will sit at the heart of the interpreter, which will be elaborated upon in the section about gates.

Measurement

Already, through the construction of our quantum state, we’ve discussed the idea that the probability amplitudes imply a probability of observing a state. Measurement then amounts to looking at a quantum state as a discrete probability distribution and sampling from it.

Measurement in quantum mechanics is side-effectful; observation of a quantum state also simultaneously collapses that state. This means that when we measure a state to be a bit-string, then the state will also become that bit-string, zeroing out every other component in the process.

We thus implement the process of measurement in two steps: The sampling of the state followed by its collapse.

(defun observe (machine)
  (let ((b (sample (machine-quantum-state machine))))
    (collapse (machine-quantum-state machine) b)
    (setf (machine-measurement-register machine) b)
    machine))

Note that we’ve recorded our observation into the measurement register. We now proceed to define what we mean by sample and collapse.

How shall we sample? This is a classic problem in computer science. If we have $N$ events $\{0, 1,\ldots,N-1\}$, such that event $e$ has probability $P(e)$, then we can sample as follows. Consider the partial sums defined by the recurrence $S(0)=0$ and $S(k)=S(k-1) + P(k-1)$. If we draw a random number $r$ uniformly from $[0,1)$, then we wish to find the $k$ such that $S(k)\leq r < S(k+1)$. Such a $k$ will be a sampling of our events according to the imposed probability distribution.

We can implement this simply by computing successive partial sums, until our condition is satisfied. In fact, we can be a little bit more resourceful. We can find when $r-S(k+1)<0$, which amounts to successive updates $r\leftarrow r-P(k)$.

With a quantum system, we have $P(\ket i) = \vert\psi_i\vert^2$, and the sampled $k$ is the bit-string $\ket k$ we find.

Let’s do an example. Suppose we have a quantum state

$$ \sqrt{0.2}\ket{00} - \sqrt{0.07}\ket{01} + \sqrt{0.6}\ket{10} + \sqrt{0.13}\ket{11}. $$

Then our discrete probability distribution is:

$$ P(\ket{00}) = 0.2\qquad P(\ket{01}) = 0.07\qquad P(\ket{10}) = 0.6\qquad P(\ket{11}) = 0.13 $$

Next, suppose we draw a random number $r = 0.2436$. We first check if $r < 0.2$. It’s not, so $\ket{00}$ is not our sample. Subtract it from $r$ to get $r = 0.0436$. Next check if $r < 0.07$. Yes, so our sample is $\ket{01}$. Pictorially, this looks like the following:

The implementation is straightforward:

(defun sample (state)
  (let ((r (random 1.0d0)))
    (dotimes (i (length state))
      (decf r (expt (abs (aref state i)) 2))
      (when (minusp r) (return i)))))

Collapsing to $\ket k$ is simply zeroing out the array and setting $\psi_k$ to $1$.

(defun collapse (state basis-element)
  (fill state 0.0d0)
  (setf (aref state basis-element) 1.0d0))

Gates

Gates as matrices

Gates are the meat of most quantum algorithms. They represent the “hard work” a quantum computer does. As previously described, a gate $g$ is a transformation that is linear, invertible, and length-preserving.

Linear: $g(a\ket\psi+b\ket\phi)=ag(\ket\psi)+bg(\ket\phi)$.
Invertible: There is always an operation $h$ that can cancel out the effect of $g$: $h(g(\ket\psi))=g(h(\ket\psi))=\ket\psi$.
Length-Preserving: $\Vert g(\ket\psi)\Vert = \Vert\ket\psi\Vert$.

These ideas are captured by an overarching idea called a linear isometry, which comes from the Greek word isometria, with isos meaning “equal” and metria meaning “measuring”. As with all linear transformations, we can write them out as a matrix with respect to a particular basis. Matrices representing linear isometries are called unitary matrices⁷.

The simplest gate must be identity, a gate which does nothing.

$$ \mathsf{I} := \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} $$

In Common Lisp, this would be defined as

(defparameter +I+ #2A((1 0)
                      (0 1)))

which we will make use of later. Just a notch higher in complexity would be the quantum analog of a Boolean “NOT”. This is called the $\mathsf{X}$ gate:

$$ \mathsf{X} := \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}. $$

This has the effect of mapping $\mathsf{X}\ket 0=\ket 1$, which means directly that $\mathsf{X}\ket 1=\ket 0$ and therefore it is its own inverse: $\mathsf{X}\mathsf{X} = \mathsf{I}$ so $\mathsf{X}=\mathsf{X}^{-1}$.

We suggest re-reviewing how one interprets a matrix as an explicit mapping of each element of the basis, as it helps make sense of gates. In this tutorial, gate matrices are always specified in terms of the bit-string basis

$$ \{\ket{\ldots000}, \ket{\ldots001}, \ket{\ldots010}, \ket{\ldots011}, \ldots\}. $$

We again refer the reader to this paper for an in-depth discussion about this basis.

In the rest of this section, the whole goal is to be able to apply gates to our quantum state. There are two cases of pedagogical and operational interest: the one-qubit gate and the many-qubit gate. We will write two functions to accomplish each of these, in order to implement a general function called apply-gate for applying any kind of gate on any collection of qubits for any quantum state.

(defun apply-gate (state U qubits)
  (assert (= (length qubits) (dimension-qubits (array-dimension U 0))))
  (if (= 1 (length qubits))
      (%apply-1Q-gate state U (first qubits))
      (%apply-nQ-gate state U qubits)))

Gates on multi-qubit machines

If we are working with the machine $M_n$, then our space is $2^n$-dimensional, and as such, our matrices would be written out as $2^n\times 2^n$ arrays of numbers. If we can write out such a matrix, then applying it is as simple as a matrix–vector multiplication. For instance, for a $4$-qubit machine, an $\mathsf{X}$ on qubit $0$ would be written

$$ \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}, $$

which could be readily applied to a $16$-element quantum state vector. It is easy to verify that this will swap the components of $\ket{\ldots 0}$ with the corresponding components of $\ket{\ldots 1}$.

But as should be plainly obvious from the obnoxious amount of paper wasted by writing out this matrix, it would be better if we could simply generate this matrix with just three pieces of information: the gate matrix $g=\left(\begin{smallmatrix}0 & 1\\1 & 0\end{smallmatrix}\right)$, the qubit index $i=0$, and the size of the machine $n=4$. This is a process we will call lifting.

Lifting requires a fundamental tool for constructing operators on spaces that were formed out of tensor products. If we have two finite-dimensional vector spaces $U$ and $V$, and operators $f$ and $g$ on the spaces respectively, then it seems reasonable to consider how $f$ and $g$ transform $U\otimes V$. In some sense, applying $f$ and $g$ “in parallel” on $U\otimes V$ correspond to a new linear operator $h$. If $f$ and $g$ are matrices, then $h$ is defined by a block matrix

$$ \begin{equation} h_{i,j} = f_{i,j} g. \label{eq:kron} \end{equation} $$

More specifically, let $0 \leq i,j < \dim U$. The matrix $h$ will be an array of $\dim U \times \dim U$ copies of $g$, where the entries of the $(i,j)$th blocks are multiplied by the single scalar $f_{i,j}$. This will lead to a matrix with $(\dim U)(\dim V)$ rows and columns, which is exactly the dimension of $U\otimes V$. Incidentally, we write $h$ as $f\otimes g$, and this combination of operators is called the Kronecker product⁸. As code:

(defun kronecker-multiply (A B)
  (destructuring-bind (m n) (array-dimensions A)
    (destructuring-bind (p q) (array-dimensions B)
      (let ((result (make-array (list (* m p) (* n q)))))
        (dotimes (i m result)
          (dotimes (j n)
            (let ((Aij (aref A i j))
                  (y (* i p))
                  (x (* j q)))
              (dotimes (u p)
                (dotimes (v q)
                  (setf (aref result (+ y u) (+ x v))
                        (* Aij (aref B u v))))))))))))

As a matter of terminology, remember that tensor products combine vector spaces, and Kronecker products combine operator matrices.

Single-qubit gates and gates on adjacent qubits

From here, we can very easily lift one-qubit gates to machines with any number of qubits. A gate $g$ on qubit $i$ in an $n$-qubit machine is just $g$ applied to qubit $i$ and the identity $\mathsf{I}$ on all other qubits. Writing this out as a Kronecker product, we have

$$ \begin{equation} \operatorname{lift}(g, i, n) := \underbrace{\mathsf{I} \otimes \mathsf{I} \otimes \cdots}_{n-i-1\text{ factors}} \otimes g \otimes \underbrace{\cdots \otimes \mathsf{I}}_{i\text{ factors}}, \label{eq:liftone} \end{equation} $$

where there are a total of $n$ factors, and $g$ is at positioned $i$ factors from the right.

This concept generalizes to higher-dimensional operators which act on index-adjacent qubits. In other words, if $g$ is a $k$-qubit operator specifically acting on qubits

$$ (i+k-1, i+k-2, \ldots, i+2, i+1, i), $$

then the lifting operator from \eqref{eq:liftone} is much the same:

$$ \begin{equation} \operatorname{lift}(g, i, n) := \underbrace{\mathsf{I} \otimes \mathsf{I} \otimes \cdots}_{n-i-k\text{ factors}} \otimes g \otimes \underbrace{\cdots \otimes \mathsf{I}}_{i\text{ factors}}. \label{eq:liftmany} \end{equation} $$

It must be emphasized one last time: This only works for multi-qubit operators that act on qubits that are index-adjacent. We will get to how to work with non-adjacent qubits shortly, but first we will turn this into code.

For simplicity, we create a way to iterate a Kronecker product multiple times, that is, compute

$$ \underbrace{g\otimes \cdots \otimes g}_{n\text{ factors}}, $$

which is usually simply written $g^{\otimes n}$. We must use care when handling the case when we are “Kronecker exponentiating” by a non-positive number, so that $f\otimes g^{\otimes 0} = f$.

(defun kronecker-expt (U n)
  (cond
    ((< n 1) #2A((1)))
    ((= n 1) U)
    (t (kronecker-multiply (kronecker-expt U (1- n)) U))))

With kroncker-expt, we can write lift following \eqref{eq:liftmany}:

(defun lift (U i n)
  (let ((left  (kronecker-expt +I+ (- n i (dimension-qubits
                                           (array-dimension U 0)))))
        (right (kronecker-expt +I+ i)))
    (kronecker-multiply left (kronecker-multiply U right))))

Multi-qubit gates on non-adjacent qubits

In this section, we assume we are working on a multi-qubit machine $M_n$ with $n\ge 2$.

The general idea

So far, we’ve managed to get away with lifting operators that act on either a single qubit, or a collection of index-adjacent qubits. This has been more-or-less trivial, because we can tack on a series of identity operators by way of Kronecker products to simulate “doing nothing” to the other qubits. However, if we want to apply a multi-qubit gate to a collection of qubits that aren’t index-adjacent, we have to be a little more clever.

The way we accomplish this is by swapping qubits around so that we can move in and out of index-adjacency. In fact, for a given gate acting on a given collection of qubits, we aim to compute an operator $\Pi$ which moves these qubits into index-adjacency, so that we can compute

$$ \begin{equation} \Pi^{-1} \operatorname{lift}(g, 0, n) \Pi. \label{eq:upq} \end{equation} $$

This recipe requires many ingredients, each of which we describe in detail.

Swapping two qubits

To start, we need some way to swap the state of two qubits. We can do this with the $\mathsf{SWAP}$ operator:

$$ \mathsf{SWAP} := \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}. $$

In Common Lisp, we define this in the same way we defined +I+.

(defparameter +SWAP+ #2A((1 0 0 0)
                         (0 0 1 0)
                         (0 1 0 0)
                         (0 0 0 1)))

The $\mathsf{SWAP}$ operator takes two qubits and swaps their state. What does this mean in a system of correlations, where qubit state isn’t strictly compartmentalized (i.e., factorized)? Swapping is equivalent to swapping the component of $\ket{01}$ with the component of $\ket{10}$, which are the only two distinguishable correlations⁹. Still, in a multi-qubit system, we can’t immediately arbitrarily swap two qubits with the tools we’ve developed. What we can do is swap index-adjacent qubits. In particular, we can define the transpositions

$$ \tau_i := \operatorname{lift}(\mathsf{SWAP}, i, n),\qquad \text{with }0\leq i < n - 1. $$

The transposition $\tau_i$ swaps qubit $i$ with qubit $i+1$. This is our first ingredient.

Re-arranging qubits to be index-adjacent

The second ingredient is a way to re-arrange our qubits so that they are index-adjacent. Suppose we have a three-qubit operator $g$ which acts on qubits $(2, 4, 3)$ in a machine of $n=5$ qubits. The space in which the quantum state of $M_5$ lives is

$$ B_4 \otimes B_3 \otimes B_2 \otimes B_1 \otimes B_0, $$

but we need to re-arrange our state vector as if we’ve moved $B_2\to B_0$, $B_4\to B_1$, and $B_3\to B_2$ so that our sub-state sits index-adjacent. In combinatorics, this permutation is written in two-line notation

$$ \begin{pmatrix} 0 & 1 & 2 & 3 & 4\\ 3 & 4 & 0 & 2 & 1 \end{pmatrix}. $$

Here, we’ve made a few arbitrary decisions. First, we’ve decided to re-map a $k$-qubit operator to the $B_{k-1}\otimes\cdots\otimes B_1\otimes B_0$ subspace. Any other index-adjacent subspace would work, but this simplifies the code. Second, we see that $0\mapsto 3$ and $1\mapsto 4$, but it doesn’t matter so much where they map to, as long as $2$, $4$, and $3$ are mapped correctly.

There’s no sense in writing the first line in two-line notation, so we just write the permutation compactly as $34021$. As a quantum operator, we write this as $\Pi_{34021}$.

The question is: How can we write $\Pi_{34021}$ as familiar operators? It is a well-known fact in combinatorics that any permutation can be decomposed into a composition of swaps, and every swap can be decomposed into a series of adjacent transpositions. We leave this as an exercise¹⁰, but we will show the code to our implementation.

We start with a function which takes a permutation written as a list, like (3 4 0 2 1), and converts it to a list of (possibly non-adjacent) transpositions to be applied left-to-right, represented as cons cells ((0 . 3) (1 . 4) (2 . 3)).

(defun permutation-to-transpositions (permutation)
  (let ((swaps nil))
    (dotimes (dest (length permutation) (nreverse swaps))
      (let ((src (elt permutation dest)))
        (loop :while (< src dest) :do
          (setf src (elt permutation src)))
        (cond
          ((< src dest) (push (cons src dest) swaps))
          ((> src dest) (push (cons dest src) swaps)))))))

Next, we convert these transpositions as cons cells to adjacent transposition indexes. This is straightforward. If we are swapping $(a,b)$ with $a<b$, then we transpose $(a, a+1)$, then $(a+1, a+2)$, and so on until $(b-1, b)$, followed by a reversal of each except $(b-1, b)$. We can simply write this chain of adjacent transpositions as $(a, a+1, \ldots, b-1, \ldots, a+1, a)$. In this example, we’d have the transposition indexes (0 1 2 1 0 1 2 3 2 1 2).

(defun transpositions-to-adjacent-transpositions (transpositions)
  (flet ((expand-cons (c)
           (if (= 1 (- (cdr c) (car c)))
               (list (car c))
               (let ((trans (loop :for i :from (car c) :below (cdr c)
                                  :collect i)))
                 (append trans (reverse (butlast trans)))))))
    (mapcan #'expand-cons transpositions)))

These are indexes $i_1, i_2, \ldots$ such that $\Pi = \cdots \tau_{i_2}\tau_{i_1}$

The last ingredient we need is inverting $\Pi$. If we have $\Pi$ represented as a sequence of $\tau$, then we simply reverse the list of $\tau$.

Using transpositions to implement multi-qubit gates

With all of these, we write what is perhaps the most important function of our interpreter.

(defun %apply-nQ-gate (state U qubits)
  (let ((n (dimension-qubits (length state))))
    (labels ((swap (i)
               (lift +swap+ i n))
             (transpositions-to-operator (trans)
               (reduce #'compose-operators trans :key #'swap)))
      (let* ((U01 (lift U 0 n))
             (from-space (append (reverse qubits)
                                 (loop :for i :below n
                                       :when (not (member i qubits))
                                         :collect i)))
             (trans (transpositions-to-adjacent-transpositions
                     (permutation-to-transpositions
                      from-space)))
             (to->from (transpositions-to-operator trans))
             (from->to (transpositions-to-operator (reverse trans)))
             (Upq (compose-operators to->from
                                     (compose-operators U01
                                                        from->to))))
        (apply-operator Upq state)))))

A few quick notes for comprehension:

The value of (swap i) is $\tau_i$ fully lifted.
The one-line zinger that defines transpositions-to-operator takes a list of transposition indexes and converts it into a unitary operator. It does so by doing what’s known in functional programming as a map-reduce, by first mapping $i\mapsto\tau_i$ and reducing by operator composition.
The variable from-space contains the permutation $p$ that encodes the space in which we’d like to act. This permutation is calculated based off of the qubits argument.
The variables from->to and to->from represent $\Pi_p$ and $\Pi^{-1}_p$ respectively.
The variable Upq is our fully lifted operator, exactly by way of \eqref{eq:upq}.

The function %apply-nQ-gate is what allows our interpreter to be so general. Making the interpreter more efficient ultimately is an exercise in making this function more efficient.

The only thing left to do is integrate all of the topics discussed hitherto into an interpreter!

An interpreter

The driver loop

The bulk of the interpreter has been written. We’ve described the semantics of the two instructions of interest: MEASURE and GATE. Now we create the interpreter itself, which is just a driver loop to read and execute these instructions, causing state transitions of our abstract machine. If we see a GATE, we call apply-gate. If we see a MEASURE, we call observe.

(defun run-quantum-program (qprog machine)
  (loop :for (instruction . payload) :in qprog
        :do (ecase instruction
              ((GATE)
               (destructuring-bind (gate &rest qubits) payload
                 (apply-gate (machine-quantum-state machine) gate qubits)))
              ((MEASURE)
               (observe machine)))
        :finally (return machine)))

Efficiency

Performance-focused individuals will have noticed that this interpreter is pretty costly, in many ways. The biggest cost is also unavoidable: The fact that our state grows exponentially with the number of qubits. Real, physical quantum computers avoid this cost, which makes them alluring machines to both study and construct.

However, even with this unavoidable cost, this interpreter has been implemented for ease of understanding and not machine efficiency. Writing a faster interpreter amounts to avoiding the construction of the lifted operator matrices. This can be done with very careful index wrangling and sensitivity to data types and allocation. This is how the high-performance Quantum Virtual Machine is implemented.

Examples

What good is writing an interpreter if we don’t write any programs worth interpreting? Here are a few examples of programs.

Bell state

The Bell state is one which we’ve explored earlier. It is a two-qubit state $$\frac{1}{\sqrt{2}}(\ket {00} + \ket {11}).$$ Here’s a program to generate one, using two new gates, the controlled-not gate $\mathsf{CNOT}$ and the Hadamard gate $\mathsf{H}$.

(defparameter +H+ (make-array '(2 2) :initial-contents (let ((s (/ (sqrt 2))))
                                                         (list (list s s)
                                                               (list s (- s))))))

(defparameter +CNOT+ #2A((1 0 0 0)
                         (0 1 0 0)
                         (0 0 0 1)
                         (0 0 1 0))))

(defun bell (p q)
  `((GATE ,+H+ ,p)
    (GATE ,+CNOT+ ,p ,q)))

Greenberger–Horne–Zeilinger state

The Greenberger–Horne–Zeilinger state, or GHZ state, is a generalization of the Bell state on more than two qubits, namely $$\frac{1}{\sqrt{2}}(\ket{0\ldots 000} + \ket{1\ldots 111}).$$ This is accomplished by executing a chain of controlled-not gates:

(defun ghz (n)
  (cons `(GATE ,+H+ 0)
        (loop :for q :below (1- n)
              :collect `(GATE ,+CNOT+ ,q ,(1+ q)))))

The quantum Fourier transform

The ordinary discrete Fourier transform of a complex vector is a unitary operator, and as such, it can be encoded as a quantum program. We will write a program which computes the Fourier transform of the probability amplitudes of an input quantum state (a time-domain signal), producing a new quantum state whose amplitudes represent components in the frequency domain. This is the central subroutine to Shor’s algorithm, which is a quantum algorithm which factors integers faster than any known classical method.

First, we will need a gate called the controlled-phase gate $\mathsf{CPHASE}(\theta)$:

(defun cphase (angle)
  (make-array '(4 4) :initial-contents `((1 0 0 0)
                                         (0 1 0 0)
                                         (0 0 1 0)
                                         (0 0 0 ,(cis angle)))))

Now, we can generate the quantum Fourier transform recursively.

(defun qft (qubits)
  (labels ((bit-reversal (qubits)
             (let ((n (length qubits)))
               (if (< n 2)
                   nil
                   (loop :repeat (floor n 2)
                         :for qs :in qubits
                         :for qe :in (reverse qubits)
                         :collect `(GATE ,+swap+ ,qs ,qe)))))
           (%qft (qubits)
             (destructuring-bind (q . qs) qubits
               (if (null qs)
                   (list `(GATE ,+H+ ,q))
                   (let ((cR (loop :with n := (1+ (length qs))
                                   :for i :from 1
                                   :for qi :in qs
                                   :for angle := (/ pi (expt 2 (- n i)))
                                   :collect `(GATE ,(cphase angle) ,q ,qi))))
                     (append
                      (qft qs)
                      cR
                      (list `(GATE ,+H+ ,q))))))))
    (append (%qft qubits) (bit-reversal qubits))))

The program for a three-qubit quantum Fourier transform (qft '(0 1 2)) looks like this:

(
  (GATE #2A((0.7071067811865475d0 0.7071067811865475d0) (0.7071067811865475d0 -0.7071067811865475d0)) 2) 
  (GATE #2A((1 0 0 0) (0 1 0 0) (0 0 1 0) (0 0 0 #C(0.0d0 1.0d0))) 1 2) 
  (GATE #2A((0.7071067811865475d0 0.7071067811865475d0) (0.7071067811865475d0 -0.7071067811865475d0)) 1) 
  (GATE #2A((1 0 0 0) (0 0 1 0) (0 1 0 0) (0 0 0 1)) 1 2) 
  (GATE #2A((1 0 0 0) (0 1 0 0) (0 0 1 0) (0 0 0 #C(0.7071067811865476d0 0.7071067811865475d0))) 0 1) 
  (GATE #2A((1 0 0 0) (0 1 0 0) (0 0 1 0) (0 0 0 #C(0.0d0 1.0d0))) 0 2) 
  (GATE #2A((0.7071067811865475d0 0.7071067811865475d0) (0.7071067811865475d0 -0.7071067811865475d0)) 0) 
  (GATE #2A((1 0 0 0) (0 0 1 0) (0 1 0 0) (0 0 0 1)) 0 2) 
)

(Recall that #C(0 1) represents the complex number $i$.)

We can see the quantum Fourier transform in action by computing the Fourier transform of $\ket{000}$. Here is a transcript of this calculation:

CL-USER> (run-quantum-program
          (qft '(0 1 2))
          (make-machine :quantum-state (make-quantum-state 3)
                        :measurement-register 0))
#S(MACHINE
   :QUANTUM-STATE #(#C(0.3535533724408484d0 0.0d0)
                    #C(0.3535533724408484d0 0.0d0)
                    #C(0.3535533724408484d0 0.0d0)
                    #C(0.3535533724408484d0 0.0d0)
                    #C(0.3535533724408484d0 0.0d0)
                    #C(0.3535533724408484d0 0.0d0)
                    #C(0.3535533724408484d0 0.0d0)
                    #C(0.3535533724408484d0 0.0d0))
   :MEASUREMENT-REGISTER 0)

Indeed, one can verify that the classical Fourier transform of the vector $[1,0,0,0,0,0,0,0]$ is a vector with eight components equal to about $0.35355$.

$ python
Python 2.7.16 (default, May 23 2023, 14:13:27) 
[GCC 8.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.fft.fft([1,0,0,0,0,0,0,0], norm="ortho")
array([0.35355339+0.j, 0.35355339+0.j, 0.35355339+0.j, 0.35355339+0.j,
       0.35355339+0.j, 0.35355339+0.j, 0.35355339+0.j, 0.35355339+0.j])

Example transcript

Here is an example transcript downloading and using this software, using Steel Bank Common Lisp.

$ git clone https://github.com/stylewarning/quantum-interpreter.git
Cloning into 'quantum-interpreter'...
remote: Counting objects: 10, done.
remote: Compressing objects: 100% (10/10), done.
Unpacking objects: 100% (10/10), done.
remote: Total 10 (delta 2), reused 5 (delta 0), pack-reused 0

$ cd quantum-interpreter/

$ sbcl --noinform
* (load "qsim.lisp")
T

* (load "examples.lisp")
T

* (run-quantum-program (bell 0 1)
                       (make-machine :quantum-state (make-quantum-state 2)
                                     :measurement-register 0))
#S(MACHINE
   :QUANTUM-STATE #(0.7071067690849304d0 0.0d0 0.0d0 0.7071067690849304d0)
   :MEASUREMENT-REGISTER 0)

* (run-quantum-program (qft '(0 1 2))
                       (make-machine :quantum-state (make-quantum-state 3)
                                     :measurement-register 0))
#S(MACHINE
   :QUANTUM-STATE #(#C(0.3535533724408484d0 0.0d0) #C(0.3535533724408484d0 0.0d0)
                    #C(0.3535533724408484d0 0.0d0) #C(0.3535533724408484d0 0.0d0)
                    #C(0.3535533724408484d0 0.0d0) #C(0.3535533724408484d0 0.0d0)
                    #C(0.3535533724408484d0 0.0d0) #C(0.3535533724408484d0 0.0d0))
   :MEASUREMENT-REGISTER 0)

* (defun flip-coin ()
    (machine-measurement-register
     (run-quantum-program
      `((GATE ,+H+ 0) (MEASURE))
      (make-machine :quantum-state (make-quantum-state 1)
                    :measurement-register 0))))
FLIP-COIN

* (loop :repeat 10 :collect (flip-coin))
(1 1 0 1 1 0 0 1 0 1)

* (quit)

Source code

The source code in this tutorial are published under the BSD 3-clause license. The complete listing and most up-to-date source code can be found on GitHub.

Ports in other languages

Others have written this quantum interpreter in other languages. Here’s a list of ports that people have shared with me:

Aistis Raulinaitis’s implementation in OCaml
Graham Enos’s implementation in Rust with a write-up
Marco Rubin’s implementation in Python, no dependencies
Francesco Morri’s implementation in Python

A controlled one-qubit gate is a kind of two-qubit gate. ↩︎
It’s actually 124 SLOC, and it has not been “code golfed”. If we wanted to make an ever tinier quantum interpreter, we could—but brevity for its own sake is not the point. ↩︎
With only a little bit of extra work, mostly bookkeeping, we could make $n$ finite but unbounded during the execution of a program by having instead a collection of so-called quantum registers. These would be realized by instead a collection of $v$’s, which are opportunistically combined with entanglement occurs. ↩︎
For that matter, why complex numbers, and not just real-valued probabilities? The reason is that a complex number of unit norm can be written as $e^{i\theta}$, where $\theta$ is called the phase. Phases are a wave-like property, and allow the complex probability amplitudes to interfere. Interference is a known and understood phenomenon of quantum mechanical systems, and in fact is critical to the function of a quantum computer. ↩︎
Spaces with all of these properties, including a way to calculate distances, are called Hilbert spaces. ↩︎
The fact of the matter is that we can actual get more general by having classical probability distributions of these states, which leads one to so-called “density operators”. This is extremely useful when studying imperfect quantum computers which have noisy operations. ↩︎
While we won’t use this fact in our interpreter, even though it would be useful for error checking, it is very easy to check if a matrix is unitary. First, we compute another matrix $h$ which is the conjugate-transpose of $g$. The conjugate-transpose of a matrix is just the transpose of a matrix with each complex entry conjugated. Once we have this matrix, we check that $hg$ is an identity matrix. The matrix $g$ is unitary if and only if $hg=gh=\mathsf{I}$. ↩︎
Unfortunately, the definition \eqref{eq:kron} seems somewhat arbitrary and out of nowhere. Fortunately, there is a much more “first principles” approach to understanding the tensor product and the Kronecker product, starting with how we map a pair of vectors $v\in V$ and $w\in W$ to a vector $v\otimes w\in V\otimes W$. Such approach is much more satisfying to a mathematician, and even essential to understanding the “true nature” of the tensor product, but perhaps less so to a curious implementer. ↩︎
There is no sense in moving $\ket{00}$ or $\ket{11}$ to accomplish a swap-like operation, since we identify each qubits' respective $\ket 0$ identically, and each $\ket 1$ identically. ↩︎
If you’re not particularly keen to figure out the math yourself, you might consult Lemma 14.1 of these lecture notes. You’re also welcome to just take my word for it! ↩︎

stylewarning's screed