A software engineer's circuitous journey to calculate eigenvalues

Or, how to calculate the eigenvalues and eigenvectors of a complex matrix using a routine that only works on real matrices.

By Robert Smith


If we have a complex matrix, how do we calculate its eigenvalues and eigenvectors using a procedure that only works on real matrices? This post recounts my journey solving that problem, and how the problem came about in the first place.

TL;DR: If you came here just looking for an algorithm, scroll to the bottom for a pseudocode listing.

Why?

MAGICL is a Common Lisp library for doing matrix arithmetic. To make a long story short, there’s a desire to reduce MAGICL’s dependence on foreign libraries (e.g., LAPACK), and instead use pure Common Lisp routines. Except, implementing numerical linear algebra is difficult, and the MAGICL maintainers usually have more important things to work on. So, instead of writing routines from scratch via textbooks, we sometimes resort to mechanically translating an old distribution of LAPACK, written in FORTRAN 77, into Common Lisp. Due to the age of these routines, I personally think it’s prudent to minimize their use.

One routine that’s generally reliable—both as FORTRAN 77 code and as its mechanically translated Common Lisp counterpart—is DGEEV, a LAPACK routine to compute eigenvalues and eigenvectors of a general real matrix of double-precision floats. (We’ll call a set of eigenvectors and the corresponding eigenvalues the eigensystem.) This routine is made nice in MAGICL and exposed as the Lisp function MAGICL:EIG.

The MAGICL:EIG function, however, is required to work with both real and complex matrices, yet DGEEV only works with reals. So, we are left with two reasonable options: mechanically translate yet more FORTRAN 77 (namely ZGEEV, the complex counterpart of DGEEV), or figure out how to compute the eigensystem of a complex matrix using only DGEEV.

Because I like puzzles, and because I’m loath to add to the existing, virtually unmaintainable pile of 50,000 lines of mechanically translated code, I opted for the latter option.

Complex numbers as matrices

Complex numbers look like pairs of real numbers that have funny rules for addition and multiplication. More precisely, complex numbers form a two-dimensional real vector space, and multiplication is in fact an $\mathbb{R}$-linear map. As such, if we specify a basis, we can write a matrix. Consider the $\mathbb{R}$-basis $\hat e_0:=1$ and $\hat e_1:=i$ as well as the multiplication map $z\mapsto (a+bi)z$. Applying this map to our basis gives

$$ \begin{align*} (a+bi)\hat e_0 &= (a+bi)\cdot 1 & (a+bi)\hat e_1 &= (a+bi)\cdot i \\ &= a+bi & &= -b+ai \\ &= a\hat e_0+b\hat e_1 & &= -b\hat e_0 + a\hat e_1 \end{align*} $$

We thus immediately conclude

$$ (a+bi) \begin{pmatrix} \hat e_0 \\ \hat e_1 \end{pmatrix} = \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \begin{pmatrix} \hat e_0 \\ \hat e_1 \end{pmatrix}. $$

This is to say that the multiplication map $z\mapsto (a+bi)z$ can be represented by the matrix

$$ \begin{pmatrix} a & b \\ -b & a \end{pmatrix}. $$

(This representation is not unique. We could also exchange the $b$ and $-b$ for a different representation, as many texts do. This is because complex conjugation—which this exchange of off-diagonal elements represents—is an isomorphism on $\mathbb{C}$.)

One can verify that these matrices add and multiply exactly as the complex numbers they represent do.
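If you’d like to check that claim numerically, here is a minimal sketch in Python with NumPy. (The helper `rep` is just for illustration; it isn’t part of MAGICL.)

```python
import numpy as np

def rep(z):
    """Represent z = a+bi as the 2x2 real matrix [[a, b], [-b, a]]."""
    return np.array([[z.real, z.imag], [-z.imag, z.real]])

z, w = 1 + 2j, 3 - 4j

# The representation respects both addition and multiplication.
assert np.allclose(rep(z) + rep(w), rep(z + w))
assert np.allclose(rep(z) @ rep(w), rep(z * w))
```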

Complex matrices as real matrices

We have given a way to identify elements of $\mathbb{C}$ as elements of $\mathbb{R}^{2\times 2}$, the set of $2\times 2$ real matrices. This readily gives us a representation for $\mathbb{C}^{n\times n}$ matrices: If we have a matrix $U\in\mathbb{C}^{n\times n}$, produce a matrix $V\in\mathbb{R}^{2n\times 2n}$ by replacing each $U_{r,c}$ with our real-matrix representation:

$$ \begin{pmatrix} V_{2r,2c} & V_{2r,2c+1} \\ V_{2r+1,2c} & V_{2r+1,2c+1} \end{pmatrix} := \begin{pmatrix} \Re U_{r,c} & \Im U_{r,c} \\ -\Im U_{r,c} & \Re U_{r,c} \end{pmatrix}. $$

For example, here is the transformation taking an element of $\mathbb{C}^{2\times 2}$ to an element of $\mathbb{R}^{4\times 4}$:

$$ \begin{pmatrix} 1+2i & 3-4i \\ 5-6i & -7+8i \end{pmatrix} \mapsto \begin{pmatrix} 1 & 2 & 3 & -4 \\ -2 & 1 & 4 & 3 \\ 5 & -6 & -7 & 8 \\ 6 & 5 & -8 & -7 \end{pmatrix}. $$

Due to how matrix arithmetic works with block matrices (which these real matrices essentially are), we at least get ordinary addition and multiplication in this representation.
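As a sanity check, here is a sketch of the embedding in Python with NumPy, following the indexing formula above and confirming that addition and multiplication carry over. (The function name `embed` is my own, not MAGICL’s; `embed(A)` below reproduces the $4\times 4$ matrix displayed above.)

```python
import numpy as np

def embed(U):
    """Map an n x n complex matrix to its 2n x 2n real representation,
    expanding each entry a+bi into the block [[a, b], [-b, a]]."""
    n = U.shape[0]
    V = np.empty((2 * n, 2 * n))
    V[0::2, 0::2] = U.real   # V[2r, 2c]     =  Re U[r, c]
    V[0::2, 1::2] = U.imag   # V[2r, 2c+1]   =  Im U[r, c]
    V[1::2, 0::2] = -U.imag  # V[2r+1, 2c]   = -Im U[r, c]
    V[1::2, 1::2] = U.real   # V[2r+1, 2c+1] =  Re U[r, c]
    return V

A = np.array([[1 + 2j, 3 - 4j], [5 - 6j, -7 + 8j]])  # the example above
B = np.array([[2 - 1j, 0 + 1j], [4 + 0j, 1 + 1j]])

assert np.allclose(embed(A) + embed(B), embed(A + B))
assert np.allclose(embed(A) @ embed(B), embed(A @ B))
```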

Does this mean we can now just apply any linear algebra routine to these real matrices and get the same answer as we would for complex matrices? Unfortunately not, and eigensystems are a perfect example of why. How many eigenvalues (at most) does an operator on a $d$-dimensional vector space have (over a field of characteristic zero)? At most $d$, since the eigenvalues are roots of the characteristic polynomial, which has degree $d$. So $U$ has at most $n$ eigenvalues while $V$ may have as many as $2n$, and thus the results of linear algebra routines—at least ones that compute eigensystems—can’t be the same.

Some experimentation

I was experimenting with computing eigensystems of complex matrices and their corresponding real variants, and noticed a pattern. If the complex matrix had a real eigenvalue, it would also show up as an eigenvalue (of double multiplicity) of the real matrix. If the complex matrix had a complex eigenvalue, then it would show up as an eigenvalue of the real matrix, along with its complex conjugate.
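The experiment is easy to reproduce; here is a sketch in Python with NumPy, using an equivalent Kronecker-product one-liner for the embedding. (`np.linalg.eigvals` on a real array bottoms out in the same LAPACK geev family as DGEEV.)

```python
import numpy as np

def embed(U):
    # Entry a+bi -> block [[a, b], [-b, a]], as before.
    return np.kron(U.real, np.eye(2)) + np.kron(U.imag, np.array([[0., 1.], [-1., 0.]]))

rng = np.random.default_rng(0)
U = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

print(np.sort_complex(np.linalg.eigvals(U)))
print(np.sort_complex(np.linalg.eigvals(embed(U))))
# Each eigenvalue a+bi of U appears among the eigenvalues of embed(U),
# accompanied by its conjugate a-bi.
```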

This led me to the following conjecture:

Conjecture: Let $U\in\mathbb{C}^{n\times n}$, and let $V\in\mathbb{R}^{2n\times 2n}$ be the real matrix corresponding to $U$ according to the aforementioned transformation. If $a+bi$ is an eigenvalue of $U$, then $a\pm bi$ are two eigenvalues of $V$.

With this conjecture and my chin up, I could implement a routine to compute the eigenvalues of $U$ using just a real eigenvalue algorithm. The way I did it was to write a procedure that finds the true eigenvalues amongst the candidate conjugates, roughly as follows.

First, let $E$ be our multiset of eigenvalues of $V$ (the real matrix), but delete duplicate real values (they’ll show up in pairs), and delete complex eigenvalues that have a negative imaginary part (there will always be a corresponding conjugate).

Second, recall that $\operatorname{Tr} U$ is the sum of the eigenvalues of $U$. This will be a complex number whose real part is simply recovered by summing the real parts of the eigenvalues:

$$ \Re (\operatorname{Tr} U) = \sum_{e\in E} \Re e. $$

This fact isn’t computationally useless: if the conjecture is true, there is no ambiguity in the real parts, so this identity can immediately serve as a sanity check in code.

The imaginary part is a little trickier, since there is ambiguity stemming from uncertainty around which conjugate is actually an eigenvalue of $U$. As such, there must be a sequence of $\vert E\vert$ signs $s_{\bullet}\in\{-1,+1\}$ such that

$$ \Im (\operatorname{Tr} U) = \sum_{k=0}^{\vert E\vert-1} s_k \Im e_k. $$

(The ordering of $e_{\bullet}$ doesn’t matter; any will do.)

Though asymptotically inefficient, the values for $s_k$ can be found by brute force: keep trying until you find a set of signs that works. As it turns out, there’s not a lot better you can do, since the subset-sum problem can be reduced to this sequence-of-signs problem.
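Here is what the brute-force search might look like in Python; `resolve_signs`, its arguments, and the tolerance are all illustrative, and the input `E` is assumed to be deduplicated as described above.

```python
import itertools

def resolve_signs(E, im_trace, tol=1e-8):
    """Find signs s_k in {-1, +1} so that sum(s_k * Im(e_k)) equals
    Im(tr U), then apply them.  Exponential in len(E)."""
    for signs in itertools.product((+1, -1), repeat=len(E)):
        total = sum(s * e.imag for s, e in zip(signs, E))
        if abs(total - im_trace) < tol:
            return [complex(e.real, s * e.imag) for s, e in zip(signs, E)]
    raise ValueError("no consistent sign assignment; conjecture violated?")
```

Calling it with $E$ and $\Im(\operatorname{Tr} U)$ yields a candidate list of eigenvalues of $U$.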

While perhaps a neat conjecture, this wasn’t very satisfying to me. I simultaneously hadn’t fully solved what I set out to solve (finding the eigensystem, not just the eigenvalues), and I had a nagging conjecture that exposed my lack of knowledge about the problem.

Proving the conjecture

Ultimately, I had to go back to basics, ask my friends and family for help, and try to break through. I figured that while the conjecture wasn’t ultimately very computationally useful on its own, if I could prove it, maybe it would give me enough insight mathematically and computationally to come up with something better.

After some trial and error, I settled on trying to use the following fact.

Fact: Let $A'$ and $A''$ be square matrices, and let $A := A'\oplus A''$. Then the eigenvalues of $A$ will be the union, counted with multiplicity, of the eigenvalues of $A'$ and $A''$.

This is readily seen by computing the characteristic polynomial. Let $\mathbb{C}[\lambda]$ be our polynomial ring. Then the characteristic polynomial of $A$ equals the product of the characteristic polynomials of $A'$ and $A''$:

$$ \det (A - \lambda I_{\dim A}) = \det (A' - \lambda I_{\dim A'}) \det (A'' - \lambda I_{\dim A''}). $$

This ended up being a crucial insight, as we’ll see.

Another fact I needed was the following. As a matter of notation, let $\bar A$ denote the complex conjugate of each entry of $A$.

Fact: Let $A$ be a complex matrix. If $a+bi$ is an eigenvalue of $A$, then its conjugate $a-bi$ is an eigenvalue of $\bar A$.

This is seen by, again, looking at the characteristic polynomial and using the properties of complex conjugation:

$$ \begin{align*} \det (\bar A - \lambda I_{\dim A}) &= \det (\bar A - \overline{\bar\lambda I_{\dim A}}) \\ &= \det (\overline{A - \bar\lambda I_{\dim A}}) \\ &= \overline{\det (A - \bar\lambda I_{\dim A})}. \end{align*} $$

These facts were enough for me to refine the conjecture:

Conjecture (redux): Let $U\in\mathbb{C}^{n\times n}$, and let $V\in\mathbb{R}^{2n\times 2n}$ be the real matrix corresponding to $U$ according to the aforementioned transformation. Then $V$ is similar to $U\oplus \bar U$ when $V$ is trivially interpreted as a real matrix in $\mathbb{C}^{2n\times 2n}$.

This new conjecture implies the old one by way of those two facts.

Now, things started to look good. If I could find a similarity transform in $\mathbb{C}^{2n\times 2n}$ that block-diagonalizes $V$, and show that such a diagonalization is exactly $U\oplus \bar U$, then I’d be golden.

Since we are “allowed” to work over $\mathbb{C}$, the first step was to actually undo the complex-to-real transformation. However, since we are building a similarity transform, we need it to be invertible.

Again, after trial and error, I found that

$$ \left[ \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -i \\ 1 & i \\ \end{pmatrix} \right] \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \left[ \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -i \\ 1 & i \\ \end{pmatrix} \right]^{-1} = \begin{pmatrix} a+bi & 0 \\ 0 & a-bi \end{pmatrix}. $$

Not until I constructed this matrix, let’s call it

$$ K := \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -i \\ 1 & i \\ \end{pmatrix}, $$

did I have a big “aha” moment. I had hitherto been so focused on the (wrong) idea that our complex-to-real transformation was unique or canonical. I “knew” that we could choose either position of $b$ or $-b$ to represent either a number or its conjugate, but I didn’t think deeply enough about the repercussions of that fact. With $K$, it was apparent that our real matrix actually, in some sense, holds both a complex number and its conjugate.
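The identity for $K$ is easy to check numerically; a minimal sketch:

```python
import numpy as np

K = np.array([[1, -1j], [1, 1j]]) / np.sqrt(2)

z = 3 - 4j
M = np.array([[z.real, z.imag], [-z.imag, z.real]])

# Conjugating by K reveals both z and its conjugate on the diagonal.
print(K @ M @ np.linalg.inv(K))  # ~ [[3.-4.j, 0], [0, 3.+4.j]]
```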

At this point, it was obvious what to do. This was the critical insight.

Our matrix $K$ only works for $2\times 2$ matrices in $\mathbb{R}^{2\times 2}\subset \mathbb{C}^{2\times 2}$. We can extend it by using this little rule of linear algebra. If

$$ X := \begin{pmatrix} X_{0,0} & X_{0,1} & \cdots & X_{0,c-1} \\ X_{1,0} & X_{1,1} & & \\ \vdots & & \ddots & \vdots \\ X_{r-1,0} & & \cdots & X_{r-1,c-1} \end{pmatrix} $$

is a block matrix and $D := \Delta\oplus\cdots\oplus \Delta$ is a block diagonal matrix with $X_{\bullet}$ and $\Delta$ square and having the same shape, then

$$ (DX)_{r,c} = \Delta X_{r,c} \qquad\text{and}\qquad (XD)_{r,c} = X_{r,c}\Delta, $$

i.e., multiplication of these matrices results in $\Delta$ getting “applied” to each block. As such,

$$ K^{\oplus n} V (K^{\oplus n})^{-1} $$

will be a block matrix equivalent to substituting each disjoint $2\times 2$ sub-matrix of $V$ with a matrix $\operatorname{diag}(z,\bar z)$, where $z$ is the complex number that sub-matrix represents, calculated as described.

We’re still not quite where we want to be. This transformed matrix will be a checkerboard pattern of $U$-likes on even-even- and odd-odd-indexed entries, and zeros on even-odd- and odd-even-indexed positions. The last necessary bit to finish our similarity transform is to permute this matrix in such a way that all positive-signed conjugates are in the top-left $n\times n$ sub-matrix, and all negative-signed conjugates are in the bottom-right $n\times n$ sub-matrix. If we take for granted that permutations are invertible, then we’re done proving it. If we want to construct something, then we observe that all positive-signed conjugates have even indexes, and all negative-signed conjugates have odd indexes, and we define the invertible map

$$ (\Pi X)_{r,c} := \begin{cases} X_{2r,2c} & \text{if }0\leq r,c < n\\ X_{2(r-n)+1, 2(c-n)+1} & \text{if }n\le r,c < 2n\\ X_{2r, 2(c-n)+1} & \text{if }0\leq r < n\land n \leq c < 2n\\ X_{2(r-n)+1, 2c} & \text{if }n\leq r < 2n\land 0 \leq c < n \end{cases} $$

We can recover the permutation matrix $P$ implementing $\Pi$ (so that $\Pi X = P X P^{-1}$) by applying the row-index half of $\Pi$ to the identity matrix.

And with that, we have a similarity transform:

$$ (P K^{\oplus n}) V (P K^{\oplus n})^{-1} = U\oplus \bar U. $$

Since eigenvalues are preserved under similarity, we’ve proved the conjecture.
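The whole similarity transform can also be checked numerically. A sketch in Python with NumPy (`embed`, the seed, and the dimension are illustrative):

```python
import numpy as np

def embed(U):
    # Entry a+bi -> block [[a, b], [-b, a]], as before.
    return np.kron(U.real, np.eye(2)) + np.kron(U.imag, np.array([[0., 1.], [-1., 0.]]))

rng = np.random.default_rng(1)
n = 3
U = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

K = np.array([[1, -1j], [1, 1j]]) / np.sqrt(2)
Kn = np.kron(np.eye(n), K)                       # K + ... + K, n copies, block diagonal
perm = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))
P = np.eye(2 * n)[perm]                          # matrix of the permutation Pi

S = P @ Kn
expected = np.block([[U, np.zeros((n, n))],
                     [np.zeros((n, n)), U.conj()]])   # U direct-sum conj(U)
assert np.allclose(S @ embed(U) @ np.linalg.inv(S), expected)
```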

Revisiting the computation

We have proved the conjecture, but does that actually get us any further in our quest to compute the eigensystem of a complex matrix using an algorithm for real matrices?

It does; we now know exactly how the eigenvalues of $V$ relate to those of $U$: they are the eigenvalues of $U$ together with their conjugates. One thing we haven’t addressed, however, is the eigenvectors.

If we return to thinking about direct sums, then the eigenvectors of a matrix $A := A'\oplus A''$ are going to be eigenvectors of $A'$ and $A''$ “lifted” to the larger sum of spaces. In other words, if $x$ is an eigenvector of $A'$, then $x\oplus \vec 0_{\dim A''}$ is an eigenvector of $A$, where $\vec 0$ denotes a vector of zeros (i.e., $x$ is padded with zeros).

As such, in our block-diagonal basis, the eigenvectors of $V$ are related to the eigenvectors of $U$ in the following way. Suppose $(\lambda, x)$ is an eigenvalue-eigenvector pair of $U$. Then $Ux = \lambda x$. This directly implies that $\bar U \bar x = \bar\lambda\bar x$. Since $V\sim U\oplus\bar U$, $x\oplus\vec 0_{n}$ and $\vec 0_n\oplus\bar x$ are eigenvectors of $V$.

All that’s left to determine is: Which eigenvector is the right one without doing a costly similarity transform?

To do this, we “disembed” the eigenvector from the vector space of $V\sim U\oplus\bar U$ into the vector space of $U$ in such a way that the $\bar U$ subspace collapses to zero. We can do this easily. Our eigenvectors of $V$ without transformation are going to look like

$$ \begin{pmatrix} a+bi \\ -b+ai \\ c+di \\ -d+ci \\ \vdots \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} a-bi \\ -b-ai \\ c-di \\ -d-ci \\ \vdots \end{pmatrix}, $$

where these correspond to eigenvalues $\lambda$ and $\bar\lambda$ respectively. One can see this by way of two facts. First, eigenvectors pull back through a similarity transform: if $SVS^{-1} = U\oplus\bar U$ with $S := P K^{\oplus n}$, and $(U\oplus\bar U)v = \lambda v$, then $V(S^{-1}v) = \lambda (S^{-1}v)$, so the eigenvectors of $V$ are obtained by applying $S^{-1}$ to the lifted eigenvectors $x\oplus\vec 0_n$ and $\vec 0_n\oplus\bar x$. Second, $P^{-1}$ interleaves the entries block-by-block, and $(K^{\oplus n})^{-1}$ then acts on each $2\times 2$ block as

$$ K^{-1}\begin{pmatrix} z \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} z \\ iz \end{pmatrix} \qquad\text{and}\qquad K^{-1}\begin{pmatrix} 0 \\ \bar z \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} \bar z \\ -i\bar z \end{pmatrix}, $$

which produces exactly the interleaved patterns displayed above (up to the irrelevant factor of $\frac{1}{\sqrt 2}$).

Also, notice the resemblance between pairs of entries in our first (“true”) eigenvector, and our complex number representation:

$$ \begin{pmatrix} a & b\\ -b & a \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} a+bi \\ -b+ai \end{pmatrix}. $$

Taking either vector, we wish to annihilate the “wrong” one and send the “right” one to the space of $U$. Call either eigenvector $x\in\mathbb{C}^{2n}$ and the resulting vector $y\in\mathbb{C}^n$. Consider the map

$$ y_k = \frac{x_{2k} - ix_{2k+1}}{2} $$

for integers $0\le k < n$. (It is not actually necessary to divide by $2$, since if $y$ is an eigenvector, then so is $2y$.) With this map, an eigenvector from the conjugate ($\bar U$) space will vanish, while an eigenvector from the ordinary ($U$) space will map to its complex counterpart (e.g., the pair $(a+bi, -b+ai)$ maps to $a+bi$), as desired. In the latter case, $y$ will be an eigenvector of $U$.

If $y = 0$, then it is discarded along with its corresponding eigenvalue; otherwise, the eigenvector and eigenvalue are kept, and we are done.
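To see the disembedding map in action on the two vector shapes displayed above, here is a tiny sketch (the vector `z` is an arbitrary stand-in for an eigenvector of $U$):

```python
import numpy as np

z = np.array([1 + 2j, 3 + 4j])              # stand-in for an eigenvector of U
x_true = np.ravel([z, 1j * z], order='F')   # (a+bi, -b+ai, ...): the U pattern
x_conj = x_true.conj()                      # (a-bi, -b-ai, ...): the conj(U) pattern

def disembed(x):
    return (x[0::2] - 1j * x[1::2]) / 2

print(disembed(x_true))   # [1.+2.j, 3.+4.j] -- recovered
print(disembed(x_conj))   # [0.+0.j, 0.+0.j] -- annihilated
```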

The final pseudocode

In summary, our algorithm to compute the eigensystem of a complex matrix is as follows.

INPUT:
    n : an integer, the dimension of the problem
    U : an n x n matrix of complex numbers

OUTPUT:
    Lambda : a list of complex numbers, eigenvalues of U
    Y      : a list of complex n-vectors, eigenvectors of U

Step 1:
    V : a 2n x 2n matrix of real numbers
    Let V = a block matrix constructed by
            expanding each element a+bi of U
            into a matrix [a, b; -b, a]
    
Step 2:
    Mu : a list of 2n complex numbers
    X  : a list of complex vectors of dimension 2n
    Let Mu, X = eigenvalues and eigenvectors of V
                using a program that computes eigensystems
                of real matrices

Step 3:
    Initialize Lambda = [] and Y = []
    For mu, x in Mu, X:
        y : a complex n-vector
        For k from 0 to n-1:
            Let y[k] = x[2*k] - i*x[2*k+1]
        If y is a non-zero vector:
            Push mu onto Lambda
            Push y onto Y
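For concreteness, here is the same algorithm rendered as a short Python/NumPy sketch (not MAGICL’s actual implementation; the function name and tolerance are illustrative, and `np.linalg.eig` on a real array plays the role of DGEEV):

```python
import numpy as np

def eig_via_real(U, tol=1e-10):
    """Eigenvalues and eigenvectors of a complex matrix U, computed by
    invoking an eigensolver only ever on a real matrix."""
    n = U.shape[0]

    # Step 1: embed U as a 2n x 2n real matrix V.
    V = np.empty((2 * n, 2 * n))
    V[0::2, 0::2] = U.real
    V[0::2, 1::2] = U.imag
    V[1::2, 0::2] = -U.imag
    V[1::2, 1::2] = U.real

    # Step 2: eigensystem of the real matrix (LAPACK geev underneath).
    Mu, X = np.linalg.eig(V)

    # Step 3: disembed each eigenvector; vectors from the conjugate
    # space collapse to zero and are discarded.
    Lambda, Y = [], []
    for mu, x in zip(Mu, X.T):
        y = (x[0::2] - 1j * x[1::2]) / 2
        if np.linalg.norm(y) > tol:
            Lambda.append(mu)
            Y.append(y / np.linalg.norm(y))
    return Lambda, Y

U = np.array([[1 + 2j, 3 - 4j], [5 - 6j, -7 + 8j]])
for lam, y in zip(*eig_via_real(U)):
    assert np.allclose(U @ y, lam * y)
```

(As noted earlier, a real eigenvalue of $U$ has double multiplicity in $V$, so it can survive the disembedding twice; deduplicating such pairs is left to the caller.)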

Much ado about nothing?

While this was an interesting puzzle, and leads to working code, was it all worth it? Honestly, I’m not sure it’s the best engineering decision. What happens when we need singular-value decomposition, or some other goofy matrix algorithm? It’s difficult to imagine the complex embedding trick will work well.

On the other hand, it saves us from using more antiquated FORTRAN 77 code than we need to. :)


Thanks to Juan Bello-Rivas, Erik Davis, Bryan Fong, Brendan Pawlowski, and Eric Peterson for insightful discussions.