Hacker News new | ask | show | jobs
by stan_rogers 2305 days ago
Complex numbers, the way they're used in most cases, is a tuple notation. They're a handy way of keeping your chocolate separate from your peanut butter, so to speak, as that little "times i" makes it difficult to accidentally get things mixed up. And that's the way I always explained it to my students: there are imaginary numbers in the original sense of fake roots that will go away if you ignore them long enough, and there are imaginary numbers in the sense that it makes some kinds of calculations easier to keep straight. I've never been convinced that they are the same thing. One is an annoying but temporary consequence of arithmetic, while the other is just a convention, when all's said and done.
2 comments

They're the same thing in the sense they have the same roots.

The most confusing thing about complex numbers is the language. First you're told negative numbers can't have roots, then you're told they so can too, but you have to call the roots "complex" or "imaginary."

This sets up cognitive dissonance which can be harder to deal with than the math. (What even is an "imaginary number"? What are those words supposed to mean?)

In reality complex numbers are a way of moving from the number line to a number circle. (Which eventually generalises to a 3-sphere when you get to quaternions.)

That's all they are. Instead of linear arithmetic - which is about combining magnitudes in one dimension - you can now do arithmetic that combines magnitudes with rotations.

The extra dimension makes it possible to solve equations with solutions that don't exist on the basic number line. It also makes it easier to do calculations that combine magnitude with phase - which includes pretty much anything that rotates or processes linear combinations of sine waves, and which a straight vector tuple can't handle.

If someone had told me this when I was learning complex numbers the cognitive dissonance wouldn't have hurt quite as much.

I found so much of math I had learned previously in the “it’s weird but this works if you take it on faith” sense was suddenly blatantly obvious after learning some abstract algebra. I wish I had learned that stuff way earlier.

In the case of complex numbers, I find the “paradox” disappears when you think of it in terms of fields abstractly. To put it maybe a bit overly simply - instead of focusing on the idea of “square roots of negative numbers”, instead step back and consider that number-like operations make sense for things that aren’t numbers at all in the traditional sense. One particularly useful example is 2d vectors, which you can add in the usual sense and “multiply” in polar form by multiplying “r” and adding “theta”. It turns out that these vectors with these operations act a LOT like numbers, and it also turns out that that weird multiply operation is actually super useful. One easy interpretation is combined scale and rotation transforms, with “multiplication” implementing composition.

Once you do that, it also turns out that solving equations like “what transformation composed with itself equals a 2x scaling with 180° rotation?” also make sense (i.e. “solve x^2 = -2”), and when you solve polynomials in this new system you get more solutions than you did for regular numbers. And that the thing you just invented IS the field of “complex numbers”.

[Sorry for the verbosity and probably poor organization, I’m in a bit of a hurry IRL and didn’t have time to edit it down. I did edit a bit for clarity and to fix typos, etc., though]

You're describing some sort of type mismatch between two concepts I think, and I really don't understand it: and I feel like as someone who occasionally teaches these things I really would like to. Could you elaborate?

For my part, I do like to think of adjoining numbers onto an existing system, but that immediately becomes matrices.

So you decide to adjoin an ε such that ε² = 0. Your numbers are now vectors (a, b) and the action of ε is to map this to (b, 0) so that it is represented by the matrix

    [ 0, 0 ]
    [ 1, 0 ]
And thus the number a + b ε is perfectly encoded in the matrix algebra as a I + b ε =

    [ a, 0 ]
    [ b, a ]
This sort of trick is really helpful for programmers because we often have a field of bigints or rationals that we want to adjoin an irrational number to. So for example when you want to do Fibonaccis using exponentiation by squaring, it helps (although this is not obvious at first) to adjoin the golden ratio φ satisfying φ² = 1 + φ to your bigints, so that your numbers look like a + b φ =

    [ a,   b   ]
    [ b, a + b ]
Then F_n = [1, 0] . φ^(n+1) . [1, 0] and you can exponentiate by squaring straightforwardly. So you start from [0,1;1,1] and square that to [1,1;1,2] and square that to [2,3;3,5] and square that to [13,21;21,34] and square that to [610, 987; 987, 1597], so you get to skip ahead past 55, 89, 144, 233, and 377, at the cost that multiplications are slower than additions but also you potentially get to allocate less memory.

When you adjoin to the reals an i such that i² = -1 these matrices have the shape of the 2D scaled rotations, so that is what complex numbers just “are” to me. The representation as a tuple is to me the same as representing the above matrix as (a, b) or (a, b, a+b) to save time or space... The whole thing is a matrix but the entries are indeed redundant and so you don't need to store all of them at once.

So that's why I do not understand what you mean by these being separate concepts. Can you elaborate?

I'd like to understand what you're doing, but I'm missing something.

1. Adjoining ε such that ε² = 0.

If the numbers have the form a + bε, isn't the action of ε to map (a, b) to (0, a)? Did you mean to say that ε = [0,1; 0,0]?

2. Adjoining φ = (1 + √5)/2 = [0,1; 1,1].

I'm fascinated by the idea of determining that a 2x2 matrix is equal to a real number.

I see that if you take successive powers of the real number φ, and express them in the form aφ + b, the coefficients a and b will take on values from the Fibonacci sequence. So far so good.

I don't follow the claim that the matrix [0,1; 1,1] actually represents φ. This would imply that the formula F_n = [1, 0] · φ^(n+1) · [1, 0] means that F_N equals the real number 1 (= 1 + 0φ), times φ^{n+1}, times 1 again. But this isn't true. It certainly is true that [1,0][0,1; 1,1]^{n+1}[1,0] is equal to F_n, but I don't follow the interpretation as adjoined numeric values (as opposed to as coefficients of φ).

3. Equivalence of representations [a,b] and aI + bM, where M is a 2x2 matrix representing any adjoined number.

This looks to me like the claim that if [a,b]M = [c,d], then (aI + bM)M = cI + dM. I tried to work this out algebraically and I'm pretty sure it isn't true in general. I'm open to being told that I'm wrong about this. Have I interpreted the idea correctly? What are the conditions under which the equivalence holds?

(1) This gets a little into difficult notation but e.g. in Mathematica-style notation,

    {{0,0},{1,0}} {{a},{b}} = {{0}, {a}}.
I think that you are preferring to left multiply your matrices by your points so that your points remain horizontal, whereas I am just used to my vectors being column vectors that I write as points sometimes? So that is why we are getting transposes of each other's notation.

(2) So the case for φ is very similar to the case for ε: you want to start with the constitutive relation, in this case φ² = φ + 1, to build the matrix with first column {{0},{1}} (multiplying 1 by φ gives φ) and the second column {{1},{1}} (the above constitutive relation).

The isomorphism is then that if X represents this adjoined unit then the point {{a}, {b}} becomes a I + b X one way [or say if you have a cubic constitutive relation then {{a},{b},{c}} becomes a I + b X + c X² similarly] and M becomes M {{1},{0}} [or say M {{1},{0},{0}} etc].

I would definitely agree that this is probably a much narrower statement than your “if [a,b]M = [c,d], then (aI + bM)M = cI + dM” as the structure of X is very tight. It is always [e_2 e_3 ... e_n c] where e_i is a unit column vector with a 1 in the i’th place and 0s everywhere else, and c is the vector embodying the constitutive relation.

But if you want a condition, the isomorphism condition is probably the best place to go.

So the claim is that φ^n = F_{n-1} + F_n φ.