Hacker News new | ask | show | jobs
by crdrost 2066 days ago
That is genuinely a really fun way to look at it, thank you for linking this! Especially this fits nicely with Markov matrices where you have N input nodes and N output nodes and the sum of all of the probabilities coming out of one of the nodes needs to equal 1.

What I might find a little more difficult to teach to people through this lens is the phenomenon of eigenvectors, but I suppose that's to be expected—nothing will work well for all purposes.

2 comments

>What I might find a little more difficult to teach to people through this lens is the phenomenon of eigenvectors

Why? Eigenvectors are simply inputs to the network where the output keeps its shape, that is, at most it gets rescaled, as if you had applied a uniform gain to the components, but otherwise it will be the same as the input.

So for example let me ask for your knee-jerk opinions based on this idea:

- does a matrix have the same left-eigenvectors as its right-eigenvectors?

- what is the relationship between the left-eigenvalues and right-eigenvalues?

- is there always an eigenvector? when is there a complete set? how do you generalize your notion of eigenvectors so that matrices always have a complete set of them?

I am not sure any of these are intuitive here.

If A is a square matrix, then a left eigenvector v is a vector such that vA = \lambda_v v for some \lambda_v. Likewise, if u is a right eigenvector of A, Au = \lambda_u u.

Notice that u and v cannot be equal, because they are not the same shape. However, if v is a right eigenvector with eigenvalue \lambda, then v^T is a left eigenvector with eigenvalue \lambda, as well. More or less what this means is that we tend to just ignore left eigenvalues and only use the right eigenvalues, because the math is exactly the same up to a transpose operation.

A matrix does not necessarily have nontrivial eigenvectors. Think about the 0 matrix here. But, if A is nonzero, then it must have at least one nonzero eigenvalue, hence one nontrivial eigenvector. This is because a nonzero matrix must have at least one nonzero row and column; however, if you construct the other rows (columns) to be multiples of the first column, you end up with a matrix with only one nontrivial eigenvalue. This also illustrates the conditions necessary for an nxn matrix A to have n linearly independent eigenvectors: the rows of A must be linearly independent.

All of this is covered in a decent undergrad linear algebra course. I would suggest either finding a video course, or getting a good book and working through it, if you want to understand these things better.

> However, if v is a right eigenvector with eigenvalue \lambda, then v^T is a left eigenvector with eigenvalue \lambda, as well.

Consider the matrix

    M = [ 0.50  0.50 ]
        [ 0.25  0.75 ]
Clearly [1; 1] is a right eigenvector with eigenvalue 1. But [1 1] M = [0.75 1.25].

> A matrix does not necessarily have nontrivial eigenvectors. Think about the 0 matrix here.

The zero matrix has _all_ the nontrivial vectors as eigenvectors.

> But, if A is nonzero, then it must have at least one nonzero eigenvalue, hence one nontrivial eigenvector.

Consider

    N = [ 0  1 ]
        [ 0  0 ]
This has no non-zero eigenvalues, but it is not the zero matrix. It does have a nontrivial eigenvector, [1; 0], because every square complex matrix has an eigenvector. (This is in contradiction to another statement you said, that a matrix does not have nontrivial eigenvectors—that is technically true but only in a limited sense that your space might not be ℂ^n or something that can be quickly generalized to it like ℝ^n. But like we’re in a computing forum and I might find that pedantic.)

I don't know why you consider the eigenvalue zero to somehow not count as an eigenvalue. Very suspicious.

> This also illustrates the conditions necessary for an nxn matrix A to have n linearly independent eigenvectors: the rows of A must be linearly independent.

Consider

    P = [ 1  1 ]
        [ 0  1 ]
The rows are linearly independent, but this does not have a second eigenvector.

> All of this is covered in a decent undergrad linear algebra course. I would suggest either finding a video course, or getting a good book and working through it, if you want to understand these things better.

sigh ... See this is why I sometimes feel bad about commenting here. Like, this comment thread was supposed to be a celebration of this different way of thinking about linear algebra, and then I have to deal with this stuff. Like I totally don’t think you meant to come across as condescending, but given that I have taught crash courses for struggling friends on linear algebra concepts they missed in their undergrad to get through our graduate work in physics, you know, it kind of does come across that way.

Yeah. In my experience, specialized subreddits (like /r/math or /r/haskell) have less such silliness than HN.

For the first matrix, maybe a simpler example is M = [1 1; 0 0]. Then [1; 0] is a right eigenvector with eigenvalue 1, but [1 0] M = [1 1].

> I don't know why you consider the eigenvalue zero to somehow not count as an eigenvalue. Very suspicious.

Zero can of course be a proper eigenvalue. Probably they confused it with the fact that the zero vector typically doesn't count as an eigenvector.

>But, if A is nonzero, then it must have at least one nonzero eigenvalue, hence one nontrivial eigenvector.

How about this matrix?

  [[0, -1],
  [[1, 0]]
>However, if v is a right eigenvector with eigenvalue \lambda, then v^T is a left eigenvector with eigenvalue \lambda, as well.

A snippet of code producing a counterexample:

  import numpy as np
  import scipy.linalg as spla

  A = np.random.randn(3, 3)
  right_eigenvector = spla.eig(A)[1][:,0]
  right_eigenvalue = ((A @ right_eigenvector) / right_eigenvector)[0]

  potential_left_eigenvector = right_eigenvector[np.newaxis, :]

  # if all components of the following are not the same, then it's not
  # a left eigenvector
  print((potential_left_eigenvector @ A) / potential_left_eigenvector)
  # prints [[-1.1327836 -0.j -0.14850693-0.j -1.84397691+0.j]]
Your first matrix doesn't count, but the theorem is bogus: see my counterexample above.

The correct theorem is that every complex square matrix has an eigenvector. If we interpret your matrix as a complex matrix, then it does have two eigenvectors, namely [1; ±i]. Hence why I’d say it kind of “doesn’t count.”

The way to see it is to realise that the eigenvalues are the roots of the characteristic polynomial (which is not hard to prove). But by the fundamental theorem of algebra, every complex polynomial has a root (it does actually have n roots, where nxn is the dimension of the matrix, but those roots can all be the same). This also shows why the theorem is, in general, false if you restrict yourself to real numbers.

(I also think that the counterexample exhibited "counts" in the way that it's intuitively very clear why it can't have a real eigenvector. The R^2 plane can be easily visualised, as opposed to C^2, and the matrix exhibited acts as a rotation of R^2, so of course there can't be an eigenvector. As you said, once you allow complex numbers, that's not true anymore, but at that point the visual intuition breaks down somewhat.)

Oh. Oh my. I have studied linear algebra theory some. I have used it a ton. I have used eigenvectors to solve problems. I have never grokked them.

This description changed that. Thank you!

An eigenvector would correspond to the "stationary distribution" of the Markov transition matrix represented by the graph. (think "page rank")