by omaranto 3379 days ago
That bit about the stationary distribution not changing if you add a diagonal matrix sounds completely wrong to me. Let me see if I understand what you mean. Given a matrix M with non-negative entries (and no row of just zeros), let S(M) denote the stochastic matrix you get by normalizing each row of M. You are saying that if M is any matrix and D is a diagonal matrix with non-negative entries then S(M) and S(M+D) have the same stationary distribution?

Moreover, the data is collected over the entire history. A transition matrix is a linear operator taking the state at time step T_i to the state at T_{i+1}. A matrix that conflates all historical observations is definitely not an ordinary transition matrix.

That is apart from the fact that it is questionable whether the process can be represented by a finite, linear operator at all.

It's more likely a stochastic process (infinite matrix) with births and deaths.

I would be surprised if it became true. :-)

That was my immediate thought upon reading. A little more accuracy in the description would be helpful. This should be presented as: "If this aggregated data reflected a constant across time, then we can see where language usage would end up in the 'long run distribution'." But the jump matrix shown here will change with time. Most likely, if search results could be binned up by time period, the time dependence of the distribution represented by the eigenvector(s) would be somewhat interesting to watch, even if not remotely predictive. Could animate that or contour plot it.... exercise for the author... :)
Right. Both the matrix S and the identity matrix map the stationary distribution to itself, so any linear combination of them maps the stationary distribution to a multiple of itself, which is the same distribution once normalized. Let me know if I'm saying something really stupid.
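To spell out that step, here is a short worked sketch. It assumes \pi is a row vector with \pi S = \pi; the scalars a, b, c and the row sums r_i are symbols introduced here for illustration, not anything from the article. The second line shows what row-normalizing M + cI gives, which is only a single combination aS + bI when all the row sums r_i are equal.

```latex
\pi\,(aS + bI) = a\,\pi S + b\,\pi I = (a + b)\,\pi

S(M + cI)_{i\cdot} = \frac{r_i}{r_i + c}\, S(M)_{i\cdot} + \frac{c}{r_i + c}\, e_i,
\qquad r_i = \sum_j M_{ij}
```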
Oh, you meant "multiple of the identity matrix" when you said "diagonal matrix"?
Doesn't that mean you're assuming the number of people staying with any language X is the same regardless of X?
[[1 1] [1 1]] and [[10 1] [1 1]] will have different stationary distributions. The values added on the diagonal will likely not be a multiple of the identity matrix.
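A quick way to check this numerically is sketched below. It assumes the S(M) row-normalization defined at the top of the thread and uses plain power iteration; the helper names `normalize_rows` and `stationary` are made up for this example. The third matrix adds a multiple of the identity (9I) to [[1 1] [1 1]] for comparison.

```python
def normalize_rows(m):
    """S(M): divide each row by its sum to get a row-stochastic matrix."""
    return [[x / sum(row) for x in row] for row in m]


def stationary(p, iters=1000):
    """Approximate the stationary row vector pi (pi P = pi) by power iteration."""
    n = len(p)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


base = stationary(normalize_rows([[1, 1], [1, 1]]))          # original counts
uneven = stationary(normalize_rows([[10, 1], [1, 1]]))       # diagonal (9, 0) added
scaled_id = stationary(normalize_rows([[10, 1], [1, 10]]))   # 9 * identity added

print(base, uneven, scaled_id)
```

By hand, the uneven case works out to (11/13, 2/13), while the other two both stay at (1/2, 1/2), so the invariance only holds here for the multiple-of-identity case.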