by pvxc 5753 days ago
Such large differences imply a difference in the algorithm.

Indeed, you write

c = mat2.getcol(j)

norms[0, j] = scipy.linalg.norm(c.A)

which means (i) extract a sparse column vector, (ii) convert it to a dense vector, and (iii) compute the norm. Now, this should explain the speed difference. Looking at the nnz, a dense norm can take up to a factor 5e5/(1.2e8/1.3e7) ~ 54000 longer :)

The main issue here is that the linear algebra stuff under `scipy.linalg` doesn't know about sparse matrices, and tends to convert everything to dense first. You'd need to muck around with `m2.data` to go faster.
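For what it's worth, here is a rough sketch of what "mucking around with `.data`" could look like. It is not taken from the original code: the helper name `sparse_column_norms` is made up for illustration, and it assumes the matrix can be converted to CSC format and has no duplicate stored entries. The idea is to square only the stored nonzeros and sum them per column, so no dense column is ever built.

```python
import numpy as np
import scipy.sparse as sp


def sparse_column_norms(mat):
    """Euclidean norm of every column, touching only the stored nonzeros.

    Hypothetical helper for illustration. Assumes `mat` is a SciPy sparse
    matrix in canonical form (no duplicate entries for the same position).
    """
    csc = mat.tocsc()
    n_cols = csc.shape[1]

    # In CSC format, indptr[j]:indptr[j + 1] is the slice of `data` that
    # belongs to column j, so diff(indptr) is the nonzero count per column.
    counts = np.diff(csc.indptr)

    # Column index of each stored value, in the same order as `data`.
    cols = np.repeat(np.arange(n_cols), counts)

    # Sum of squared magnitudes per column; empty columns stay at 0.
    sq_sums = np.bincount(cols, weights=np.abs(csc.data) ** 2, minlength=n_cols)
    return np.sqrt(sq_sums)
```

If `norms` is a 2-D array with one row, as the `norms[0, j]` indexing suggests, the whole loop would collapse to something like `norms[0, :] = sparse_column_norms(mat2)`. The cost then scales with the number of stored nonzeros rather than with the full column length.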

1 comment

Thanks a bunch for the help. It would be nice if this were documented somewhere besides reading the source code though :( And none of my googling turned up m2.data.

I'd actually guessed that it might be making columns full, but I'd expected to see a step-ladder up and down memory pattern as vectors were allocated, gc was triggered, vectors were allocated, etc. I didn't observe such a pattern; memory usage was almost constant.

Anyway, thanks again for your help -- I'd offer via email to buy you a beer if you're ever in SF, but no email, so...