Hacker News
by msellout 5024 days ago
Would Disco (http://discoproject.org/) work for you?

I don't think MapReduce (MR) is a good abstraction for implementing linear algebra, and I expect the overhead to be too high (although I don't have numbers to back that up). For large problems (much more than a couple of machines' worth of RAM), you either use big-iron HPC solutions, or you avoid 'exact' linear algebra altogether and focus on one-pass algorithms.

For example, instead of computing an exact SVD, you would use something like the generalized Hebbian algorithm to compute the SVD in a streaming manner (this is what Mahout implements).
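As a rough illustration of the one-pass idea, here is a minimal sketch of Sanger's generalized Hebbian algorithm (GHA) in pure Python. It estimates the top-k principal directions (equivalently, the leading right singular vectors of the zero-mean data matrix) while touching each sample exactly once. This is not Mahout's code; the function names, the learning rate `lr`, and the synthetic `stream` data are assumptions chosen for illustration.

```python
import random


def gha_update(W, x, lr):
    """One step of Sanger's rule: dw_i = lr * y_i * (x - sum_{j<=i} y_j * w_j).

    W is a list of k weight rows (length d), x is one zero-mean sample.
    The residual is built from the weights as they were before this step.
    """
    y = [sum(wk * xk for wk, xk in zip(w, x)) for w in W]  # y = W x
    residual = list(x)
    for i, w in enumerate(W):
        # Remove component i (old weights) so row i only sees what rows 0..i-1 missed.
        residual = [r - y[i] * wk for r, wk in zip(residual, w)]
        W[i] = [wk + lr * y[i] * r for wk, r in zip(w, residual)]
    return W


def stream(n, seed=0):
    """Hypothetical data: independent axes with variances 9, 4 and 0.25."""
    rng = random.Random(seed)
    scales = (3.0, 2.0, 0.5)
    for _ in range(n):
        yield [s * rng.gauss(0.0, 1.0) for s in scales]


def streaming_top_k(samples, d, k, lr=1e-3, seed=1):
    """Single pass over `samples`; only the k x d weight matrix is kept in memory."""
    rng = random.Random(seed)
    W = [[0.1 * rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    for x in samples:
        gha_update(W, x, lr)
    return W


W = streaming_top_k(stream(30000), d=3, k=2)
# Rows should end up close to +/-[1, 0, 0] and +/-[0, 1, 0] (up to sampling noise).
print([[round(v, 2) for v in w] for w in W])
```

The trade-off is the one described above: memory stays at k x d regardless of how many samples arrive, but the answer is approximate and depends on the learning rate and the number of samples seen.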

No, the sparse matrix code in SciPy is plain C (not even multi-core, let alone distributed).

EDIT: Or did you mean that Disco offers distributed sparse CSC operations?
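For reference, a distributed CSC matrix-vector product could be phrased in MapReduce style roughly as below: each map task handles a block of columns and emits `(row, partial_product)` pairs, and the reduce step sums pairs by row. This is a minimal single-process sketch, not Disco's API; the function names and the column-block split are assumptions. The `data`/`indices`/`indptr` arrays are the same CSC layout that SciPy's `csc_matrix` stores.

```python
from collections import defaultdict


def map_column_block(data, indices, indptr, x, col_start, col_stop):
    """Map step: emit (row, value * x[j]) for every nonzero in columns [col_start, col_stop)."""
    for j in range(col_start, col_stop):
        # Column j's nonzeros live in data[indptr[j]:indptr[j + 1]].
        for p in range(indptr[j], indptr[j + 1]):
            yield indices[p], data[p] * x[j]


def reduce_by_row(pairs, n_rows):
    """Reduce step: sum all partial products that share a row key."""
    acc = defaultdict(float)
    for row, value in pairs:
        acc[row] += value
    return [acc[i] for i in range(n_rows)]


def csc_matvec_mapreduce(data, indices, indptr, x, n_rows, block=2):
    n_cols = len(indptr) - 1
    mapped = []
    # Hypothetical split: each column block would be one map task on some worker.
    for start in range(0, n_cols, block):
        stop = min(start + block, n_cols)
        mapped.extend(map_column_block(data, indices, indptr, x, start, stop))
    # A real MR job would shuffle each row key to a reducer here.
    return reduce_by_row(mapped, n_rows)


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]] in CSC form.
data = [1.0, 4.0, 3.0, 2.0, 5.0]
indices = [0, 2, 1, 0, 2]
indptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]
print(csc_matvec_mapreduce(data, indices, indptr, x, n_rows=3))  # [7.0, 6.0, 19.0]
```

Note that every nonzero produces an intermediate pair that has to go through the shuffle, which is one place where the MR overhead mentioned above could come from.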