by textminer 5025 days ago
Python is surprisingly heavy-duty. But my kingdom for a seamlessly distributed or parallelized version of NumPy/SciPy! How nice would it be to just type "C = A * B", with A living as a sparse CSC matrix spread across many nodes?
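For comparison, a minimal single-machine sketch of that product in SciPy. The matrix values are made up for illustration, and `@` is used for the matrix product (with `csc_matrix`, `*` also means matrix product, but SciPy's newer sparse array classes treat `*` as elementwise). Nothing here is distributed: everything lives in one process.

```python
import numpy as np
from scipy import sparse

# Toy inputs (hypothetical values), small enough to check by hand.
A_dense = np.array([[1.0, 0.0, 2.0],
                    [0.0, 0.0, 3.0],
                    [4.0, 0.0, 0.0]])
B_dense = np.array([[0.0, 5.0],
                    [6.0, 0.0],
                    [0.0, 7.0]])

# Convert to compressed sparse column (CSC) storage.
A = sparse.csc_matrix(A_dense)
B = sparse.csc_matrix(B_dense)

# CSC keeps three arrays: the nonzero values, their row indices,
# and pointers marking where each column starts.
print(A.data, A.indices, A.indptr)

# Sparse matrix product, computed in a single process.
C = A @ B
print(C.toarray())
```

The column-pointer layout is what makes a distributed version nontrivial: splitting A across nodes means deciding which columns (or blocks) live where, and the product then needs communication between them.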
2 comments

Would Disco (http://discoproject.org/) work for you?
I don't think MapReduce is a good abstraction for implementing linear algebra, and I expect the overhead to be too high (although I don't have numbers to back that up). For large problems (much more than a couple of machines' worth of RAM), you either use big-iron HPC solutions or avoid 'exact' linear algebra altogether and focus on one-pass algorithms.

For example, instead of computing an exact SVD, you would use something like a Hebbian algorithm to compute the SVD in a streaming manner (that's what Mahout implements, for example).
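To make the streaming idea concrete, here is a rough sketch of a normalized Hebbian update that estimates only the top right singular vector in one pass over the rows. It is not Mahout's implementation; the function name, learning rate, and synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def streaming_top_direction(rows, dim, lr=0.01, seed=0):
    """Estimate the top right singular vector of a row stream in one pass."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    w /= np.linalg.norm(w)
    for x in rows:
        y = w @ x                 # projection of the row onto the current estimate
        w += lr * y * x           # Hebbian step: reinforce the direction of x
        w /= np.linalg.norm(w)    # renormalize so the estimate stays unit length
    return w

# Synthetic stream (hypothetical data): rows lie mostly along true_dir, plus small noise.
rng = np.random.default_rng(1)
true_dir = np.array([3.0, 4.0]) / 5.0

def row_stream(n):
    for _ in range(n):
        yield 3.0 * rng.normal() * true_dir + 0.1 * rng.normal(size=2)

w = streaming_top_direction(row_stream(5000), dim=2)
alignment = abs(w @ true_dir)     # close to 1 when w matches true_dir up to sign
print(w, alignment)
```

Each row is seen once and then discarded, so memory use depends on the number of columns rather than the number of rows. Recovering further singular vectors needs an extension such as the generalized Hebbian algorithm, which this sketch does not attempt.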

No, the sparse matrix code in SciPy is plain C (not even multi-core, let alone distributed).

EDIT: or did you mean Disco offers distributed sparse CSC operations?

We (http://continuum.io/) are working on this.