by throwawaymath
2944 days ago
Speaking as someone who uses Python with half a terabyte of memory, I think you're underestimating how much memory these labs will use. In my experience most HPC workloads are optimized first by rewriting the code in the same (already fast) language or library, then by increasing hardware resources (especially among distributed nodes), then by seeking a new library in the same ecosystem, and finally by moving to a new language if the team has to. Moving to a new language has more friction than basically anything else unless there's a real language feature missing or the budget doesn't allow for more compute hardware.

Hundreds of gigabytes is well below where academic and industry labs will start having to think about these problems. It's going to be really tough to displace Python with anything equally general-purpose.

This is all to say that I buy that Julia can shine more than Python for I/O-bound HPC, but a workload really shouldn't be I/O-bound until you have terabytes of data (and likely tens of terabytes).

And aside from that, the Python numerical computing ecosystem includes a lot more than just NumPy and Pandas. As other commenters have mentioned, you can use Dask if your hot data has grown into the terabyte range. Anaconda includes a lot of libraries which can bail you out of situations once you've left the familiar world of Pandas data frames.
https://en.m.wikipedia.org/wiki/Gustafson%27s_law
For instance.
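
For anyone who doesn't want to click through, the law at that link is usually written roughly as below. The symbols are my own choice for this sketch, not necessarily the article's notation: N is the number of processors, s is the fraction of the scaled run's time spent in serial code, and S is the scaled speedup.

```latex
% Gustafson's law, assuming the serial and parallel fractions sum to 1:
%   s       = serial fraction of the run time on the N-processor system
%   (1 - s) = parallel fraction
\[
  S(N) = s + (1 - s)\,N = N - s\,(N - 1)
\]
```

The point being that if the problem size grows with the hardware, the speedup keeps scaling nearly linearly in N as long as s stays small, which fits the "add more nodes before switching languages" ordering described above.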