by bunderbunder
2943 days ago
Right now it’s a hard sell vs Python, but I can imagine Python running out of runway soon. A lot of its existing scientific and statistical computing stack is built around the assumption that you’ll be working with data that conveniently fits in memory. Once you’ve sized out of pandas/scipy/scikit, your next major option is Spark, which is certainly powerful, but is also unwieldy. I could see something like Julia earning a lot of mindshare if it had a really polished solution for the space between “my data is hundreds of megabytes” and “my data is hundreds of gigabytes”.

Moving to a new language has more friction than basically anything else unless there's a real language feature missing or the budget doesn't allow for more compute hardware. Hundreds of gigabytes is well below where academic and industry labs will start having to think about these problems. It's going to be really tough to displace Python with anything equally general purpose.
This is all to say that I buy that Julia can outshine Python for I/O-bound HPC, but your workload really shouldn't be I/O bound until you have terabytes of data (and likely tens of terabytes). Aside from that, the Python numerical computing ecosystem includes a lot more than just NumPy and pandas. As other commenters have mentioned, you can use Dask if your hot data has grown into the terabyte range. Anaconda includes a lot of libraries which can bail you out of situations once you've left the familiar world of pandas data frames.
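
To make the "doesn't fit in memory" point concrete, here is a minimal sketch of the chunked, partition-and-combine style of processing, written with only the standard library so it runs anywhere. It is not Dask's or pandas' API; the function name `chunked_group_mean`, the column names, and the tiny in-memory sample are all made up for illustration. As far as I understand it, tools like Dask automate roughly this pattern over partitions of a larger dataset.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def chunked_group_mean(lines, key_col, value_col, chunk_rows=100_000):
    """Per-group mean over a CSV stream, holding at most chunk_rows rows at once.

    Hypothetical helper: only running sums and counts are kept between chunks,
    so memory use depends on the number of groups, not the number of rows.
    """
    reader = csv.DictReader(lines)
    totals = defaultdict(float)
    counts = defaultdict(int)
    while True:
        # Pull the next batch of rows; an empty batch means the stream is done.
        chunk = list(islice(reader, chunk_rows))
        if not chunk:
            break
        for row in chunk:
            totals[row[key_col]] += float(row[value_col])
            counts[row[key_col]] += 1
    # Combine step: turn the running sums and counts into means.
    return {key: totals[key] / counts[key] for key in totals}


# Tiny stand-in for a file too large to load at once (illustrative data only).
sample = io.StringIO("sensor,reading\na,1.0\nb,4.0\na,3.0\nb,6.0\na,5.0\n")
means = chunked_group_mean(sample, "sensor", "reading", chunk_rows=2)
print(means)
```

In real use you would pass an open file handle instead of the `io.StringIO` sample. The chunk size only trades memory for per-batch overhead; the result is the same either way.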