by kprybol
3398 days ago
Julia's biggest hurdle is the lack of a well-functioning DataFrames package (or its current fork, DataTables). Tons of issues around nullable arrays, etc. have really slowed progress. I do think it's got a ton of upside, but I've found reimplementing my R or Python scripts in Julia to be too much of a hassle. The cost of reimplementation greatly outweighs the not-insignificant gains in speed.

Also check out this article on updates to R 3.4. R tends to be fast enough for most work (I use it regularly for one-off analysis or things that won't ever make it further than ad-hoc reporting/findings, but I can't imagine using it in production systems). The listed changes should go a long way towards making R just fast(er) enough for dealing with larger datasets (they don't help with datasets larger than memory, though).

For large datasets all the momentum seems to be moving towards Spark (sparklyr is RStudio's R interface to Spark, separate from SparkR. Very much a beta but getting better by the day). On the Python front, Dask is awesome for out-of-core (larger-than-memory) computation, and it has no equivalent in R.
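To make "larger than memory" concrete, here is a minimal sketch of the out-of-core idea in plain Python. It is not Dask's API, only the standard library; `streaming_mean`, `write_demo_csv`, the `value` column, and the chunk size are all made up for illustration. The point is that only one chunk of rows is ever held in memory, and the partial results are combined at the end.

```python
import csv
import os
import tempfile


def streaming_mean(path, column, chunk_rows=1000):
    """Mean of one numeric CSV column, holding at most chunk_rows values in memory."""
    total = 0.0
    count = 0
    chunk = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            chunk.append(float(row[column]))
            if len(chunk) == chunk_rows:
                # Reduce the full chunk to two running numbers, then discard it.
                total += sum(chunk)
                count += len(chunk)
                chunk.clear()
    # Fold in the final, possibly partial chunk.
    total += sum(chunk)
    count += len(chunk)
    return total / count if count else float("nan")


def write_demo_csv(n_rows):
    """Hypothetical helper: write a throwaway CSV whose 'value' column holds 0..n_rows-1."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["value"])
        for i in range(n_rows):
            writer.writerow([i])
    return path
```

Something like `streaming_mean(write_demo_csv(2500), "value")` walks the file in chunks of 1000 rows. As I understand it, what Dask adds on top of this pattern is scheduling the chunks in parallel and hiding the bookkeeping behind a familiar dataframe/array interface.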
Worst case, you can always use MPI with R and run on a Beowulf cluster. Of course that might not help if you want to use a function from a library, and the library itself expects everything to be in memory on one node, but at least it gives you another option for parallelization.