by Notre1 3154 days ago
For data frame implementations, you can also look at Spark's Dataset/DataFrame and GraphLab Create's SFrame.

The company/team behind GraphLab Create was bought by Apple, and the open-source components haven't been updated since. Because of that, I wouldn't use it in production, but if you are just looking for working implementations to compare, it gives you one more.

2 comments

Thanks, I had never heard of GraphLab Create. It is a substantial piece of code! It describes itself as "out of core" (built to work on data that doesn't fit in memory), so it's probably closer to Parquet/ORC than to Arrow. Still interesting, though.

https://github.com/turi-code/SFrame/tree/master/oss_src/sfra...

For comparison, dplyr and arrow:

https://github.com/tidyverse/dplyr/tree/master/src

https://github.com/apache/arrow/tree/master/cpp/src/arrow

C++ does seem to be useful for stuff like this.

I thought GraphLab Create had some nice functionality. It's too bad that the Apple acquisition appears to have effectively killed the publicly available product. It doesn't look like you can even pay for the commercial part anymore.