Hacker News
by fnord123 3358 days ago
McKinney has been hard at work getting Parquet and Arrow support into pandas.

http://wesmckinney.com/blog/outlook-for-2017/

>Given the kind of people behind Arrow, I would love a wrapper that will use Arrow to do all of this...But doesn't matter at the end of the day.

That wrapper exists: pyarrow, and specifically pyarrow.parquet (which uses parquet-cpp).


Wow, this is great. I've been working around the JVM to integrate sklearn with some Spark jobs that produce Parquet. This is a huge relief.
Arrow doesn't do scikit, at least the last time I checked. Has it changed?
pyarrow has methods to convert to pandas, which scikit supports.

http://pyarrow.readthedocs.io/en/latest/pandas.html

No, this is not it. Scikit models need to be persisted. The only ways I have found are pickle or dill.

Take a look at this to understand what I mean: http://stackoverflow.com/questions/32757656/what-are-the-pit...
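
To make the persistence point concrete, here is a minimal sketch of the pickle route. ThresholdModel is a hypothetical stand-in for a fitted scikit estimator, used only so the snippet runs with the standard library; a real estimator would go through pickle.dumps and pickle.loads the same way.

```python
import pickle


class ThresholdModel:
    """Hypothetical stand-in for a fitted scikit-learn estimator."""

    def fit(self, xs, ys):
        # "Learn" a cut point: the mean of the training inputs.
        # ys is accepted only to mirror the usual fit(X, y) shape.
        self.threshold_ = sum(xs) / len(xs)
        return self

    def predict(self, xs):
        # Predict 1 for inputs above the learned threshold, else 0.
        return [int(x > self.threshold_) for x in xs]


model = ThresholdModel().fit([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])

blob = pickle.dumps(model)     # serialize the fitted object to bytes
restored = pickle.loads(blob)  # rebuild it, e.g. in a later process

print(restored.predict([1.5, 3.5]))
```

The catch is that a pickle stores the object's state plus a reference to its class, not the class code itself, so loading depends on a compatible version of that code being importable, and unpickling data from an untrusted source can execute arbitrary code.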