by datanecdote 2075 days ago
> 2. I don't entirely follow this point. Perhaps using PyArrow's parser would be faster than what is timed here, but is that what the typical Python data science user would do?

I am a Python data science user. If the data gets big enough that loading time becomes a bottleneck, I use Parquet files instead of CSV, and PyArrow to load them into pandas. It’s a one-line change. The creator of pandas is now leading the Arrow project, and the two work together seamlessly. I don’t know if I’m typical, but that’s me.


Perhaps not directly relevant to your point here, but I thought this would be interesting to anyone following along.

Jacob Quinn (karbacca) also has a Julia package for integrating Julia into the Arrow ecosystem: https://github.com/JuliaData/Arrow.jl

Thanks, Viral. To be clear, I’m a Python user who’s cheering for Julia, because I live the problems of Python and do see the potential of Julia as a better path. But unfortunately I’m not prepared to be the early adopter (at least in my day job), and will wait until other, braver users have sanded off the rough edges. Godspeed and good luck.

That's a completely reasonable viewpoint. Many Julia users and contributors start out experimenting with it and then end up bringing it into their work once they feel comfortable with it. I hope you will have the same experience one day.