by halflings 2889 days ago
Unless you present clear arguments, I'd refrain from saying that Pandas is "poorly re-inventing SQL".

Pandas is now the standard for data analysis (as long as things fit into memory). It's much, much easier to debug than a SQL query: you can write operations as a succession of small logical steps, each of which you can inspect, instead of one huge query that is hard to debug.

It's raw Python, so you can do something like:

df.groupby('movie_id').agg(dict(ratings='median', price=lambda p: np.percentile(p, 95))).plot.bar()
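
To make the "small logical steps" point concrete, here is a rough sketch of the same idea split into intermediate variables you can print and check one at a time. The toy rows are invented for illustration, and the plotting step is left out so it runs without matplotlib. One thing worth knowing: np.percentile takes its second argument on a 0-100 scale, so the 95th percentile is written as 95.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "movie_id": [1, 1, 1, 2, 2],
    "ratings": [3.0, 4.0, 5.0, 2.0, 4.0],
    "price": [10.0, 12.0, 14.0, 8.0, 9.0],
})

# Step 1: group the rows by movie; nothing is computed yet.
grouped = df.groupby("movie_id")

# Step 2: aggregate each column with its own function.
# np.percentile takes q on a 0-100 scale, so 95 means the 95th percentile.
summary = grouped.agg({
    "ratings": "median",
    "price": lambda p: np.percentile(p, 95),
})

# Step 3: inspect the intermediate result before plotting or further work.
print(summary)
```

If matplotlib is installed, summary.plot.bar() should draw the result; each step above can also be run on its own in a REPL or notebook when something looks off.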

1 comments

Yeah, and in Pandas you can also do stuff that would otherwise require writing a custom reducer or UDAF, in which case you aren't really using SQL anyway.

I just use SQL to grab and, if necessary, aggregate the data, and then do everything else in Pandas. We use custom Python reducers to deploy trained models, although we are migrating to GCP now, so soon that won't be necessary.
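
A minimal sketch of that split, assuming an in-memory SQLite table as a stand-in for whatever database is actually in use. The table, its rows, and the mad helper are all hypothetical: SQL handles the grab and a plain count, and Pandas handles an aggregation (median absolute deviation) that SQLite has no built-in for.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for a real data store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (movie_id INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [(1, 3.0), (1, 4.0), (1, 5.0), (2, 2.0), (2, 4.0), (2, 9.0)],
)

# SQL does the grab and a simple aggregation.
counts = pd.read_sql_query(
    "SELECT movie_id, COUNT(*) AS n FROM ratings "
    "GROUP BY movie_id ORDER BY movie_id",
    conn,
)

# The raw rows come back for work that SQL would need a custom UDAF for.
raw = pd.read_sql_query("SELECT movie_id, rating FROM ratings", conn)
conn.close()


def mad(values):
    """Hypothetical custom reducer: median absolute deviation of one group."""
    return (values - values.median()).abs().median()


# Pandas applies the Python function once per group.
per_movie_mad = raw.groupby("movie_id")["rating"].agg(mad)

# Combine the SQL-side and Pandas-side results on movie_id.
result = counts.set_index("movie_id").assign(mad=per_movie_mad)
print(result)
```

The same pattern should carry over to other databases by swapping the connection object; only the in-memory SQLite setup here is specific to the sketch.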