by hotpotjunkie 2890 days ago
Good, I hope this trend of shoving ML into SQL (instead of the other way around) continues. I always thought it was silly that every "data wrangling" system like Pandas and R needed to (poorly) re-invent SQL.

Unless you present clear arguments, I'd refrain from saying that Pandas is "poorly re-inventing SQL".

Pandas is now the de facto standard for data analysis in Python (as long as things fit into memory). It's much, much easier to debug than a SQL command. You can write operations as a succession of small logical steps and inspect the result of each one (instead of one huge query that is hard to debug).

It's raw Python, so you can do something like:

df.groupby('movie_id').agg(dict(ratings='median', price=lambda p: np.percentile(p, 95))).plot.bar()
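
For what it's worth, here is a minimal sketch of that same aggregation written as a succession of small steps, so each intermediate result can be printed and checked. The toy data is made up for illustration; only the column names (movie_id, ratings, price) come from the one-liner. Note that np.percentile takes its percentile on a 0 to 100 scale, so the 95th percentile is written as 95.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the comment above.
df = pd.DataFrame({
    "movie_id": [1, 1, 1, 2, 2],
    "ratings": [3.0, 4.0, 5.0, 2.0, 4.0],
    "price": [10.0, 12.0, 20.0, 8.0, 9.0],
})

# Step 1: group the rows by movie.
grouped = df.groupby("movie_id")

# Step 2: median rating per movie (a built-in aggregation).
median_rating = grouped["ratings"].median()

# Step 3: 95th percentile of price per movie, using an arbitrary Python
# callable as the aggregation function.
p95_price = grouped["price"].agg(lambda p: np.percentile(p, 95))

# Step 4: combine the two per-movie results into one table.
summary = pd.concat([median_rating, p95_price], axis=1)
print(summary)
```

Each of median_rating, p95_price and summary is an ordinary object you can inspect in a REPL, which is the debugging advantage being argued for here.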

Yeah, also in Pandas you can do stuff that otherwise requires writing a custom reducer or UDAF, in which case you aren't really using SQL anyway.

I just use SQL to grab and, if necessary, aggregate the data, and then do everything else in Pandas, using Python custom reducers to deploy trained models. We are migrating to GCP now, though, so soon that won't be necessary.
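
A rough sketch of that split, assuming an in-memory SQLite database as a stand-in (the comment doesn't say which database is actually used). The table, columns and the above_overall flag are hypothetical: SQL does the grab and the coarse aggregation, and Pandas does the rest.

```python
import sqlite3

import pandas as pd

# Stand-in database; the real backend is not named in the comment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (movie_id INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [(1, 3.0), (1, 5.0), (2, 2.0), (2, 4.0), (2, 3.0)],
)

# SQL side: grab the data and do the coarse aggregation.
agg = pd.read_sql_query(
    """
    SELECT movie_id, COUNT(*) AS n, AVG(rating) AS mean_rating
    FROM ratings
    GROUP BY movie_id
    ORDER BY movie_id
    """,
    conn,
)
conn.close()

# Pandas side: everything else, here a hypothetical derived flag marking
# movies whose mean rating is above the average of the per-movie means.
agg["above_overall"] = agg["mean_rating"] > agg["mean_rating"].mean()
print(agg)
```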