by halflings
2889 days ago
Unless you present clear arguments, I'd refrain from saying that Pandas is "poorly re-inventing SQL". Pandas is now the standard for data analysis (as long as things fit into memory). It's much easier to debug than a SQL command: you can write operations as a succession of small logical steps (instead of one huge query that is hard to debug). And it's raw Python, so you can do something like: df.groupby('movie_id').agg(dict(ratings='median', price=lambda p: np.percentile(p, 95))).plot.bar()
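To make the "small logical steps" point concrete, here is a minimal sketch of the same aggregation split into stages you can inspect one at a time. The data is made up for illustration; only the column names (movie_id, ratings, price) come from the one-liner. Note that np.percentile takes its percentile on a 0-100 scale, so the 95th percentile is written as 95.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the comment.
df = pd.DataFrame({
    "movie_id": [1, 1, 1, 2, 2, 2],
    "ratings": [3.0, 4.0, 5.0, 2.0, 2.0, 5.0],
    "price": [10.0, 12.0, 14.0, 8.0, 9.0, 20.0],
})

# Step 1: group by movie and check the group sizes before aggregating.
grouped = df.groupby("movie_id")
sizes = grouped.size()

# Step 2: aggregate each column with its own reducer.
# The lambda is plain Python, so any custom reducer can be dropped in here.
summary = grouped.agg({
    "ratings": "median",
    "price": lambda p: np.percentile(p, 95),  # 95th percentile (0-100 scale)
})

# Step 3 (optional, needs matplotlib): plot the result.
# summary.plot.bar()
```

Each intermediate (grouped, sizes, summary) can be printed or checked in a debugger, which is harder to do with the middle of a single large SQL query.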
|
I just use SQL to grab the data and, if necessary, aggregate it, and then do everything else in Pandas. We use custom Python reducers to deploy trained models, although we are migrating to GCP now, so soon that won't be necessary.
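A rough sketch of that split, assuming an in-memory SQLite database and a hypothetical ratings table (neither is from the comment, and the custom reducers and GCP setup are not shown): SQL does the fetch and the coarse aggregation, Pandas does the rest.

```python
import sqlite3

import pandas as pd

# Hypothetical table and rows, used only to make the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (movie_id INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [(1, 3.0), (1, 5.0), (2, 2.0), (2, 4.0), (2, 3.0)],
)
conn.commit()

# SQL side: grab the data and aggregate it per movie.
query = """
    SELECT movie_id, COUNT(*) AS n_ratings, AVG(rating) AS mean_rating
    FROM ratings
    GROUP BY movie_id
    ORDER BY movie_id
"""
agg = pd.read_sql_query(query, conn)
conn.close()

# Pandas side: everything else, e.g. flag movies above the overall mean.
agg["above_overall_mean"] = agg["mean_rating"] > agg["mean_rating"].mean()
```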