by hgibbs 1621 days ago
I honestly can't think of a use case for Excel that pandas doesn't handle better. The 1 million row limit is crazy limiting; it's just nicer to not have to worry about that kind of stuff.

If the job is to model financial statements to produce a forecast model, to answer questions like

* what cash flows can we expect from this business

* what will the profit be in 12 months

* what will that do to my balance sheet

... then Excel is still the best tool. Financial statements are usually less than 100 rows each. All the modelling is on a tab, out front where you can print it and sense-check it with a highlighter. People on HN often think towards the big data world (and, like the other poster, I would exhaust SQL before anything else for ad hoc analysis), but there is a huge amount of work done that is more like my use case than yours.

Pandas just extends that 1M row limit to a ~10M row limit (whatever fits in your RAM).

And whenever I work with pandas I run into pesky bugs with not handling integers properly.

My answer to that, for both pandas and Excel work, is to do as much of the work in SQL/views as possible.

I've been trying to use pd.NA to embrace the future, but it runs into a nightmare of other bugs: float vs Float64 mixing bugs, and stuff like seaborn that doesn't handle Float64 anyway, etc.

As an example of the sharp stuff that causes bugs.
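A minimal sketch of one such sharp edge, assuming a reasonably recent pandas with the nullable Float64 dtype. The series names are made up for illustration. The same data and the same comparison produce different result dtypes and different missing-value markers depending on whether the column is float64 or Float64:

```python
import pandas as pd

# Same values, two dtypes: NumPy-backed float64 vs nullable Float64.
np_floats = pd.Series([1.0, None], dtype="float64")  # missing value stored as NaN
na_floats = pd.Series([1.0, None], dtype="Float64")  # missing value stored as pd.NA

# The same comparison behaves differently on the missing entry.
np_mask = np_floats > 0  # dtype bool; NaN > 0 evaluates to False
na_mask = na_floats > 0  # dtype boolean; the missing entry stays <NA>

print(np_mask.dtype, na_mask.dtype)
```

Code that expects a plain bool mask can trip over the <NA> in the second result, which is roughly the kind of mixing problem described above.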

Could you elaborate more on the pandas bugs?
I think it goes something like this (off the top of my head):

For historical reasons pandas did not handle NULL in integer columns, which happens regularly in SQL or when you import CSVs. That was always a pain, but a known pain: if you have NULLs, you have to treat the column as a float.
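A small sketch of that behavior, using a made-up three-row CSV and the default read_csv settings. The column with a blank cell comes back as float64, while the complete column stays int64:

```python
import io

import pandas as pd

# Hypothetical CSV: the "qty" column has one empty (NULL) cell.
csv_text = "id,qty\n1,10\n2,\n3,30\n"

df = pd.read_csv(io.StringIO(csv_text))

id_dtype = str(df["id"].dtype)    # complete integer column
qty_dtype = str(df["qty"].dtype)  # integer column with a missing value

print(id_dtype, qty_dtype)  # int64 float64
```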

That was the case up until recently, when they introduced a new type called Int64 (distinguishable from the original int64 by the capital "I"). Unfortunately that uncovered new bugs, in particular for plotnine (ggplot for Python), which is my favorite plotting library. It seems like they didn't want to handle Int64 types, or at least not yet.
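For reference, a sketch of what the capital-"I" type looks like, with made-up values. The last line is one possible workaround (my assumption, not something the plotting libraries prescribe): convert back to a plain NumPy float array before handing the data to a library that does not understand Int64.

```python
import numpy as np
import pandas as pd

# Nullable integer column: the missing value is pd.NA and the dtype stays Int64.
qty = pd.Series([10, None, 30], dtype="Int64")

total = qty.sum()  # missing values are skipped by default

# Possible workaround: fall back to float64 with NaN for libraries
# that only handle NumPy dtypes.
plain = qty.to_numpy(dtype="float64", na_value=np.nan)

print(qty.dtype, total, plain.dtype)
```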

And your sibling post also gives a hint about the confusion. She also mentions seaborn, another plotting library, which apparently also doesn't work.

There isn’t a 1M row limit. There is an entire backend data model that can hook into any data source for the most part and can be written in a functional language.