by fifilura 1622 days ago
Pandas just extends that 1M row limit to a ~10M row limit (whatever fits in your RAM).

And whenever I work with pandas I run into pesky bugs with not handling integers properly.

My answer to that, for both pandas and Excel work, is to do as much of the work in SQL/views as possible.


I've been trying to use pd.NA to embrace the future, but it runs into a nightmare of other bugs - float vs Float64 mixing bugs and stuff like seaborn that doesn't handle Float64 anyway etc.

As an example of the sharp stuff that causes bugs.

could you elaborate more on the pandas bugs?
I think it goes something like this (off the top of my head):

For historical reasons, pandas did not handle NULL in integer columns, which comes up regularly with SQL or when you import CSVs. That was always a pain, but a known pain: if you have a NULL, the column has to be treated as a float.
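
A minimal sketch of that behavior, assuming default read_csv settings; the CSV content and column names are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV: the "count" column is all integers except one missing value.
csv_text = "id,count\n1,10\n2,\n3,30\n"

df = pd.read_csv(io.StringIO(csv_text))

# "id" has no gaps, so it stays an integer column.
id_dtype = str(df["id"].dtype)

# "count" has a gap; with default settings the gap becomes NaN,
# and NaN only exists for floats, so the whole column is float64.
count_dtype = str(df["count"].dtype)

print(id_dtype, count_dtype)
```

So a single NULL is enough to turn the whole column into floats.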

That was the case until recently, when they introduced a new type called Int64 (distinguishable from the original int64 by the capital "I"). Unfortunately that uncovered new bugs, in particular for plotnine (ggplot for Python), which is my favorite plotting library. It seems they didn't want to handle Int64 types, or at least not yet.
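
Roughly what the two types look like side by side. This is only a sketch: the cast back to float64 at the end is one possible workaround I'm assuming for libraries that expect NumPy dtypes, not something plotnine or seaborn documents, and whether it helps will depend on the library and version:

```python
import pandas as pd

# Nullable integer column: note the capital "I" in the dtype name.
s = pd.Series([1, None, 3], dtype="Int64")
nullable_dtype = str(s.dtype)
missing_is_na = s[1] is pd.NA  # the gap is stored as pd.NA, not NaN

# The same data without an explicit dtype falls back to float64 with NaN.
legacy = pd.Series([1, None, 3])
legacy_dtype = str(legacy.dtype)

# Assumed workaround: cast back to a plain NumPy float column
# before handing the data to a plotting library.
as_float = s.astype("float64")

print(nullable_dtype, legacy_dtype, as_float.dtype)
```

The cast gives up the integer type again, so it just trades the new bugs for the old known pain.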

Your sibling post also gives a hint about the confusion. She also mentions seaborn, another plotting library, which apparently doesn't work either.