by qwhelan 2335 days ago
> Occurs in pandas 0.25.1 (and the release notes for 0.25.2 and 0.25.3 don't mention such a change), so that would likely be still the case in the latest stable release.

Nullable integer support was released in 0.24.0: https://pandas.pydata.org/pandas-docs/stable/user_guide/inte...

For example:

    import pandas as pd

    pd.DataFrame({"foo": [1,2,3,4,None]}, dtype=pd.Int64Dtype())

        foo
    0     1
    1     2
    2     3
    3     4
    4  <NA>

    pd.DataFrame({"foo": [1,2,3,4,None,9223372036854775807,9223372036854775806]}, dtype=pd.Int64Dtype())

                       foo
    0                    1
    1                    2
    2                    3
    3                    4
    4                 <NA>
    5  9223372036854775807
    6  9223372036854775806
1 comments

Sure, if you specify the type. It's still a gotcha: by default, an integer column containing a missing value is upcast to floating point unless you declare the dtype for every integer column of every data frame, which isn't very Pythonic.
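To make the default concrete, here is a minimal sketch that reuses the column from the parent comment. The variable names (`default`, `nullable`, `big`, `lossy`, `exact`) are made up for illustration, and the rounding comparison assumes a reasonably recent pandas.

```python
import pandas as pd

# No dtype given: the None forces pandas to fall back to float64.
default = pd.DataFrame({"foo": [1, 2, 3, 4, None]})

# Explicit nullable dtype: the column stays integer and the gap becomes a missing value.
nullable = pd.DataFrame({"foo": [1, 2, 3, 4, None]}, dtype=pd.Int64Dtype())

print(default["foo"].dtype)   # float64
print(nullable["foo"].dtype)  # Int64

# float64 has a 53-bit mantissa, so integers near 2**63 cannot be stored exactly.
big = 9223372036854775806
lossy = pd.Series([big, None])                 # inferred as float64
exact = pd.Series([big, None], dtype="Int64")  # keeps the integer value

print(int(lossy.iloc[0]) == big)
print(int(exact.iloc[0]) == big)
```

The second half is where the upcast actually hurts: the float column silently holds a different number than the one that went in.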

The example with the (incorrect) join above shows how even other operations can cause this type conversion.
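As a rough illustration of how a join can trigger the same conversion (this is not the join from upthread, just a hypothetical pair of frames named `left` and `right`), a left merge with an unmatched key is enough:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3]})
right = pd.DataFrame({"key": [1, 2], "count": [10, 20]})  # "count" is int64

# Key 3 has no match, so "count" needs a missing value and gets upcast.
merged = left.merge(right, on="key", how="left")
print(merged["count"].dtype)  # float64

# Declaring the nullable dtype before the merge keeps the column integer.
right_nullable = right.astype({"count": "Int64"})
merged_nullable = left.merge(right_nullable, on="key", how="left")
print(merged_nullable["count"].dtype)  # Int64
```

Nothing in the inputs was a float or had a missing value; the conversion comes purely from the unmatched row.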

Yes, there's a lot of existing code written assuming the old behavior. But most code has only a few ingestion points, so opting in to the nullable dtype is pretty simple.
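For what an ingestion point might look like, here is a sketch assuming the data arrives as CSV. The column names and the inline `csv_text` are invented for the example; the only pandas feature relied on is the `dtype` argument of `read_csv`.

```python
import io

import pandas as pd

# Hypothetical input: the second row has an empty "count" field.
csv_text = "id,count\n1,10\n2,\n3,30\n"

# Default read: the empty field turns "count" into float64.
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df.dtypes)

# Opting in at the ingestion point: declare the nullable dtype once, here.
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": "Int64", "count": "Int64"})
print(df.dtypes)
```

Once the frame is read this way, downstream code sees `Int64` columns without having to repeat the dtype at every step.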