Hacker News (HN Mirror)

by qwhelan 2335 days ago

> Occurs in pandas 0.25.1 (and the release notes for 0.25.2 and 0.25.3 don't mention such a change), so that would likely be still the case in the latest stable release.

The nullable integer dtype was released in 0.24.0: https://pandas.pydata.org/pandas-docs/stable/user_guide/inte...

For example:

    import pandas as pd

    pd.DataFrame({"foo": [1,2,3,4,None]}, dtype=pd.Int64Dtype())

        foo
    0     1
    1     2
    2     3
    3     4
    4  <NA>

    pd.DataFrame({"foo": [1,2,3,4,None,9223372036854775807,9223372036854775806]}, dtype=pd.Int64Dtype())

                       foo
    0                    1
    1                    2
    2                    3
    3                    4
    4                 <NA>
    5  9223372036854775807
    6  9223372036854775806

jfim 2335 days ago

Sure, if you specify the type. It's still a gotcha, because the default behavior is to upcast to floating point unless you set the dtype on every integer column of every data frame, which isn't very Pythonic.
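To make the default behavior concrete, here is a minimal sketch that reuses the values from the example above. The variable names are illustrative only, and exact behavior may differ between pandas versions.

```python
import pandas as pd

values = [1, 2, 3, 4, None, 9223372036854775807, 9223372036854775806]

# Default inference: the None forces the whole column to float64.
default = pd.DataFrame({"foo": values})

# Explicit opt-in: the nullable extension dtype keeps the column integer.
nullable = pd.DataFrame({"foo": values}, dtype=pd.Int64Dtype())

print(default["foo"].dtype)
print(nullable["foo"].dtype)

# float64 has a 53-bit significand, so the two distinct large integers
# round to the same float in the default frame.
collided = default["foo"].iloc[5] == default["foo"].iloc[6]
print(collided)
```

The silent loss of precision in the default frame is what makes this a gotcha rather than just a display difference.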

The example with the (incorrect) join above shows how even other operations can cause this type conversion.
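The join referred to is not reproduced here, but the general effect can be sketched with a left merge in which one key has no match. The frames and column names below are made up for illustration, and the result with the nullable dtype assumes a reasonably recent pandas.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3]})
right = pd.DataFrame({"key": [1, 2], "count": [10, 20]})

# key 3 has no match, so the merged "count" column needs a missing value.
# With the default int64 column, pandas upcasts it to float64.
merged_default = left.merge(right, on="key", how="left")
print(merged_default["count"].dtype)

# Converting the column to the nullable dtype before merging keeps integers.
right_nullable = right.astype({"count": "Int64"})
merged_nullable = left.merge(right_nullable, on="key", how="left")
print(merged_nullable["count"].dtype)
```

Neither input column contained a missing value, yet the default merge still changes the dtype.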

qwhelan 2335 days ago

Yes, there's a lot of existing code written assuming the old behavior. But most code has only a few ingestion points, so it's pretty simple to turn on the nullable dtype there.
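As a rough sketch of what opting in at an ingestion point might look like, assuming the data arrives as CSV (the CSV text and column names here are hypothetical):

```python
import io
import pandas as pd

# Hypothetical input with a missing value in an otherwise integer column.
csv_text = "id,count\n1,10\n2,\n3,30\n"

# Default ingestion: the empty field turns "count" into float64.
default = pd.read_csv(io.StringIO(csv_text))
print(default["count"].dtype)

# Declaring the nullable dtype once, where the data is read.
nullable = pd.read_csv(io.StringIO(csv_text), dtype={"count": "Int64"})
print(nullable["count"].dtype)
```

Downstream code that assumes float columns with NaN may still need adjusting, which is the "existing code" caveat.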
