by otsaloma 1010 days ago
Still no NA?

I’m working on adding missing data support for strings as part of adding a UTF-8 variable-width string type to NumPy. It's not a general solution, but it should help with a lot of use cases. https://numpy.org/neps/nep-0055-string_dtype.html
The current memory use of string arrays is another major issue, glad to see this being worked on!
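For context on the memory point, here is a minimal sketch, assuming a standard NumPy install. The existing fixed-width `U` dtype pads every element to the length of the longest string, at 4 bytes per character, so one long string inflates the whole array.

```python
import numpy as np

# Fixed-width unicode dtype: every element is padded to the longest string.
words = np.array(["a", "b", "x" * 100])

# Bytes per element: 100 characters * 4 bytes each, even for "a" and "b".
itemsize = words.itemsize
# Total buffer size: 3 elements * 400 bytes.
total_bytes = words.nbytes

print(itemsize, total_bytes)
```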
np.nan? Not trying to be funny, but hoping to learn whether I'm missing something about limitations of np.nan which would be solved by some other kind of missing value indicator.
np.nan is only for floats; it doesn't help with integers, booleans, strings, etc. Datetimes do have NaT, but it's a hassle to run different checks, np.isnan() or np.isnat(), depending on the data type. And there isn't even an np.nat; you need np.datetime64("NaT"), so it's just confusing.
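A rough sketch of the limitations described above, assuming a recent NumPy. The `is_missing` helper is hypothetical (not a NumPy function) and only illustrates the kind of per-dtype dispatch users end up writing themselves.

```python
import numpy as np

# np.nan is a float, so mixing it into integer data upcasts to float64.
upcast = np.array([1, 2, np.nan])

# An existing integer array cannot hold NaN at all.
ints = np.array([1, 2, 3])
try:
    ints[0] = np.nan
    int_accepts_nan = True
except ValueError:
    int_accepts_nan = False

# Datetimes use their own marker, built via np.datetime64("NaT").
dates = np.array(["2024-01-01", "NaT"], dtype="datetime64[D]")


def is_missing(arr):
    """Hypothetical helper: pick the missing-value check from the dtype kind."""
    arr = np.asarray(arr)
    if arr.dtype.kind in "fc":  # float / complex: NaN
        return np.isnan(arr)
    if arr.dtype.kind in "mM":  # timedelta / datetime: NaT
        return np.isnat(arr)
    # int, bool, str, ... have no built-in missing marker
    return np.zeros(arr.shape, dtype=bool)


print(upcast.dtype, int_accepts_nan)
print(is_missing(upcast), is_missing(dates), is_missing(ints))
```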
Why not use a structured array with an 'isna' field to use as a mask when performing operations?
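A minimal sketch of what this suggestion could look like. The field names `value` and `isna` and the sample data are assumptions made for illustration; every operation has to apply the mask by hand.

```python
import numpy as np

# Hypothetical layout: each record carries its value plus an 'isna' flag.
records = np.array(
    [(10, False), (0, True), (30, False)],
    dtype=[("value", "i8"), ("isna", "?")],
)

# The mask must be applied explicitly before every reduction.
valid = ~records["isna"]
mean_of_present = records["value"][valid].mean()

# Forgetting the mask lets the placeholder 0 leak into the result.
naive_mean = records["value"].mean()

print(mean_of_present, naive_mean)
```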
How is that convenient? Missing data support belongs deep in NumPy itself (or any similar package) so that operations do the right thing and missing values propagate correctly. For example, say you want missing values to sort last by definition. If you roll your own missing value marker, you'll also need to roll your own sort function. And the same goes for a whole lot more.
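To illustrate the sorting point, here is a sketch of the kind of custom sort a hand-rolled marker requires. The function name `sort_missing_last` and the separate boolean `isna` array are assumptions for this example, not anything NumPy provides.

```python
import numpy as np

values = np.array([5, 0, 2, 0, 9])
isna = np.array([False, True, False, True, False])  # custom missing marker


def sort_missing_last(values, isna):
    """Hypothetical helper: sort values ascending, missing entries last."""
    # np.lexsort treats the LAST key as primary: order by isna, then by value.
    order = np.lexsort((values, isna))
    return values[order], isna[order]


sorted_values, sorted_isna = sort_missing_last(values, isna)
print(sorted_values, sorted_isna)
```

Every other operation (unique, argmax, joins, and so on) would need a similar wrapper, which is the inconvenience being described.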
What about a MaskedArray? ndarrays are homogeneous by definition.
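For reference, a short sketch of `numpy.ma` on integer data, assuming a standard NumPy install. The sample values are made up; reductions skip masked entries and `np.ma.sort` places them at the end by default (`endwith=True`).

```python
import numpy as np

# The middle element is masked, i.e. treated as missing.
data = np.ma.masked_array([3, 99, 1], mask=[False, True, False])

total = data.sum()    # masked entry is skipped
average = data.mean()  # mean of the two unmasked entries

# Masked entries are placed at the end by default.
ordered = np.ma.sort(data)

print(total, average, ordered)
```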