
	by otsaloma 1010 days ago
	Still no NA?

2 comments

ngoldbaum 1009 days ago

I’m working on adding missing data support for strings as part of adding a UTF-8 variable-width string type to NumPy. Not a general solution but should help with a lot of use-cases. https://numpy.org/neps/nep-0055-string_dtype.html

otsaloma 1009 days ago

The current memory use of string arrays is another major issue, glad to see this being worked on!

EForEndeavour 1009 days ago

np.nan? Not trying to be funny, but hoping to learn whether I'm missing something about limitations of np.nan which would be solved by some other kind of missing value indicator.

otsaloma 1009 days ago

np.nan is only for floats; it doesn't help with integers, booleans, strings, etc. Datetimes do have NaT, but it's troublesome to have to use different checks, np.isnan() or np.isnat(), depending on the data type. And we don't even have np.nat; you need np.datetime64("NaT") instead, so it's just confusing.
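
To make the dtype split concrete, here is a minimal sketch. The array contents and variable names are made up for illustration; only np.isnan, np.isnat and np.datetime64-style "NaT" parsing come from NumPy itself.

```python
import numpy as np

# Illustrative data; the names are hypothetical.
floats = np.array([1.0, np.nan, 3.0])
dates = np.array(["2021-01-01", "NaT"], dtype="datetime64[D]")
ints = np.array([1, 2, 3])

float_missing = np.isnan(floats)  # missing-value check for floats
date_missing = np.isnat(dates)    # datetimes need a different function

# np.isnat is only defined for datetime64/timedelta64 input,
# so it cannot serve as a generic "is missing" check.
try:
    np.isnat(floats)
    isnat_accepts_floats = True
except TypeError:
    isnat_accepts_floats = False

# An integer array has no way to store NaN at all.
try:
    ints[0] = np.nan
    int_accepts_nan = True
except ValueError:
    int_accepts_nan = False
```

So generic code has to branch on the dtype before it can even ask "is this value missing?", and for integers there is nothing to ask.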

sheepshear 1009 days ago

Why not use a structured array with an 'isna' field to use as a mask when performing operations?
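
Roughly what I have in mind, as a sketch only. The field names "value" and "isna" and the sample records are my own choices here, not any NumPy convention.

```python
import numpy as np

# Hypothetical layout: each record carries its value plus a missing flag.
records = np.array(
    [(4, False), (0, True), (7, False)],
    dtype=[("value", "i8"), ("isna", "?")],
)

# Use the flag as a boolean mask before reducing.
present = ~records["isna"]
total = records["value"][present].sum()
mean = records["value"][present].mean()
```

Every operation has to apply the mask by hand like this; nothing in NumPy knows that "isna" means missing.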

otsaloma 1009 days ago

How is that convenient? Missing data support belongs deep in NumPy itself (or any other similar package) so that operations can do the right thing and missing values propagate correctly. For example, let's say you want missing values to sort last by definition. If you roll your own missing value marker, you'll also need to roll your own sort function. And the same goes for a whole lot of other stuff.
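
A sketch of the sorting case, assuming a separate boolean array as the custom marker (the names values and isna are made up). Floats get NaN-last ordering from np.sort for free; with a hand-rolled marker you have to build the ordering yourself, here with np.lexsort.

```python
import numpy as np

# Built-in behaviour for floats: np.sort places NaN at the end.
float_sorted = np.sort(np.array([3.0, np.nan, 1.0]))

# Hand-rolled marker for integers: values plus a separate missing flag.
values = np.array([3, 0, 1, 2])
isna = np.array([False, True, False, False])

# np.lexsort treats the LAST key as primary: sort by the flag first
# (False before True), then by value among the non-missing entries.
order = np.lexsort((values, isna))
sorted_values = values[order]
sorted_isna = isna[order]
```

And that is only sorting; unique, group-by style reductions, comparisons and so on would each need the same treatment.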

sheepshear 1009 days ago

What about a MaskedArray? ndarrays are homogeneous by definition.
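
For reference, a small sketch of what numpy.ma gives you. The sample data and names are invented; np.ma.masked_array and np.ma.sort are existing NumPy functions, and np.ma.sort places masked entries last by default (endwith=True).

```python
import numpy as np

# Integer data stays integer; the second entry is flagged as missing.
ages = np.ma.masked_array([31, 0, 47], mask=[False, True, False])

mean_age = ages.mean()         # reductions skip masked entries
sorted_ages = np.ma.sort(ages) # masked entries sort to the end
```

Whether this counts as support "deep in NumPy itself" is a fair question, since masked arrays are a separate subclass rather than a property of the dtype.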
