>You didn't specify that reason, so from the rest of your comment I took it to mean that Python (with numpy) was fast and good enough to write deep learning stuff. That doesn't seem to be the case for TensorFlow.

TensorFlow tensors are, for most practical purposes, transparently viewable as numpy arrays.

>If you have an array of records (dtype objects), and one of the fields is a string, am I correct that each element needs to allocate memory to hold the longest possible value that can occur for that field? What if that is not known beforehand?

Yes, for fixed-width string fields. You can also store numpy arrays of Python objects (dtype=object), which are arrays of pointers. You'll still be able to write vectorized code, but you won't get the performance of a normal numpy array, because that level of performance isn't possible when every element is a pointer to a separate object.
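To make the trade-off concrete, here is a minimal sketch of both layouts. The field names and the 8-character width are made up for illustration:

```python
import numpy as np

# Fixed-width string field: every record reserves room for 8 characters,
# whether it needs them or not (numpy stores "U" strings as 4 bytes per char).
fixed = np.dtype([("id", np.int32), ("name", "U8")])
records = np.zeros(2, dtype=fixed)
records["name"] = ["short", "much longer name"]  # second value is silently truncated

# Object field: each element holds a pointer to an ordinary Python str,
# so the length does not need to be known beforehand.
flexible = np.dtype([("id", np.int32), ("name", object)])
people = np.zeros(2, dtype=flexible)
people["name"] = ["short", "much longer name"]  # kept whole

print(records["name"])   # truncated to 8 characters
print(people["name"])    # full strings
print(fixed.itemsize)    # 4 bytes for the int + 8 chars * 4 bytes
```

If the maximum length isn't known up front, the usual options are to scan the data once to find it, or to fall back to the object field and accept the slower element access.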

Note that for most machine learning applications, you'd preprocess your string into a vector of some kind.

>How do you deal with optional fields (e.g. int or null)? Do you need to add a separate boolean to indicate null?

Yes, but I'm not sure when you'd need to. Again, in most machine learning applications you'd represent things as one-hot arrays or as some kind of compressed high-dimensional position vector, where 0 represents the absence of some thing.
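If you do need it, a rough sketch of the "separate boolean" approach looks like the following. The column contents and the field names `value` and `valid` are invented for the example, and numpy's masked arrays (`numpy.ma`) are shown as one ready-made alternative:

```python
import numpy as np

values = [3, None, 7]  # hypothetical optional-int column

# Option 1: a structured dtype that carries an explicit validity flag.
opt = np.dtype([("value", np.int64), ("valid", np.bool_)])
col = np.zeros(len(values), dtype=opt)
for i, v in enumerate(values):
    if v is not None:
        col[i] = (v, True)

# Only sum the entries that are actually present.
total = col["value"][col["valid"]].sum()

# Option 2: a masked array, where mask=True marks a missing entry.
masked = np.ma.masked_array([3, 0, 7], mask=[False, True, False])
masked_total = masked.sum()

print(total, masked_total)
```

Either way the null information lives next to the data rather than inside it, which is the price of keeping the integers in a flat, fixed-size layout.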

>How do you deal with union types

dt = np.dtype((np.int32, {'real': (np.int16, 0), 'imag': (np.int16, 2)}))

is a 32-bit int whose two 16-bit halves can also be accessed through the fields real and imag, much like the parts of a 16-bit complex number.
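A related way to get union-like behavior, sketched here with `ndarray.view` rather than the dtype above, is to reinterpret the same bytes under a different dtype. The byte order is pinned to little-endian so that the split into halves is predictable:

```python
import numpy as np

# One little-endian 32-bit int: low half is 1, high half is 2.
packed = np.array([0x00020001], dtype="<i4")

# Same four bytes, now seen as two little-endian 16-bit ints. No copy is made.
halves = packed.view("<i2")
low, high = int(halves[0]), int(halves[1])

# Writing through the view changes the original 32-bit value.
halves[1] = 5

print(low, high)
print(hex(int(packed[0])))
print(np.shares_memory(packed, halves))
```

Because both arrays share one buffer, this behaves like a C union: each dtype is just a different reading of the same memory.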