by joshuamorton
3210 days ago
> The heavy-lifting in e.g. TensorFlow is done in C++. Bindings to Python make sense because it is one of the few sanctioned languages inside Google, and it is widely used outside of Google and easy to pick up.
That's exactly the same as with numpy. I'm not sure what your point is. C++ is also one of the few sanctioned languages inside Google, as is Java.
> Not all data is a good fit for Numpy: some data is non-numeric or not a homogenous array.
I'm curious what kind of data you're working with that can't be represented and effectively transformed in a tensor (numpy array).
I was replying to "there's a reason why...". You didn't specify that reason, so from the rest of your comment I took it to mean that Python (with numpy) was fast and good enough to write deep learning stuff. That doesn't seem to be the case for TensorFlow.
> I'm curious what kind of data you're working with that can't be represented and effectively transformed in a tensor (numpy array).
I'm not intimately familiar with the internals of numpy, but my understanding is that the basic data structure is a (multi-dimensional) array of values (not pointers). That leads to a number of questions.
If you have an array of records (dtype objects), and one of the fields is a string, am I correct that each element needs to allocate memory to hold the longest possible value that can occur for that field? What if that is not known beforehand?
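To make the string question concrete, here is a minimal sketch of how a fixed-width string field behaves in a numpy structured array. The field names and the 8-character width are made up for illustration, and the object-dtype variant is shown only as one possible workaround, not as a recommendation.

```python
import numpy as np

# Hypothetical record layout: an 8-character unicode field plus a 32-bit int.
rec = np.dtype([("name", "U8"), ("count", "i4")])
a = np.zeros(3, dtype=rec)

# Every element reserves room for 8 characters, whether it uses them or not.
# Longer values are truncated to fit the declared width.
a["name"][0] = "a much longer string"
print(a["name"][0])
print(rec.itemsize)  # bytes reserved per record

# An object field stores a reference per element instead, so string lengths
# can vary, at the cost of losing the contiguous "array of values" layout.
obj = np.dtype([("name", "O"), ("count", "i4")])
b = np.zeros(3, dtype=obj)
b["name"][0] = "a much longer string"
print(b["name"][0])
```

So with a fixed-width field the maximum length does have to be chosen up front, and the object field is the escape hatch when it is not known.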
How do you deal with optional fields (e.g. int or null)? Do you need to add a separate boolean to indicate null?
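One way this can be handled, sketched below, is numpy's masked array, which is essentially the "separate boolean" approach packaged up: a value array plus a parallel boolean mask. The variable names and sample values are made up for illustration.

```python
import numpy as np

# The integer values; the slot at index 1 is a placeholder for a missing value.
values = np.array([3, 0, 7], dtype=np.int64)

# Parallel boolean array: True marks an element as null.
is_null = np.array([False, True, False])

m = np.ma.masked_array(values, mask=is_null)

total = m.sum()    # masked elements are skipped
present = m.count()  # number of non-null elements
print(total, present)
```

For floating-point data, NaN is often used as the null marker instead, but integer dtypes have no spare value to play that role.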
How do you deal with union types, e.g. each record can be one of x types, do you make a record that has a field for each of the fields of those x types? Do those fields take up space?
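As a rough sketch of the union question, the snippet below compares two layouts for a hypothetical record that is either a circle (radius) or a rectangle (width, height). The first gives every variant its own fields, so unused fields still take up space. The second relies on the explicit `offsets` form of `np.dtype` to let fields share bytes, similar to a C union. The `kind` tag and all field names are assumptions for illustration.

```python
import numpy as np

# Layout 1: one field per variant field, plus a tag saying which variant applies.
flat = np.dtype([
    ("kind", "u1"),      # 0 = circle, 1 = rectangle (hypothetical tag values)
    ("radius", "f8"),
    ("width", "f8"),
    ("height", "f8"),
])
shapes = np.zeros(2, dtype=flat)
shapes[0] = (0, 1.5, 0.0, 0.0)   # circle: width and height are unused padding
shapes[1] = (1, 0.0, 2.0, 3.0)   # rectangle: radius is unused padding

# Layout 2: "radius" and "width" start at the same byte offset, so the two
# variants overlap. Writing one field clobbers the other.
overlay = np.dtype({
    "names": ["kind", "radius", "width", "height"],
    "formats": ["u1", "f8", "f8", "f8"],
    "offsets": [0, 1, 1, 9],
})

print(flat.itemsize, overlay.itemsize)
```

Either way the record is as large as its largest variant, and nothing in numpy checks that the tag matches the fields being read.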