Hacker News
by lauriat 1991 days ago
Thank you for taking the time to check it out!

Naturally, if you're dealing with big arrays/tensors, numpy is the best choice for operating on sequences.

However, ndarrays have downsides for certain use cases. Because ndarrays are fixed-size, appending elements is very slow (each append allocates a new array and copies the old one). They also don't support functional methods, or rather, you have to create a new array every time you apply e.g. a map. And ndarrays of any type other than numbers don't really make sense.
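A minimal sketch of the fixed-size, map, and non-numeric points above, assuming a recent NumPy. The variable names are illustrative only:

```python
import numpy as np

# np.append never grows an array in place: it allocates a new one and copies.
a = np.array([1, 2, 3])
b = np.append(a, 4)
print(b is a)  # False; a is unchanged

# So appending in a loop copies the whole array each time (quadratic overall),
# while list.append is amortized O(1).
arr = np.array([], dtype=int)
lst = []
for i in range(5):
    arr = np.append(arr, i)
    lst.append(i)

# Mapping a plain Python function means building a fresh array from the results.
squared = np.array([x * x for x in arr])

# String arrays get a fixed-width dtype ('<U2' here), so longer values are truncated.
words = np.array(["a", "bb"])
words[0] = "ccc"
print(words)  # ['cc' 'bb']
```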

Many of the methods are wrappers around built-ins, but I find the syntax of Arrays cleaner than the weirdness of the built-ins.

For example, applying an async "starmap" to an Array is just a method call, whereas with built-in lists you would have to go through the whole hassle of importing both ThreadPoolExecutor and starmap, creating an executor, scheduling the function, and finally converting the result back to a list.
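For comparison, a sketch of the built-in route described above, using `concurrent.futures.ThreadPoolExecutor` and `itertools.starmap`. `async_starmap` is a hypothetical helper name for illustration, not the API of the Array library being discussed:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import starmap


def async_starmap(func, arg_tuples, max_workers=None):
    """Run func(*args) for each tuple on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # starmap unpacks each tuple, so this calls executor.submit(func, *args).
        futures = list(starmap(partial(executor.submit, func), arg_tuples))
        # Convert the futures back into a plain list of results.
        return [fut.result() for fut in futures]


pairs = [(2, 3), (4, 5), (6, 7)]
print(async_starmap(pow, pairs))  # [8, 1024, 279936]
```

The `list(...)` around `starmap` matters: it forces every task to be submitted while the executor is still open.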


`asyncmap` using a thread pool with more than one worker by default is a little silly. Unless you map to a C function that releases the GIL, you're just spawning a bunch of threads that contend for the GIL anyway.
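A rough way to check the GIL point, assuming a standard (GIL-enabled) CPython build; `busy` and `timed_map` are hypothetical names. Timings depend on the machine, so no numbers are claimed here, but for a pure-Python CPU-bound function the 4-worker run is typically no faster than the 1-worker run:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy(n):
    """Pure-Python CPU-bound loop; it holds the GIL while running."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_map(workers, inputs):
    """Map busy over inputs with the given pool size; return results and elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as ex:
        results = list(ex.map(busy, inputs))
    return results, time.perf_counter() - start


inputs = [200_000] * 8
r1, t1 = timed_map(1, inputs)
r4, t4 = timed_map(4, inputs)
print(f"1 worker: {t1:.3f}s, 4 workers: {t4:.3f}s")
```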
ndarrays "create a new array every time you apply"

That resonates with me now that you've explained why I can't do it.

I do like chaining things in pandas, like `df.select_dtypes("float").head(100).plot.hist()`.
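A runnable variant of that chain on a small made-up DataFrame, assuming pandas is installed. The plotting step is left as a comment because `.plot.hist()` needs matplotlib:

```python
import pandas as pd

# Hypothetical data: one string column, two float columns, one int column.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "x": [1.0, 2.5, 3.0, 4.5],
    "y": [10.0, 20.0, 30.0, 40.0],
    "n": [1, 2, 3, 4],
})

# Each method returns a new DataFrame, so the calls chain left to right:
# keep only float columns, then take the first two rows.
subset = df.select_dtypes("float").head(2)
print(subset)

# subset.plot.hist() would then draw a histogram of x and y (requires matplotlib).
```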

Be careful, though. NumPy and pandas go to some trouble to avoid actually copying the data inside an array. For instance, basic slicing, and reshaping where possible, just return views onto the same memory. pandas emits a somewhat infamous warning about this (`SettingWithCopyWarning`) that often confuses newbies.
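A small NumPy sketch of the view-versus-copy behavior described above. `np.shares_memory` reports whether two arrays overlap in memory; fancy (list-based) indexing is included as the contrasting case that does copy:

```python
import numpy as np

a = np.arange(12)
s = a[2:6]           # basic slice: a view onto a's buffer
r = a.reshape(3, 4)  # reshape of a contiguous array: also a view
f = a[[0, 1, 2]]     # fancy indexing: a new array (copy)

print(np.shares_memory(a, s), np.shares_memory(a, r), np.shares_memory(a, f))
# True True False

s[0] = 100           # writing through the view changes a (and r) too
print(a[2], r[0, 2], f[2])  # 100 100 2
```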