by lejar 2554 days ago
Nice overview! One thing I think you should add, which I find immensely useful, is reordering arrays using indexing.

Take for example:

    In [2]: numpy.array([1, 2, 3])[[0, 2, 1]]                                       
    Out[2]: array([1, 3, 2])
You index using a list and it gives you a view of the array with the new order (the underlying array is not changed and there is no copy being done).
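A short sketch of reordering with an index list, including the common pattern of using `np.argsort` to sort one array and apply the same permutation to a second one. The array names (`keys`, `values`, `order`, `m`) are illustrative only. As discussed in the reply, indexing with a list returns a new array rather than a view.

```python
import numpy as np

# Reorder an array by an explicit list of positions (a permutation).
x = np.array([10, 20, 30])
print(x[[0, 2, 1]])  # [10 30 20]

# A common use: sort one array and apply the same order to another.
keys = np.array([3, 1, 2])
values = np.array(["c", "a", "b"])
order = np.argsort(keys)  # positions that would sort keys: [1 2 0]
print(keys[order])        # [1 2 3]
print(values[order])      # ['a' 'b' 'c']

# The same works along the first axis of a 2-D array, e.g. permuting rows.
m = np.arange(6).reshape(3, 2)
print(m[[2, 0, 1]])
# [[4 5]
#  [0 1]
#  [2 3]]
```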

Using "fancy" indices like this does result in a copy, because the result can't be represented as a simple slice of the original array. A good explanation is here (it's from 2008 but still true):

https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.ht...

You can verify there's a copy by assigning the result to a new variable (see the link above for why this makes a difference), modifying it, and checking that the original is unchanged:

    >>> import numpy as np
    >>> x = np.array([1, 2, 3])
    >>> y = x[[0, 2, 1]]
    >>> y[0] = 3
    >>> y
    array([3, 3, 2])
    >>> x
    array([1, 2, 3])
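One related distinction that may be worth a sketch: reading through a fancy index copies, but assigning through one writes into the original array. That also gives a way to reorder an array in place, by writing the reordered copy back into a full slice.

```python
import numpy as np

x = np.array([1, 2, 3])

# Reading with a fancy index builds a new array...
y = x[[0, 2, 1]]
y[0] = 3
print(x)  # [1 2 3]: x is unchanged

# ...but assigning *through* a fancy index writes into x itself.
x[[0, 2]] = 0
print(x)  # [0 2 0]

# To reorder x in place, the right-hand side (a copy) is built first,
# then written back into every element of x.
x = np.array([1, 2, 3])
x[:] = x[[0, 2, 1]]
print(x)  # [1 3 2]
```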

Edit:

But a view can be based on a slice that includes a step (skip) parameter, and you can even slice in multiple dimensions and still get a view. That is worth discussing in the article:

    >>> x = np.array([np.arange(7), np.arange(7)+1]*3)
    >>> y = x[4:1:-2, 1:5:2]
    >>> y
    array([[1, 3],
           [1, 3]])
    >>> y[0,0] = 99
    >>> x
    array([[ 0,  1,  2,  3,  4,  5,  6],
           [ 1,  2,  3,  4,  5,  6,  7],
           [ 0,  1,  2,  3,  4,  5,  6],
           [ 1,  2,  3,  4,  5,  6,  7],
           [ 0, 99,  2,  3,  4,  5,  6],
           [ 1,  2,  3,  4,  5,  6,  7]])
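A sketch of how to confirm the strided slice above is a view without mutating anything, using `np.shares_memory` and the `.base` attribute. It also shows that the slice steps end up as strides (exact byte values depend on the dtype, so only the relationship is checked), and that selecting the same rows with a list instead of a slice gives a copy.

```python
import numpy as np

x = np.array([np.arange(7), np.arange(7) + 1] * 3)
y = x[4:1:-2, 1:5:2]  # basic slicing with steps, in two dimensions

# A view shares x's buffer, and y.base points back at x.
print(np.shares_memory(x, y))  # True
print(y.base is x)             # True

# The steps are encoded as strides: -2 rows and +2 columns of x.
print(y.strides == (-2 * x.strides[0], 2 * x.strides[1]))  # True

# By contrast, selecting the same rows with a list (fancy index) copies.
z = x[[4, 2]][:, 1:5:2]
print(np.shares_memory(x, z))  # False
```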
A related fun fact about indexing several dimensions at once:

    >>> a = np.arange(9).reshape(3,3) # a matrix
    >>> a[0:3,0:3]          # ranges are treated independently
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    >>> a[[0,1,2],[0,1,2]]  # but arrays are treated at once
    array([0, 4, 8])
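If you want index lists to behave like independent ranges (every selected row combined with every selected column), `np.ix_` builds an open mesh from them. The same effect can be had by broadcasting a column of row indices against a row of column indices, as sketched here.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# Two index lists are paired element-wise: (0,0), (1,1), (2,2).
print(a[[0, 1, 2], [0, 1, 2]])  # [0 4 8]

# np.ix_ combines every listed row with every listed column.
print(a[np.ix_([0, 2], [0, 2])])
# [[0 2]
#  [6 8]]

# Equivalent via broadcasting: shape (2, 1) rows against shape (2,) columns.
rows = np.array([[0], [2]])
cols = np.array([0, 2])
print(a[rows, cols])
```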
A copy-on-write mechanism triggered by `y[0] = 3` would look the same and pass the test you devised, so you can't eliminate the possibility that it exists.

A better way would be to track memory use: a copy created by either `y = x[[0, 2, 1]]` or `y[0] = 3` would show up as an increase in memory.
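A different, more direct check than measuring memory (swapped in here, not what the comment proposes) is `np.shares_memory`, which reports whether two arrays overlap in memory. Calling it right after indexing, before anything is written to `y`, addresses the copy-on-write objection: under such a scheme the two arrays would still share a buffer at that point.

```python
import numpy as np

x = np.array([1, 2, 3])

# Check immediately after indexing, before any write to y.
y = x[[0, 2, 1]]
print(np.shares_memory(x, y))  # False: y already has its own buffer

# For comparison, a basic slice (even a reversed one) is a view.
v = x[::-1]
print(np.shares_memory(x, v))  # True
```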

As an aside, one of my major challenges in grokking numpy and pandas is semantically dense syntax like the above. I know the layers of brackets have an impact, but it's difficult for me to tell where that is applied and/or described.
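One way to read the nested brackets, sketched on a small 3x3 array: the outer brackets are the indexing operator, commas inside them build a tuple with one entry per axis, and a list inside them is an index array for a single axis.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# Commas inside the brackets form a tuple: one entry per axis.
print(a[0, 2])    # 2: row 0, column 2
print(a[(0, 2)])  # 2: the same tuple, written explicitly

# A list inside the brackets is one index array for the first axis.
print(a[[0, 2]])  # rows 0 and 2
# [[0 1 2]
#  [6 7 8]]

# Two lists separated by a comma: one index array per axis, paired element-wise.
print(a[[0, 2], [1, 1]])  # [1 7]
```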