I want to understand the NumPy behavior.
When I try to get the reference of an inner array of a NumPy array, and then compare it to the object itself, I get as returned value False
.
Here is the example:
In [198]: x = np.array([[1,2,3], [4,5,6]])
In [201]: x0 = x[0]
In [202]: x0 is x[0]
Out[202]: False
While on the other hand, with Python native objects, the returned is True
.
In [205]: c = [[1,2,3],[1]]
In [206]: c0 = c[0]
In [207]: c0 is c[0]
Out[207]: True
My question, is that the intended behavior of NumPy? If so, what should I do if I want to create a reference of inner objects of NumPy arrays.
2d slicing
When I first wrote this I constructed and indexed a 1d array. But the OP is working with a 2d array, so x[0]
is a 'row', a slice of the original.
In [81]: arr = np.array([[1,2,3], [4,5,6]])
In [82]: arr.__array_interface__['data']
Out[82]: (181595128, False)In [83]: x0 = arr[0,:]
In [84]: x0.__array_interface__['data']
Out[84]: (181595128, False) # same databuffer pointer
In [85]: id(x0)
Out[85]: 2886887088
In [86]: x1 = arr[0,:] # another slice, different id
In [87]: x1.__array_interface__['data']
Out[87]: (181595128, False)
In [88]: id(x1)
Out[88]: 2886888888
What I wrote earlier about slices still applies. Indexing an individual elements, as with arr[0,0]
works the same as with a 1d array.
This 2d arr has the same databuffer as the 1d arr.ravel()
; the shape and strides are different. And the distinction between view
, copy
and item
still applies.
A common way of implementing 2d arrays in C is to have an array of pointers to other arrays. numpy
takes a different, strided
approach, with just one flat array of data, and usesshape
and strides
parameters to implement the transversal. So a subarray requires its own shape
and strides
as well as a pointer to the shared databuffer.
1d array indexing
I'll try to illustrate what is going on when you index an array:
In [51]: arr = np.arange(4)
The array is an object with various attributes such as shape, and a data buffer. The buffer stores the data as bytes (in a C array), not as Python numeric objects. You can see information on the array with:
In [52]: np.info(arr)
class: ndarray
shape: (4,)
strides: (4,)
itemsize: 4
aligned: True
contiguous: True
fortran: True
data pointer: 0xa84f8d8
byteorder: little
byteswap: False
type: int32
or
In [53]: arr.__array_interface__
Out[53]:
{'data': (176486616, False),'descr': [('', '<i4')],'shape': (4,),'strides': None,'typestr': '<i4','version': 3}
One has the data pointer in hex, the other decimal. We usually don't reference it directly.
If I index an element, I get a new object:
In [54]: x1 = arr[1]
In [55]: type(x1)
Out[55]: numpy.int32
In [56]: x1.__array_interface__
Out[56]:
{'__ref': array(1),'data': (181158400, False),
....}
In [57]: id(x1)
Out[57]: 2946170352
It has some properties of an array, but not all. For example you can't assign to it. Notice also that its 'data` value is totally different.
Make another selection from the same place - different id and different data:
In [58]: x2 = arr[1]
In [59]: id(x2)
Out[59]: 2946170336
In [60]: x2.__array_interface__['data']
Out[60]: (181143288, False)
Also if I change the array at this point, it does not affect the earlier selections:
In [61]: arr[1] = 10
In [62]: arr
Out[62]: array([ 0, 10, 2, 3])
In [63]: x1
Out[63]: 1
x1
and x2
don't have the same id
, and thus won't match with is
, and they don't use the arr
data buffer either. There's no record that either variable was derived from arr
.
With slicing
it is possible get a view
of the original array,
In [64]: y = arr[1:2]
In [65]: y.__array_interface__
Out[65]:
{'data': (176486620, False),'descr': [('', '<i4')],'shape': (1,),....}
In [66]: y
Out[66]: array([10])
In [67]: y[0]=4
In [68]: arr
Out[68]: array([0, 4, 2, 3])
In [69]: x1
Out[69]: 1
It's data pointer is 4 bytes larger than arr
- that is, it points to the same buffer, just a different spot. And changing y
does change arr
(but not the independent x1
).
I could even make a 0d view of this item
In [71]: z = y.reshape(())
In [72]: z
Out[72]: array(4)
In [73]: z[...]=0
In [74]: arr
Out[74]: array([0, 0, 2, 3])
In Python code we normally don't work with objects like this. When we use the c-api
or cython
is it possible to access the data buffer directly. nditer
is an iteration mechanism that works with 0d objects like this (either in Python or the c-api). In cython
typed memoryviews
are particularly useful for low level access.
http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html
https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
https://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#c.NpyIter
elementwise ==
In response to comment, Comparing NumPy object references
np.array([1]) == np.array([2]) will return array([False], dtype=bool)
==
is defined for arrays as an elementwise operation. It compares the values of the respective elements and returns a matching boolean array.
If such a comparison needs to be used in a scalar context (such as an if
) it needs to be reduced to a single value, as with np.all
or np.any
.
The is
test compares object id's (not just for numpy objects). It has limited value in practical coding. I used it most often in expressions like is None
, where None
is an object with a unique id, and which does not play nicely with equality tests.