I'm trying to write a function that performs a mathematical operation on an array and returns the result. A simplified example could be:

```
def original_func(A):
    return A[1:] + A[:-1]
```

To speed this up and avoid allocating a new output array on every call, I would like to pass the output array as an argument and modify it in place:

```
def inplace_func(A, out):
    out[:] = A[1:] + A[:-1]
```

However, when calling these two functions in the following manner,

```
import numpy

A = numpy.random.rand(1000, 1000)
out = numpy.empty((999, 1000))

C = original_func(A)
inplace_func(A, out)
```

the original function seems to be *twice as fast* as the in-place function. How can this be explained? Shouldn't the in-place function be quicker since it doesn't have to allocate memory?
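For anyone who wants to reproduce the comparison, here is a minimal timing sketch using `timeit` from the standard library. The call counts and the names `ufunc_out_func` and `out_ufunc` are illustrative assumptions, not part of my original measurement. The third variant passes `out` directly to `numpy.add` and is included only as an extra point of comparison.

```python
import timeit

import numpy


def original_func(A):
    return A[1:] + A[:-1]


def inplace_func(A, out):
    out[:] = A[1:] + A[:-1]


def ufunc_out_func(A, out):
    # Hypothetical variant (not in the original post): let the ufunc
    # write its result straight into the preallocated array.
    numpy.add(A[1:], A[:-1], out=out)


A = numpy.random.rand(1000, 1000)
out = numpy.empty((999, 1000))
out_ufunc = numpy.empty((999, 1000))

calls = 20    # calls per timing run (arbitrary choice)
repeats = 3   # timing runs; the fastest one is reported

candidates = {
    "original_func": lambda: original_func(A),
    "inplace_func": lambda: inplace_func(A, out),
    "ufunc_out_func": lambda: ufunc_out_func(A, out_ufunc),
}

for name, call in candidates.items():
    best = min(timeit.repeat(call, number=calls, repeat=repeats))
    print(f"{name}: {best / calls * 1e3:.3f} ms per call")
```

The absolute numbers will depend on the machine and NumPy version, so only the relative ordering is meaningful.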