Calculate moving average in numpy array with NaNs

2024/9/20 11:45:00

I am trying to calculate the moving average in a large numpy array that contains NaNs. Currently I am using:

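import numpy as np

def moving_average(a, n=5):
    # running sum, then subtract the sum from n positions earlier
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    # the first complete window ends at index n-1
    return ret[n - 1:] / n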

When calculating with a masked array:

x = np.array([1.,3,np.nan,7,8,1,2,4,np.nan,np.nan,4,4,np.nan,1,3,6,3])
mx = np.ma.masked_array(x,np.isnan(x))
y = moving_average(mx).filled(np.nan)
print(y)
>>> array([3.8,3.8,3.6,nan,nan,nan,2,2.4,nan,nan,nan,2.8,2.6])

The result I am looking for (below) should ideally have NaNs only where the original array, x, had NaNs, and the averaging should be done over the number of non-NaN elements in each window. (I also need some way to change the window size n in the function.)

y = array([4.75,4.75,nan,4.4,3.75,2.33,3.33,4,nan,nan,3,3.5,nan,3.25,4,4.5,3])

I could loop over the entire array and check index by index but the array I am using is very large and that would take a long time. Is there a numpythonic way to do this?

Answer

Pandas has a lot of really nice functionality for this. For example:

import numpy as np
import pandas as pd

x = np.array([np.nan, np.nan, 3, 3, 3, np.nan, 5, 7, 7])

# requires three valid values in a row or the resulting value is null
print(pd.Series(x).rolling(3).mean())
# output
# nan, nan, nan, nan, 3, nan, nan, nan, 6.333

# only requires 2 valid values out of three for size=3 window
print(pd.Series(x).rolling(3, min_periods=2).mean())
# output
# nan, nan, nan, 3, 3, 3, 4, 6, 6.3333

You can play around with the windows/min_periods and consider filling-in nulls all in one chained line of code.
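
As a rough sketch of how that chaining could look for the array in the question: the desired result appears to average the non-NaN values in a window that starts at each index and looks forward n elements, shrinking at the end of the array. The helper name forward_nanmean is made up for illustration, and the forward-looking window is an assumption read off the expected output, not something the question states. Since pandas windows look backward, the sketch rolls over the reversed array and flips the result.

```python
import numpy as np
import pandas as pd


def forward_nanmean(x, n=5):
    """Mean of the non-NaN values in x[i:i+n]; NaN wherever x[i] is NaN.

    Hypothetical helper: the forward-looking window is an assumption
    inferred from the expected output in the question.
    """
    x = np.asarray(x, dtype=float)
    # pandas windows look backward, so roll over the reversed array and
    # flip the result to get windows that start at each index.
    rolled = (
        pd.Series(x[::-1].copy())
        .rolling(n, min_periods=1)  # one valid value is enough for a result
        .mean()                     # NaNs inside the window are skipped
        .to_numpy()[::-1]
    )
    # Put NaN back at the positions that were NaN in the input.
    return np.where(np.isnan(x), np.nan, rolled)


x = np.array([1., 3, np.nan, 7, 8, 1, 2, 4, np.nan, np.nan,
              4, 4, np.nan, 1, 3, 6, 3])
y = forward_nanmean(x, n=5)
print(np.round(y, 2))
```

Setting min_periods=1 is what lets a window with only a few valid values still produce an average; raising it makes the result NaN whenever fewer than that many non-NaN values fall in the window.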

