Pandas reindex and interpolate time series efficiently (reindex drops data)

2024/10/15 17:23:16

Suppose I want to reindex a time series onto a predefined index, using linear interpolation, where none of the index values are shared between the old and new index. For example:

# index holds precise timestamps, e.g. 2018-10-08 05:23:07
series = pandas.Series(data, index)
# I want rounded date-times
desired_index = pandas.date_range("2018-10-08", periods=10, freq="30min")

Tutorials and the API docs suggest doing this by reindexing and then filling the NaN values with interpolate. But because the old and new indexes have no datetimes in common, reindex outputs all NaN:

# Outputs all NaN, because no datetime in the new index matches one in the old index
series.reindex(desired_index)

I do not want to fill with the nearest values during reindex, as that would lose precision, so I came up with the following, which concatenates the reindexed series with the original before interpolating:

pandas.concat([series,series.reindex(desired_index)]).sort_index().interpolate(method="linear")

This seems very inefficient, concatenating and then sorting the two series. Is there a better way?
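A self-contained sketch of the concatenate-and-sort approach, using small hypothetical data in place of `data` and `index`. It departs from the snippet above in two labelled ways: it drops duplicate stamps (in case some timestamps happen to coincide), and it compares `method="time"` with `method="linear"`. The `"linear"` method treats rows as equally spaced and ignores the timestamps, which matters once precise and rounded stamps are interleaved.

```python
import pandas as pd

# Hypothetical stand-ins for `data` and `index`: three readings at unrounded timestamps
index = pd.to_datetime(["2018-10-08 00:00:07", "2018-10-08 00:29:53", "2018-10-08 01:10:00"])
series = pd.Series([0.0, 1.0, 3.0], index=index)
desired_index = pd.date_range("2018-10-08 00:30", periods=2, freq="30min")

# No timestamp matches, so a plain reindex is all NaN
print(series.reindex(desired_index).isna().all())  # True

# Original readings plus empty rows at the target stamps
combined = pd.concat([series, series.reindex(desired_index)])
# Drop repeated stamps (keeping the original reading), then sort by time
combined = combined[~combined.index.duplicated(keep="first")].sort_index()

by_time = combined.interpolate(method="time").reindex(desired_index)
by_position = combined.interpolate(method="linear").reindex(desired_index)
print(by_time.round(4).tolist())      # [1.0058, 2.5015]
print(by_position.round(4).tolist())  # [1.6667, 2.3333]
```

With `"linear"`, the two target rows are simply spaced evenly between their neighbours by position. With `"time"`, 00:30:00 sits only 7 seconds after the 00:29:53 reading, so its value stays close to 1.0.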

Answer

The only (simple) way I can see of doing this is to use resample to upsample to your time resolution (say 1 second), then reindex.
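A minimal sketch of the resample-then-reindex idea, assuming hypothetical data whose timestamps fall on whole seconds. `resample("1s").mean()` places each reading in its own 1-second bin and leaves the other bins NaN. `interpolate(method="time")` then fills those bins, and `reindex` picks out the target stamps. The cost is memory: the intermediate series has one row per second of the covered span, and any sub-second part of a timestamp would be snapped to the start of its bin.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 20 readings on a 30-minute grid, each 1-4 whole seconds late
rng = np.random.default_rng(0)
times = pd.date_range("2018-10-01", periods=20, freq="30min") + pd.to_timedelta(
    rng.integers(1, 5, size=20), unit="s"
)
series = pd.Series(rng.standard_normal(20), index=times)
desired_index = pd.date_range("2018-10-01 00:30", periods=10, freq="30min")

# Upsample to 1-second bins: each reading lands in its own bin, all other bins are NaN
upsampled = series.resample("1s").mean()
# Fill the gaps with time-weighted linear interpolation, then keep only the target stamps
result = upsampled.interpolate(method="time").reindex(desired_index)
print(result)
```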

Get an example DataFrame:

import numpy as np
import pandas as pd

np.random.seed(2)
# 30-minute grid (337 points), each timestamp jittered by -5 to +4 seconds
df = (pd.DataFrame()
      .assign(SampleTime=pd.date_range(start='2018-10-01', end='2018-10-08', freq='30min')
                         + pd.to_timedelta(np.random.randint(-5, 5, size=337), unit='s'),
              Value=np.random.randn(337))
      .set_index(['SampleTime']))

Let's see what the data looks like:

df.head()

                        Value
SampleTime
2018-10-01 00:00:03  0.033171
2018-10-01 00:30:03  0.481966
2018-10-01 01:00:01 -0.495496

Get the desired index:

desired_index = pd.date_range('2018-10-01', periods=10, freq='30min')

Now, reindex the data with the union of the desired and existing indices, interpolate based on the time, and reindex again using only the desired index:

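(df.reindex(df.index.union(desired_index))   # old + new timestamps, NaN at the new ones
   .interpolate(method='time')                # fill NaNs, weighting by the time gaps
   .reindex(desired_index))                   # keep only the target timestamps

                        Value
2018-10-01 00:00:00       NaN
2018-10-01 00:30:00  0.481218
2018-10-01 01:00:00 -0.494952
2018-10-01 01:30:00 -0.103270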
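The chain above still builds and sorts a combined index, which was the efficiency concern in the question. A different technique, swapped in here and not part of the original answer, is to skip the union and call `numpy.interp` directly on numeric timestamps: only the original data needs sorting, and all target stamps are evaluated in one vectorized call. The helper name `interp_to_index` and the data are hypothetical; the sketch assumes a numeric series and timezone-naive indexes.

```python
import numpy as np
import pandas as pd


def interp_to_index(series: pd.Series, new_index: pd.DatetimeIndex) -> pd.Series:
    """Hypothetical helper: time-weighted linear interpolation of `series` onto `new_index`."""
    series = series.dropna().sort_index()  # np.interp needs increasing x and no NaNs
    origin = series.index[0]
    # Express both indexes as float seconds from a common origin
    x_old = (series.index - origin).total_seconds().to_numpy()
    x_new = (new_index - origin).total_seconds().to_numpy()
    # left=np.nan leaves stamps before the first reading empty (like the output above);
    # stamps after the last reading get the last value, as interpolate() does by default
    y_new = np.interp(x_new, x_old, series.to_numpy(dtype=float), left=np.nan)
    return pd.Series(y_new, index=new_index, name=series.name)


# Hypothetical data shaped like the example above: 30-minute grid jittered by -5..+4 s
rng = np.random.default_rng(2)
times = pd.date_range("2018-10-01", periods=50, freq="30min") + pd.to_timedelta(
    rng.integers(-5, 5, size=50), unit="s"
)
s = pd.Series(rng.standard_normal(50), index=times, name="Value")
desired_index = pd.date_range("2018-10-01", periods=10, freq="30min")
print(interp_to_index(s, desired_index))
```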

As you can see, the first timestamp is still a problem, because it lies outside the range of the original index (before the first observation); there are a number of ways to deal with this (padding, for example).
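One way to fill that leading gap, sketched here with `bfill()` as the padding step (a choice made for illustration; the answer does not say which padding it means): backfill on the union index, so the first target stamp takes the value of the first reading after it. The three readings are the `df.head()` rows shown earlier, hard-coded so the block runs on its own.

```python
import pandas as pd

# The first three readings from df.head() above, hard-coded
times = pd.to_datetime(["2018-10-01 00:00:03", "2018-10-01 00:30:03", "2018-10-01 01:00:01"])
df = pd.DataFrame({"Value": [0.033171, 0.481966, -0.495496]}, index=times)
desired_index = pd.date_range("2018-10-01", periods=3, freq="30min")

result = (
    df.reindex(df.index.union(desired_index))  # old + new timestamps
    .interpolate(method="time")                # fills every NaN after the first reading
    .bfill()                                   # leading NaN takes the first observed value
    .reindex(desired_index)                    # keep only the target timestamps
)
print(result)
```

Backfilling at the union stage, rather than after the final `reindex`, means 00:00:00 takes the actual reading from 00:00:03 instead of the interpolated value at 00:30:00.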

https://en.xdnf.cn/q/69256.html
