pandas DataFrame resample from irregular timeseries index

2024/9/8 8:41:09

I want to resample a DataFrame to every five seconds, where the time stamps of the original data are irregular. Apologies if this looks like a duplicate question, but I have issues with the interpolation lining up to the timestamps of the data, which is why I include my DataFrame in this question. The graph in this answer shows my desired results, but I cannot use the traces package suggested there. I use pandas 0.19.0.

Consider the following climb path of an aircraft (as dict on pastebin):

    Altitude        Time
1       0.00     0.00000
2    1000.00    16.45350
3    2000.00    33.19584
4    3000.00    50.25330
5    4000.00    67.64580
6    5000.00    85.38720
7    6000.00   103.56720
8    7000.00   122.29260
9    8000.00   141.61440
10   9000.00   161.59140
11   9999.67   182.27940
12  10000.30   182.33940
13  10000.30   199.76880
14  10000.30   199.82880
15  11000.00   221.67660
16  12000.00   244.36260
17  13000.00   267.93900
18  14000.00   292.46940
19  15000.00   318.01080
20  16000.00   344.36820
21  17000.00   371.32200
22  18000.00   398.91420
23  19000.00   427.19100
24  20000.00   456.24900
25  21000.00   486.38940
26  22000.00   517.91640
27  23000.00   550.96140
28  24000.00   585.65460
29  25000.00   622.12800
30  26000.00   660.35400
31  27000.00   700.37400
32  28000.00   742.39200
33  29000.00   786.57600
34  30000.00   833.13000
35  31000.00   882.09000
36  32000.00   933.46200
37  33000.00   987.40800
38  34000.00  1044.06000
39  35000.00  1103.85000
40  36000.00  1167.52200
41  36088.90  1173.39000
42  36089.60  1173.45000
43  36671.70  1216.60200
44  36672.40  1216.66200
45  38000.00  1295.80200
46  39000.00  1368.45000
47  40000.00  1458.00000
48  41000.00  1574.08200
49  42000.00  1730.97000
50  42231.00  1775.19600

Tried solutions

First, I tried resampling while keeping the original index intact, as shown in this question, so that I could then interpolate linearly. However, I found no interpolation method that produces correct results (note that the interpolated Time column only matches the index at 16.45 s):

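import numpy as np
import pandas as pd

# Use Time (seconds) as a DatetimeIndex, keeping the column as well
df = df.set_index(pd.to_datetime(df['Time'], unit='s'), drop=False)
# Regular 5 s grid of NaN rows to merge with the original samples
resample_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='5s')
dummy_frame = pd.DataFrame(np.nan, index=resample_index, columns=df.columns)
df.combine_first(dummy_frame).interpolate().iloc[:6]

                                 Time  Altitude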
1970-01-01 00:00:00.000000   0.000000       0.0
1970-01-01 00:00:05.000000   4.113375     250.0
1970-01-01 00:00:10.000000   8.226750     500.0
1970-01-01 00:00:15.000000  12.340125     750.0
1970-01-01 00:00:16.453500  16.453500    1000.0
1970-01-01 00:00:20.000000  20.639085    1250.0

Second, I tried resampling without keeping the original index, first down to 1s and then up to 5s, as shown in this answer, but the interpolation values do not line up at the end of the data, nor do the altitude values (1000ft should be between 15 and 20 seconds). Just resampling to 1s already produces wrong results.

df.resample('1s').interpolate(method='linear').resample('5s').asfreq()

                       Time      Altitude
1970-01-01 00:00:00     0.0      0.000000
1970-01-01 00:00:05     5.0    137.174211
1970-01-01 00:00:10    10.0    274.348422
1970-01-01 00:00:15    15.0    411.522634
1970-01-01 00:00:20    20.0    548.696845
1970-01-01 00:00:25    25.0    685.871056
1970-01-01 00:00:30    30.0    823.045267
1970-01-01 00:00:35    35.0    960.219479
1970-01-01 00:00:40    40.0   1097.393690
1970-01-01 00:00:45    45.0   1234.567901
1970-01-01 00:00:50    50.0   1371.742112
1970-01-01 00:00:55    55.0   1508.916324
1970-01-01 00:01:00    60.0   1646.090535
1970-01-01 00:01:05    65.0   1783.264746
1970-01-01 00:01:10    70.0   1920.438957
1970-01-01 00:01:15    75.0   2057.613169
1970-01-01 00:01:20    80.0   2194.787380
1970-01-01 00:01:25    85.0   2331.961591
1970-01-01 00:01:30    90.0   2469.135802
1970-01-01 00:01:35    95.0   2606.310014
1970-01-01 00:01:40   100.0   2743.484225
1970-01-01 00:01:45   105.0   2880.658436
1970-01-01 00:01:50   110.0   3017.832647
1970-01-01 00:01:55   115.0   3155.006859
1970-01-01 00:02:00   120.0   3292.181070
1970-01-01 00:02:05   125.0   3429.355281
1970-01-01 00:02:10   130.0   3566.529492
1970-01-01 00:02:15   135.0   3703.703704
1970-01-01 00:02:20   140.0   3840.877915
1970-01-01 00:02:25   145.0   3978.052126
...                     ...           ...
1970-01-01 00:27:10  1458.0  40000.000000
1970-01-01 00:27:15  1458.0  40000.000000
1970-01-01 00:27:20  1458.0  40000.000000
1970-01-01 00:27:25  1458.0  40000.000000
1970-01-01 00:27:30  1458.0  40000.000000
1970-01-01 00:27:35  1458.0  40000.000000
1970-01-01 00:27:40  1458.0  40000.000000
1970-01-01 00:27:45  1458.0  40000.000000
1970-01-01 00:27:50  1458.0  40000.000000
1970-01-01 00:27:55  1458.0  40000.000000
1970-01-01 00:28:00  1458.0  40000.000000
1970-01-01 00:28:05  1458.0  40000.000000
1970-01-01 00:28:10  1458.0  40000.000000
1970-01-01 00:28:15  1458.0  40000.000000
1970-01-01 00:28:20  1458.0  40000.000000
1970-01-01 00:28:25  1458.0  40000.000000
1970-01-01 00:28:30  1458.0  40000.000000
1970-01-01 00:28:35  1458.0  40000.000000
1970-01-01 00:28:40  1458.0  40000.000000
1970-01-01 00:28:45  1458.0  40000.000000
1970-01-01 00:28:50  1458.0  40000.000000
1970-01-01 00:28:55  1458.0  40000.000000
1970-01-01 00:29:00  1458.0  40000.000000
1970-01-01 00:29:05  1458.0  40000.000000
1970-01-01 00:29:10  1458.0  40000.000000
1970-01-01 00:29:15  1458.0  40000.000000
1970-01-01 00:29:20  1458.0  40000.000000
1970-01-01 00:29:25  1458.0  40000.000000
1970-01-01 00:29:30  1458.0  40000.000000
1970-01-01 00:29:35  1458.0  40000.000000
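
A likely reason for these numbers, offered as a hedged reading rather than a confirmed diagnosis: upsampling with resample('1s') places the data on a 1 s grid and only carries over rows whose timestamps fall exactly on that grid. In this data set that would leave just 0 s and 1458 s, and a straight line between those two points rises 40000 ft in 1458 s, about 137.17 ft per 5 s, which matches the output above. The sketch below uses only the first three rows from the question and counts how many original samples survive the upsampling step. The names on_grid and surviving are illustrative, and the exact behaviour of Resampler.interpolate may differ between pandas versions.

```python
import pandas as pd

# First three rows of the climb path from the question.
df = pd.DataFrame({"Altitude": [0.0, 1000.0, 2000.0],
                   "Time": [0.0, 16.4535, 33.19584]})
df = df.set_index(pd.to_datetime(df["Time"], unit="s"), drop=False)

# Upsampling reindexes onto the 1 s grid; rows whose timestamps are
# not exactly on that grid are not carried over.
on_grid = df.resample("1s").asfreq()

# Count the original samples that are still present after upsampling.
surviving = int(on_grid["Altitude"].notna().sum())
print(surviving)
```

Only the sample at 0 s lies on the grid here, so any interpolation applied afterwards has no knowledge of the points at 16.4535 s and 33.19584 s.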

The Question

How can I go about resampling the original data to 5s while performing a correct interpolation? Am I just using the wrong interpolation method?

Answer

After some help from @Martin Schmelzer (thanks!), I found that the first method from the question works when 'time' is passed as the method argument of pandas' interpolate:

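# Regular 5 s grid of NaN rows, merged with the original samples
resample_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='5s')
dummy_frame = pd.DataFrame(np.nan, index=resample_index, columns=df.columns)
# 'time' interpolates according to the actual spacing of the index
df.combine_first(dummy_frame).interpolate('time').iloc[:6]

                               Altitude     Time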
1970-01-01 00:00:00.000000     0.000000   0.0000
1970-01-01 00:00:05.000000   303.886711   5.0000
1970-01-01 00:00:10.000000   607.773422  10.0000
1970-01-01 00:00:15.000000   911.660133  15.0000
1970-01-01 00:00:16.453500  1000.000000  16.4535
1970-01-01 00:00:20.000000  1211.828215  20.0000
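
For readers who want to reproduce this without the full data set, here is a minimal, self-contained sketch using only the first three rows from the question. It contrasts method='time' with the default method='linear', which treats all rows as equally spaced regardless of their timestamps. The variable names grid, dummy, merged, by_time and by_position are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# First three rows of the climb path from the question.
df = pd.DataFrame({"Altitude": [0.0, 1000.0, 2000.0],
                   "Time": [0.0, 16.4535, 33.19584]})
df = df.set_index(pd.to_datetime(df["Time"], unit="s"), drop=False)

# Regular 5 s grid spanning the data, filled with NaN placeholders.
grid = pd.date_range(start=df.index[0], end=df.index[-1], freq="5s")
dummy = pd.DataFrame(np.nan, index=grid, columns=df.columns)

# Union of original and grid timestamps; original values take priority.
merged = df.combine_first(dummy)

# method="time" weights by the actual gaps between timestamps.
by_time = merged.interpolate(method="time")
# The default method="linear" ignores the index and uses row position.
by_position = merged.interpolate()

t5 = pd.Timestamp("1970-01-01 00:00:05")
print(by_time.loc[t5, "Altitude"])      # about 303.89, as in the output above
print(by_position.loc[t5, "Altitude"])  # 250.0, as in the first attempt
```

With the default method, the four gaps between 0 s and 16.4535 s are each given a quarter of the 1000 ft climb, which explains the 250 ft steps seen in the question.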

I can then resample this to 5 s (or any other interval), and the results are exact.

df.combine_first(dummy_frame).interpolate('time').resample('5s').asfreq().head()

                        Altitude  Time
1970-01-01 00:00:00     0.000000   0.0
1970-01-01 00:00:05   303.886711   5.0
1970-01-01 00:00:10   607.773422  10.0
1970-01-01 00:00:15   911.660133  15.0
1970-01-01 00:00:20  1211.828215  20.0
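
As an alternative to the approach above (a different technique, not the one used in this answer), the same values can be obtained without a datetime index by interpolating directly on the numeric Time column with numpy.interp. This is a sketch under the assumption that Time is increasing, as it is in the question's data. The names grid_s and resampled are illustrative.

```python
import numpy as np
import pandas as pd

# First four rows of the climb path from the question.
df = pd.DataFrame({"Altitude": [0.0, 1000.0, 2000.0, 3000.0],
                   "Time": [0.0, 16.4535, 33.19584, 50.2533]})

# Regular 5 s grid in plain seconds, covering the measured range.
grid_s = np.arange(0.0, df["Time"].iloc[-1], 5.0)

# np.interp does piecewise-linear interpolation; it expects the
# x-coordinates (here Time) to be increasing.
resampled = pd.DataFrame({
    "Time": grid_s,
    "Altitude": np.interp(grid_s,
                          df["Time"].to_numpy(),
                          df["Altitude"].to_numpy()),
})
print(resampled.head())
```

The row at 20 s should agree with the 1211.828215 ft shown above, since both approaches interpolate linearly in time between the same two neighbouring samples.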

So in the end it turns out I was just using the wrong interpolation method after all.
