Missing samples of a dataframe in pandas

2024/11/15 6:56:15

My df:

In [163]: df.head()
Out[163]:
                       x-axis    y-axis    z-axis
time
2017-07-27 06:23:08 -0.107666 -0.068848  0.963623
2017-07-27 06:23:08 -0.105225 -0.070068  0.963867
.....

I set the index to datetime. The sampling rate (10 Hz) is not always constant in the dataframe, and for some seconds I have only 8 or 9 samples.

  1. I would like to specify the milliseconds on my datetime (06:23:08**.100**, 06:23:08**.200**, etc.)
  2. I also would like to do interpolation of the missing samples.

Any ideas on how to do this in pandas?

Answer

First, let's create some sample data that roughly resembles yours.

import pandas as pd
from datetime import timedelta
from datetime import datetime

base = datetime.now()
date_list = [base - timedelta(days=x) for x in range(0, 2)]
values = [v for v in range(2)]
df = pd.DataFrame.from_dict({'Date': date_list, 'values': values})
df = df.set_index('Date')
df

                            values
Date
2017-08-18 20:42:08.563878  0
2017-08-17 20:42:08.563878  1

Now we create another data frame with one data point every 100 milliseconds.

min_val = df.index.min()
max_val = df.index.max()
all_val = []
while min_val <= max_val:
    all_val.append(min_val)
    min_val += timedelta(milliseconds=100)
# len(all_val) 864001
df_new = pd.DataFrame.from_dict({'Date': all_val})
df_new = df_new.set_index('Date')
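
As a possible shortcut, the same 100 ms grid can be built without an explicit loop using `pd.date_range`. This is only a sketch: `start` and `end` are made-up stand-ins for `df.index.min()` and `df.index.max()`, and `grid` and `df_grid` are illustrative names that do not appear in the answer.

```python
import pandas as pd

# Hypothetical endpoints standing in for df.index.min() and df.index.max()
start = pd.Timestamp("2017-08-17 20:42:08")
end = pd.Timestamp("2017-08-17 20:42:09")

# One timestamp every 100 ms, with both endpoints included
grid = pd.date_range(start=start, end=end, freq="100ms", name="Date")

# Empty frame indexed by the grid, playing the role of df_new
df_grid = pd.DataFrame(index=grid)
```

A one-second span yields 11 grid points here because both endpoints are included.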

Let's join both data frames so that all missing rows have an index but no values.

final_df = df_new.join(df)
final_df

                            values
Date
2017-08-17 20:42:08.563878  1.0
2017-08-17 20:42:08.663878  NaN
2017-08-17 20:42:08.763878  NaN
2017-08-17 20:42:08.863878  NaN
2017-08-17 20:42:08.963878  NaN
2017-08-17 20:42:09.063878  NaN
2017-08-17 20:42:09.163878  NaN

Now interpolate data:

final_df.interpolate()

                            values
Date
2017-08-17 20:42:08.563878  1.000000
2017-08-17 20:42:08.663878  0.999999
2017-08-17 20:42:08.763878  0.999998
2017-08-17 20:42:08.863878  0.999997
2017-08-17 20:42:08.963878  0.999995
2017-08-17 20:42:09.063878  0.999994
2017-08-17 20:42:09.163878  0.999993
2017-08-17 20:42:09.263878  0.999992
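
The join-then-interpolate steps can probably be condensed with `reindex`, which places the data on the regular grid and leaves `NaN` where a sample is missing. The sketch below uses made-up sample values, not the asker's data, and assumes the existing timestamps already fall on 100 ms boundaries.

```python
import pandas as pd

# Hypothetical samples at 10 Hz with the .200 sample missing
idx = pd.to_datetime([
    "2017-07-27 06:23:08.000",
    "2017-07-27 06:23:08.100",
    "2017-07-27 06:23:08.300",
    "2017-07-27 06:23:08.400",
])
df = pd.DataFrame({"values": [0.0, 1.0, 3.0, 4.0]}, index=idx)

# Regular 100 ms grid spanning the data
grid = pd.date_range(df.index.min(), df.index.max(), freq="100ms")

# Rows missing from df become NaN, then get filled by linear interpolation
filled = df.reindex(grid).interpolate()
```

If the real timestamps do not sit exactly on the grid, `reindex` would drop them, so they would need to be rounded or resampled first.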

Some interpolation strategies: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.interpolate.html
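
To illustrate why the strategy can matter, here is a small sketch with made-up values comparing the default `method="linear"`, which treats rows as equally spaced, with `method="time"`, which weights by the elapsed time in a datetime index.

```python
import pandas as pd

# Hypothetical unevenly spaced timestamps; the middle value is unknown
idx = pd.to_datetime([
    "2017-07-27 06:23:08.000",
    "2017-07-27 06:23:08.100",
    "2017-07-27 06:23:08.400",
])
s = pd.Series([0.0, float("nan"), 4.0], index=idx)

# Ignores the index: the gap is filled with the midpoint of its neighbours
linear = s.interpolate(method="linear")

# Uses the index: the gap sits 100 ms into a 400 ms interval
timed = s.interpolate(method="time")
```

Here the linear fill gives 2.0 for the missing value, while the time-weighted fill gives 1.0. On a perfectly regular grid the two methods agree.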

UPDATE: As per the discussion in comments:

Say our initial data does not have millisecond information.

# Here 'Date' is assumed to be a regular column holding second-resolution timestamps
df_new_date_without_miliseconds = df_new['Date']
df_new_date_without_miliseconds[0]  # Timestamp('2017-08-17 21:45:49')

max_value_date = df_new_date_without_miliseconds[0]
max_value_miliseconds = df_new_date_without_miliseconds[0]
updated_dates = []
for val in df_new_date_without_miliseconds:
    if val == max_value_date:
        # Same second as before: step 100 ms past the last assigned timestamp
        val = max_value_miliseconds + timedelta(milliseconds=100)
        max_value_miliseconds = val
    elif val > max_value_date:
        # A new second starts: reset the running timestamp
        max_value_date = val + timedelta(milliseconds=0)
        max_value_miliseconds = val
    updated_dates.append(val)

output:
[Timestamp('2017-08-17 21:45:49.100000'),
 Timestamp('2017-08-17 21:45:49.200000'),
 Timestamp('2017-08-17 21:45:49.300000'),
 Timestamp('2017-08-17 21:45:50'),
 Timestamp('2017-08-17 21:45:50.100000'),

Assign the new values to the DataFrame:

df_new['Date'] = updated_dates
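
A vectorized alternative to the loop might use `groupby(...).cumcount()` to number the rows within each second and convert that count into a millisecond offset. This is a sketch on made-up timestamps, and `dates`, `position` and `updated` are illustrative names. Note that it starts every second at .000, whereas the loop above shifts the very first sample to .100.

```python
import pandas as pd

# Hypothetical second-resolution timestamps with repeats
dates = pd.Series(pd.to_datetime([
    "2017-08-17 21:45:49",
    "2017-08-17 21:45:49",
    "2017-08-17 21:45:49",
    "2017-08-17 21:45:50",
    "2017-08-17 21:45:50",
]))

# Position of each row within its second: 0, 1, 2, 0, 1
position = dates.groupby(dates).cumcount()

# Shift each row by position * 100 ms
updated = dates + pd.to_timedelta(position * 100, unit="ms")
```

This assumes the rows are already in chronological order within each second, and that a second with fewer than 10 samples is missing them at the end, which may not hold for real data.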