My df:
In [163]: df.head()
Out[163]: x-axis y-axis z-axis
time
2017-07-27 06:23:08 -0.107666 -0.068848 0.963623
2017-07-27 06:23:08 -0.105225 -0.070068 0.963867
.....
I set the index as datetime. Since the sampling rate (10 Hz) is not always constant in the dataframe and for some second I have 8 or 9 samples.
- I would like to specify the milliseconds on my datatime (06:23:08**.100**, 06:23:08**.200**, etc.)
- I also would like to do interpolation of the missing samples.
Some ideas how to do it in pandas?
First lets create some sample data which maybe resembles your data.
import pandas as pd
from datetime import timedelta
from datetime import datetimebase = datetime.now()
date_list = [base - timedelta(days=x) for x in range(0, 2)]
values = [v for v in range(2)]
df = pd.DataFrame.from_dict({'Date': date_list, 'values': values})df = df.set_index('Date')
dfvalues
Date
2017-08-18 20:42:08.563878 0
2017-08-17 20:42:08.563878 1
Now we will create another data frame with every 100 milliseconds of datapoint.
min_val = df.index.min()
max_val = df.index.max()all_val = []
while min_val <= max_val:all_val.append(min_val)min_val += timedelta(milliseconds=100)
# len(all_val) 864001
df_new = pd.DataFrame.from_dict({'Date': all_val})
df_new = df_new.set_index('Date')
lets join both data frame so all missing rows will have index but no values.
final_df = df_new.join(df)
final_dfvalues
Date
2017-08-17 20:42:08.563878 1.0
2017-08-17 20:42:08.663878 NaN
2017-08-17 20:42:08.763878 NaN
2017-08-17 20:42:08.863878 NaN
2017-08-17 20:42:08.963878 NaN
2017-08-17 20:42:09.063878 NaN
2017-08-17 20:42:09.163878 NaN
Now interpolate data:
df_final.interpolate()values
Date
2017-08-17 20:42:08.563878 1.000000
2017-08-17 20:42:08.663878 0.999999
2017-08-17 20:42:08.763878 0.999998
2017-08-17 20:42:08.863878 0.999997
2017-08-17 20:42:08.963878 0.999995
2017-08-17 20:42:09.063878 0.999994
2017-08-17 20:42:09.163878 0.999993
2017-08-17 20:42:09.263878 0.999992
Some interpolation strategies: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.interpolate.html
UPDATE: As per the discussion in comments:
say our initial data does not have millisecond information.
df_new_date_without_miliseconds = df_new['Date']
df_new_date_without_miliseconds[0] # Timestamp('2017-08-17 21:45:49')max_value_date = df_new_date_without_miliseconds[0]
max_value_miliseconds = df_new_date_without_miliseconds[0]updated_dates = []
for val in df_new_date_without_miliseconds:if val == max_value_date:val = max_value_miliseconds + timedelta(milliseconds=100)max_value_miliseconds = valelif val > max_value_date:max_value_date = val + timedelta(milliseconds=0)max_value_miliseconds = valupdated_dates.append(val)output:[Timestamp('2017-08-17 21:45:49.100000'),Timestamp('2017-08-17 21:45:49.200000'),Timestamp('2017-08-17 21:45:49.300000'),Timestamp('2017-08-17 21:45:50'),Timestamp('2017-08-17 21:45:50.100000'),
Assign the new values to the DataFrame
df_new['Date'] = updated_dates