Fill an empty DataFrame with common index values from another DataFrame

2024/10/12 0:34:05

I have a DataFrame holding a time series that spans one month at a nominal frequency of one second.

The problem is that the time step between records is not always 1 second.

time                c1  c2
2013-01-01 00:00:01 5   3
2013-01-01 00:00:03 7   2
2013-01-01 00:00:04 1   5
2013-01-01 00:00:05 4   3
2013-01-01 00:00:06 5   6
2013-01-01 00:00:09 4   2
2013-01-01 00:00:10 7   8

Then I want to create an empty DataFrame with the same columns that covers the whole period at the corrected frequency, that is, with as many records as there are seconds in a month. This empty DataFrame is initially filled with NaN values:

time                c1  c2
2013-01-01 00:00:01 nan nan
2013-01-01 00:00:02 nan nan
2013-01-01 00:00:03 nan nan
2013-01-01 00:00:04 nan nan
2013-01-01 00:00:05 nan nan
2013-01-01 00:00:06 nan nan
2013-01-01 00:00:07 nan nan
2013-01-01 00:00:08 nan nan
2013-01-01 00:00:09 nan nan
2013-01-01 00:00:10 nan nan

Then I want to compare both and fill the empty one with the rows it has in common with my first DataFrame. The non-common rows should keep their NaN values.

time                c1  c2
2013-01-01 00:00:01 5   3
2013-01-01 00:00:02 nan nan
2013-01-01 00:00:03 7   2
2013-01-01 00:00:04 1   5
2013-01-01 00:00:05 4   3
2013-01-01 00:00:06 5   6
2013-01-01 00:00:07 nan nan
2013-01-01 00:00:08 nan nan
2013-01-01 00:00:09 4   2
2013-01-01 00:00:10 7   8

My attempt:

import pandas as pd

# Read the first dataframe (df1) from a file
# Create an empty dataframe
N = 86400 * 31  # seconds in a 31-day month
index = pd.date_range(df1.index[0], periods=N-1, freq='1s')
df2 = pd.DataFrame(index=index, columns=df1.columns)

I then tried merge and concat, but neither gives the expected result:

df2.merge(df1, how='outer')
pd.concat([df2,df1], axis=0, join='outer')

I don't think you need a second DataFrame. If you resample to one-second frequency with an aggregation such as max() and no fill method, the missing periods are stored as NaN:

Out[62]:
                      c1   c2
2013-01-01 00:00:01  5.0  3.0
2013-01-01 00:00:02  NaN  NaN
2013-01-01 00:00:03  7.0  2.0
2013-01-01 00:00:04  1.0  5.0
2013-01-01 00:00:05  4.0  3.0
2013-01-01 00:00:06  5.0  6.0
2013-01-01 00:00:07  NaN  NaN
2013-01-01 00:00:08  NaN  NaN
2013-01-01 00:00:09  4.0  2.0
2013-01-01 00:00:10  7.0  8.0

max() here is just an arbitrary aggregation, used so that resample returns a DataFrame. You can replace it with mean, min, etc., assuming you have no duplicate timestamps. If you do have duplicates, they will be aggregated by that function.
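For reference, here is a minimal sketch of the resample call described above. It assumes the frame is named df1 as in the question, and it rebuilds the sample rows by hand instead of reading them from a file:

```python
import pandas as pd

# Sample rows copied from the question; building them inline is an
# assumption made so the snippet runs on its own.
df1 = pd.DataFrame(
    {"c1": [5, 7, 1, 4, 5, 4, 7], "c2": [3, 2, 5, 3, 6, 2, 8]},
    index=pd.to_datetime(
        [
            "2013-01-01 00:00:01",
            "2013-01-01 00:00:03",
            "2013-01-01 00:00:04",
            "2013-01-01 00:00:05",
            "2013-01-01 00:00:06",
            "2013-01-01 00:00:09",
            "2013-01-01 00:00:10",
        ]
    ),
)
df1.index.name = "time"

# Resample into one-second bins. Bins that contain no rows come back as NaN,
# and max() only matters when a bin holds more than one row.
filled = df1.resample("s").max()
print(filled)
```

Note that the result only spans the range from the first to the last timestamp present in df1.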

As Paul H suggested in the comments, you can use df.resample("s").asfreq() without any aggregation. It skips an unnecessary step of aggregation so it is probably more efficient. It will raise an error if you have duplicate values in the index.
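A small sketch of the asfreq() variant and its behaviour with duplicates, again assuming the sample rows from the question are rebuilt inline as df1. The last step, reindexing against an explicit date_range, is an extra option not taken from the answer above; it may help if the result has to cover the whole month even when the data starts late or ends early. The chosen span (January 2013) is an assumption for illustration.

```python
import pandas as pd

# Sample rows copied from the question (built inline as an assumption).
df1 = pd.DataFrame(
    {"c1": [5, 7, 1, 4, 5, 4, 7], "c2": [3, 2, 5, 3, 6, 2, 8]},
    index=pd.to_datetime(
        [
            "2013-01-01 00:00:01",
            "2013-01-01 00:00:03",
            "2013-01-01 00:00:04",
            "2013-01-01 00:00:05",
            "2013-01-01 00:00:06",
            "2013-01-01 00:00:09",
            "2013-01-01 00:00:10",
        ]
    ),
)

# No aggregation: existing rows are kept, missing seconds become NaN.
no_agg = df1.resample("s").asfreq()

# A duplicated timestamp makes asfreq() fail, since it has to reindex
# the frame and cannot do that with repeated labels.
dup = pd.concat([df1, df1.iloc[[0]]]).sort_index()
try:
    dup.resample("s").asfreq()
    asfreq_failed = False
except ValueError:
    asfreq_failed = True

# Optional: cover a fixed span regardless of the first and last timestamps.
# Here the span is all of January 2013 (31 days), chosen for illustration.
full_index = pd.date_range("2013-01-01 00:00:00", periods=86400 * 31, freq="s")
full = df1.reindex(full_index)
```

Like asfreq(), reindex needs a unique index on df1, so duplicate timestamps would have to be aggregated or dropped first.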


