Pandas: Filling data for missing dates

2024/9/16 22:58:59

Let's say I've got the following table:

ProdID  Date        Val1 Val2 Val3
Prod1   4/1/2019    1    3    4
Prod1   4/3/2019    2    3    54
Prod1   4/4/2019    3    4    54
Prod2   4/1/2019    1    3    3
Prod2   4/2/2019    1    3    4
Prod2   4/3/2019    2    4    4
Prod2   4/4/2019    2    5    3

Prod2 is populated correctly, since it has an entry for every date from 4/1/2019 to 4/4/2019.

Prod1 has one missing date: 4/2/2019.

I would like to find the missing dates for all ProdIDs and fill in Val1-Val3 with the values from the most recent previous entry for the same ProdID. For instance, I would like to copy the 4/1/2019 data for Prod1 to 4/2/2019:

ProdID  Date        Val1 Val2 Val3
Prod1   4/1/2019    1    3    4
Prod1   4/2/2019    1    3    4
Prod1   4/3/2019    2    3    54
Prod1   4/4/2019    3    4    54
Prod2   4/1/2019    1    3    3
Prod2   4/2/2019    1    3    4
Prod2   4/3/2019    2    4    4
Prod2   4/4/2019    2    5    3
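Before filling anything, the gaps themselves can be listed per ProdID by comparing each product's dates against a daily pd.date_range running from that product's first date to its last. This is a minimal sketch that assumes one row per ProdID per day and only looks for gaps inside each product's own date span. missing_dates is a hypothetical helper name.

```python
import pandas as pd

# Sample data from the question (dates are month/day/year strings).
df = pd.DataFrame({
    'ProdID': ['Prod1', 'Prod1', 'Prod1', 'Prod2', 'Prod2', 'Prod2', 'Prod2'],
    'Date': ['4/1/2019', '4/3/2019', '4/4/2019',
             '4/1/2019', '4/2/2019', '4/3/2019', '4/4/2019'],
    'Val1': [1, 2, 3, 1, 1, 2, 2],
    'Val2': [3, 3, 4, 3, 3, 4, 5],
    'Val3': [4, 54, 54, 3, 4, 4, 3],
})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')


def missing_dates(frame):
    """Hypothetical helper: map each ProdID to the calendar days absent
    between that product's first and last date."""
    result = {}
    for prod, dates in frame.groupby('ProdID')['Date']:
        # Every day from this product's first to its last date.
        full = pd.date_range(dates.min(), dates.max(), freq='D')
        # Days in the full range that never appear in the data.
        result[prod] = full.difference(dates)
    return result


for prod, gaps in missing_dates(df).items():
    print(prod, [d.strftime('%m/%d/%Y') for d in gaps])
# Prod1 ['04/02/2019']
# Prod2 []
```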
Answer

First convert the Date column to datetimes with to_datetime, then create a DatetimeIndex with DataFrame.set_index and call GroupBy.apply with DataFrame.asfreq. asfreq also accepts a method parameter for forward or backward filling the missing values:

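import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
df1 = (df.set_index('Date')
         .groupby('ProdID')
         # asfreq('D') inserts every missing day; method='ffill' copies the previous row into it
         .apply(lambda x: x.asfreq('D', method='ffill'))
         # drop the ProdID level added by apply, then turn Date back into a column
         .reset_index(level=0, drop=True)
         .reset_index()
         # restore the original column order
         .reindex(df.columns, axis=1))
print(df1)

  ProdID       Date  Val1  Val2  Val3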
0  Prod1 2019-04-01     1     3     4
1  Prod1 2019-04-02     1     3     4
2  Prod1 2019-04-03     2     3    54
3  Prod1 2019-04-04     3     4    54
4  Prod2 2019-04-01     1     3     3
5  Prod2 2019-04-02     1     3     4
6  Prod2 2019-04-03     2     4     4
7  Prod2 2019-04-04     2     5     3
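For a single group, asfreq('D', method='ffill') amounts to reindexing against a daily date_range from the group's first to last date and copying the previous row into each new date. The sketch below spells that out with an explicit loop instead of GroupBy.apply, which also avoids the deprecation warning that recent pandas versions (2.2+) emit when apply operates on the grouping column. It assumes one row per ProdID/Date pair, because reindex raises on duplicate labels. fill_missing_dates is a hypothetical helper name.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'ProdID': ['Prod1', 'Prod1', 'Prod1', 'Prod2', 'Prod2', 'Prod2', 'Prod2'],
    'Date': ['4/1/2019', '4/3/2019', '4/4/2019',
             '4/1/2019', '4/2/2019', '4/3/2019', '4/4/2019'],
    'Val1': [1, 2, 3, 1, 1, 2, 2],
    'Val2': [3, 3, 4, 3, 3, 4, 5],
    'Val3': [4, 54, 54, 3, 4, 4, 3],
})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')


def fill_missing_dates(frame, freq='D'):
    """Hypothetical helper: forward-fill missing days within each ProdID's own date range."""
    parts = []
    for _, group in frame.groupby('ProdID'):
        group = group.set_index('Date').sort_index()
        # Daily index spanning this product's first to last date.
        full = pd.date_range(group.index.min(), group.index.max(),
                             freq=freq, name='Date')
        # method='ffill' copies the previous row (including ProdID) into each new date.
        parts.append(group.reindex(full, method='ffill'))
    out = pd.concat(parts).reset_index()
    return out[list(frame.columns)]  # original column order


print(fill_missing_dates(df))
```

The loop is less compact than the apply chain, but each step (build the full date index, reindex, forward-fill) is visible and the grouping column is handled explicitly.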

Another solution is to create all combinations of products and dates with itertools.product, DataFrame.merge them with the original data using a left join, and finally forward-fill the missing values with ffill:

from itertools import product

dates = pd.date_range(start=df['Date'].min(), end=df['Date'].max())
prods = df.ProdID.unique()
df1 = pd.DataFrame(list(product(prods, dates)), columns=['ProdID', 'Date'])
print(df1)

  ProdID       Date
0  Prod1 2019-04-01
1  Prod1 2019-04-02
2  Prod1 2019-04-03
3  Prod1 2019-04-04
4  Prod2 2019-04-01
5  Prod2 2019-04-02
6  Prod2 2019-04-03
7  Prod2 2019-04-04

df = df1.merge(df, how='left').ffill()
print(df)

  ProdID       Date  Val1  Val2  Val3
0  Prod1 2019-04-01   1.0   3.0   4.0
1  Prod1 2019-04-02   1.0   3.0   4.0
2  Prod1 2019-04-03   2.0   3.0  54.0
3  Prod1 2019-04-04   3.0   4.0  54.0
4  Prod2 2019-04-01   1.0   3.0   3.0
5  Prod2 2019-04-02   1.0   3.0   4.0
6  Prod2 2019-04-03   2.0   4.0   4.0
7  Prod2 2019-04-04   2.0   5.0   3.0
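The ffill() after the merge runs over the whole frame, not per product, so if a product's first date in the global range were missing it would inherit the last row of the previous product. A minimal variant, assuming the same column names and unique ProdID/Date pairs, builds the combinations with pd.MultiIndex.from_product (in place of itertools.product plus merge), reindexes, and forward-fills within each ProdID via groupby(level='ProdID').ffill(). The data below is hypothetical: the question's table with Prod2's 4/1/2019 row removed, so that row stays NaN instead of receiving Prod1's 4/4/2019 values. fill_all_combinations is a hypothetical helper name.

```python
import pandas as pd


def fill_all_combinations(frame):
    """Hypothetical helper: one row per (ProdID, day) over the global date range,
    forward-filled within each ProdID only."""
    dates = pd.date_range(frame['Date'].min(), frame['Date'].max(), freq='D')
    full_idx = pd.MultiIndex.from_product(
        [frame['ProdID'].unique(), dates], names=['ProdID', 'Date'])
    return (frame.set_index(['ProdID', 'Date'])
                 .reindex(full_idx)          # missing (ProdID, Date) pairs become NaN rows
                 .groupby(level='ProdID')    # keep the fill inside each product
                 .ffill()
                 .reset_index())


# Hypothetical data: the question's table without Prod2's 4/1/2019 row.
df = pd.DataFrame({
    'ProdID': ['Prod1', 'Prod1', 'Prod1', 'Prod2', 'Prod2', 'Prod2'],
    'Date': pd.to_datetime(['4/1/2019', '4/3/2019', '4/4/2019',
                            '4/2/2019', '4/3/2019', '4/4/2019'], format='%m/%d/%Y'),
    'Val1': [1, 2, 3, 1, 2, 2],
    'Val2': [3, 3, 4, 3, 4, 5],
    'Val3': [4, 54, 54, 4, 4, 3],
})
print(fill_all_combinations(df))
```

Leading NaN rows can then be dropped or filled separately, depending on whether a product should exist before its first recorded date.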