Let's say I have the following data sample:
df = pd.DataFrame({'date':['2011-01-01','2011-01-02','2011-01-03','2011-01-04','2011-01-05','2011-01-06','2011-01-07','2011-01-08','2011-01-09','2011-12-30','2011-12-31'],'revenue':[5,3,2,10,12,2,1,0,6,10,12]})# Let's format the date and add the week number and year
df['date'] = pd.to_datetime(df['date'],format='%Y-%m-%d')
df['week_number'] = df['date'].dt.week
df['year'] = df['date'].dt.yeardfdate revenue week_of_year year
0 2011-01-01 5 52 2011
1 2011-01-02 3 52 2011
2 2011-01-03 2 1 2011
3 2011-01-04 10 1 2011
4 2011-01-05 12 1 2011
5 2011-01-06 2 1 2011
6 2011-01-07 1 1 2011
7 2011-01-08 0 1 2011
8 2011-01-09 6 1 2011
9 2011-12-30 10 52 2011
10 2011-12-31 12 52 2011
I would like to compute the revenue per week, in order to later plot the results, and analyze the time series. The expected output would then be something like that :
week revenue
0 1 8
1 2 33
2 52 22
I first thought of using the week number given by timestamp.week
.
However, I can't figure out how to deal with the ISO week number definition for the week preceeding week number 1. I am a bit confused, since grouping by week_number
would in that case sum both the revenue at the very beginning of the year, and those at the end of the year.