I have a pandas DataFrame like this:
import pandas as pd
d = {'dollar_amount': ['200.25', '350.00', '120.00', '400.50', '1231.25', '700.00', '350.00', '200.25', '2340.00'], 'date': ['22-01-2010','22-01-2010','23-01-2010','15-02-2010','27-02-2010','07-03-2010','14-01-2011','09-10-2011','28-07-2012']}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
pd.options.display.float_format = '{:,.4f}'.format
df['dollar_amount'] = df['dollar_amount'].astype(float)
df
          date  dollar_amount
0   22-01-2010         200.25
1   22-01-2010         350.00
2   23-01-2010         120.00
3   15-02-2010         400.50
4   27-02-2010        1231.25
5   07-03-2010         700.00
6   14-01-2011         350.00
7   09-10-2011         200.25
8   11-11-2011        2340.00
9   12-12-2011         144.50
10  12-09-2012         760.00
11  22-10-2012         255.00
12  28-07-2012         650.00
I want to sum the amounts for each day in each year, so I am splitting the data by year like this:
date1 = df[(df['date'] >= '2010-01-01') & (df['date'] < '2011-01-01')]
date2 = df[(df['date'] >= '2011-01-01') & (df['date'] < '2012-01-01')]
date3 = df[(df['date'] >= '2012-01-01') & (df['date'] < '2013-01-01')]
So now I have 3 DataFrames: dates from the year 2010 in date1, dates from the year 2011 in date2, and dates from 2012 in date3.
Let's look at date1:
print(type(date1))
date1
<class 'pandas.core.frame.DataFrame'>
        date  dollar_amount
0 2010-01-22       200.2500
1 2010-01-22       350.0000
2 2010-01-23       120.0000
3 2010-02-15       400.5000
4 2010-02-27     1,231.2500
5 2010-03-07       700.0000
Next I sum the amounts by date, grouping on date like this:
date1 = date1.groupby('date', as_index=False).sum()
date1 = date1[['date','dollar_amount']].sort_values(by=['date'], ascending=True)
date2 = date2.groupby('date', as_index=False).sum()
date2 = date2[['date','dollar_amount']].sort_values(by=['date'], ascending=True)
date3 = date3.groupby('date', as_index=False).sum()
date3 = date3[['date','dollar_amount']].sort_values(by=['date'], ascending=True)
Let's look at the DataFrame date1 now:
date1
        date  dollar_amount
0 2010-01-22       550.2500
1 2010-01-23       120.0000
2 2010-02-15       400.5000
3 2010-02-27     1,231.2500
4 2010-03-07       700.0000
This line just sorts the rows in ascending order by date:
date1 = date1[['date','dollar_amount']].sort_values(by=['date'], ascending=True)
Now I have the date-wise sum of dollar amounts for each year in separate DataFrames, and I plot a trace for each year. It works and does the job, but the code is very redundant: I am copying the same code for every year, so if I had data from, say, 2000 to 2017, I would have to copy and paste the same piece of code 18 times. I don't think this is a very effective way of doing it.
I am sure there must be a better way of doing this, but I can't figure out how. Kindly help me. Thanks.
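For reference, here is a minimal sketch of the kind of thing I am hoping for, and it may well not be the best approach. It assumes the column names above and uses only the 9 rows from the dictionary d. It sums per day once for the whole DataFrame, then splits the result by the year taken from `.dt.year`. The names `daily` and `per_year` are made up for illustration.

```python
import pandas as pd

d = {
    'dollar_amount': ['200.25', '350.00', '120.00', '400.50', '1231.25',
                      '700.00', '350.00', '200.25', '2340.00'],
    'date': ['22-01-2010', '22-01-2010', '23-01-2010', '15-02-2010',
             '27-02-2010', '07-03-2010', '14-01-2011', '09-10-2011',
             '28-07-2012'],
}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
df['dollar_amount'] = df['dollar_amount'].astype(float)

# Sum the amounts per calendar day once, for all years together,
# and sort the result by date.
daily = (df.groupby('date', as_index=False)['dollar_amount']
           .sum()
           .sort_values('date'))

# Split the daily sums into one DataFrame per year, keyed by year.
# `per_year` is a hypothetical name, not something from the code above.
per_year = {year: group.reset_index(drop=True)
            for year, group in daily.groupby(daily['date'].dt.year)}

for year, frame in per_year.items():
    print(year)
    print(frame)
```

With something like this, `per_year[2010]` would play the role of date1, and plotting could loop over `per_year.items()` instead of repeating the code for each year.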