pandas groupby dates and years and sum up amounts

2024/10/11 12:23:26

I have pandas dataframe like this:

d = {'dollar_amount': ['200.25', '350.00', '120.00', '400.50', '1231.25', '700.00', '350.00', '200.25', '2340.00'], 'date': ['22-01-2010','22-01-2010','23-01-2010','15-02-2010','27-02-2010','07-03-2010','14-01-2011','09-10-2011','28-07-2012']}
df = pd.DataFrame(data=d)df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
pd.options.display.float_format = '{:,.4f}'.format
df['dollar_amount'] = df['dollar_amount'].astype(float)
dfdate        dollar_amount
0   22-01-2010  200.25
1   22-01-2010  350.00
2   23-01-2010  120.00
3   15-02-2010  400.50
4   27-02-2010  1231.25
5   07-03-2010  700.00
6   14-01-2011  350.00
7   09-10-2011  200.25
8   11-11-2011  2340.00
9   12-12-2011  144.50
10  12-09-2012  760.00
11  22-10-2012  255.00
12  28-07-2012  650.00

I want to sum amounts for each day in each year. So I am dividing the years like this:

date1 = df[(df['date'] >= '2010-01-01') & (df['date'] < '2011-01-01')]
date2 = df[(df['date'] >= '2011-01-01') & (df['date'] < '2012-01-01')]
date3 = df[(df['date'] >= '2012-01-01') & (df['date'] < '2013-01-01')]

So now I have 3 dataframes with dates from the year 2010 in date1 dataframe, dates from the year 2011 in date2 and dates from 2012 in date3.

Lets look at date1:

print type(date1)
date1<class 'pandas.core.frame.DataFrame'>date        dollar_amount
0   2010-01-22  200.2500
1   2010-01-22  350.0000
2   2010-01-23  120.0000
3   2010-02-15  400.5000
4   2010-02-27  1,231.2500
5   2010-03-07  700.0000

Next I am summing up the amounts date wise, so I am grouping on date using this:

date1 = date1.groupby('date', as_index=False).sum()
date1 = date1[['date','dollar_amount']].sort_values(by=['date'], 
ascending=True)date2 = date2.groupby('date', as_index=False).sum()
date2 = date2[['date','dollar_amount']].sort_values(by=['date'], 
ascending=True)date3 = date3.groupby('date', as_index=False).sum()
date3 = date3[['date','dollar_amount']].sort_values(by=['date'], 
ascending=True)

Let's look at the dateframe date1 now:

date1date        dollar_amount
0   2010-01-22  550.2500
1   2010-01-23  120.0000
2   2010-02-15  400.5000
3   2010-02-27  1,231.2500
4   2010-03-07  700.0000

This is just sorting them in ascending date wise order:

date1 = date1[['date','dollar_amount']].sort_values(by=['date'], 
ascending=True)

Now I have got the date wise sum of dollarAmounts for each year in different dataframes. Then I am plotting traces for each year. Its working fine and fulfilling the task. But this code is very redundant and I am copying the same code and if I have say data from year 2000 to 2017 then I will have to copy and paste the same piece of code 18 times. I think its not very effective way of doing this.

I am sure there must be a better way of doing this but I cant figure out how. Kindly help me. Thanks.

Answer

I think you can create MultiIndex by years to output:

df1 = df.groupby('date', as_index=False)['dollar_amount'].sum()
df1 = df1.set_index(df['date'].rename('year').dt.year, append=True).swaplevel(0,1)
print (df1)date  dollar_amount
year                            
2010 0 2010-01-22       550.25001 2010-01-23       120.00002 2010-02-15       400.50003 2010-02-27     1,231.25004 2010-03-07       700.0000
2011 5 2011-01-14       350.00006 2011-10-09       200.2500
2012 7 2012-07-28     2,340.0000print (df1.loc[2010])date  dollar_amount
0 2010-01-22       550.2500
1 2010-01-23       120.0000
2 2010-02-15       400.5000
3 2010-02-27     1,231.2500
4 2010-03-07       700.0000print (df1.loc[2011])date  dollar_amount
5 2011-01-14       350.0000
6 2011-10-09       200.2500print (df1.loc[2012])date  dollar_amount
7 2012-07-28     2,340.0000

If want create dictionary of DataFrames:

d = dict(tuple(df.groupby(df['date'].dt.year)))
print (d)print (d[2010])date  dollar_amount
0 2010-01-22       550.2500
1 2010-01-23       120.0000
2 2010-02-15       400.5000
3 2010-02-27     1,231.2500
4 2010-03-07       700.0000print (d[2011])date  dollar_amount
5 2011-01-14       350.0000
6 2011-10-09       200.2500print (d[2012])date  dollar_amount
7 2012-07-28     2,340.0000
https://en.xdnf.cn/q/69778.html

Related Q&A

Is Python on every GNU/Linux distribution?

I would like to know if is Python on every G/L distribution preinstalled or not. And why is it so popular on GNU/Linux and not so much on Windows?

Installing QuantLib in Anaconda on the Spyder Editor (Windows)

How do I install the QuantLib Package in Anaconda. I have tried the following code;import QuantLib as qlbut I am getting the following result;ModuleNotFoundError: No module named QuantLibCan anyone ass…

get rows with empty dates pandas python

it looks like this:Dates N-D unit 0 1/1/2016 Q1 UD 1 Q2 UD 2 Q3 UD 3 2/1/2016 Q4 UD 4 5/1/2016 Q5 UD 5 Q6 UDI want to filter out the empty Dates row…

Python: Gridsearch Without Machine Learning?

I want to optimize an algorithm that has several variable parametersas input.For machine learning tasks, Sklearn offers the optimization of hyperparameters with the gridsearch functionality.Is there a …

Pandas division (.div) with multiindex

I have something similar to thisdf = pd.DataFrame(np.random.randint(2, 10, size = (5, 2))) df.index = pd.MultiIndex.from_tuples([(1, A), (2, A), (4, B), (5, B), (8, B)]) df.index.names = [foo, bar] df.…

Add a delay to a specific scrapy Request

Is it possible to delay the retry of a particular scrapy Request. I have a middleware which needs to defer the request of a page until a later time. I know how to do the basic deferal (end of queue), a…

importing without executing the class - python

my problem is about i have a file that contain class and inside this class there is bunch of code will be executed so whenever i import that file it will executed ! without creating an object of the…

If a command line program is unsure of stdouts encoding, what encoding should it output?

I have a command line program written in Python, and when I pipe it through another program on the command line, sys.stdout.encoding is None. This makes sense, I suppose -- the output could be another…

How to generate JSON-API data attribute vs results attribute in Django Rest Framework JSON API?

I have a django 1.9.2 project using Django Rest Framework JSON API:https://github.com/django-json-api/django-rest-framework-json-api:My viewset looks like this:class QuestionViewSet(viewsets.ReadOnlyMo…

How connect my GoPro Hero 4 camera live stream to openCV using Python?

I m having troubles trying to capture a live stream from my new GoPro Hero 4 camera and do some image processing on it using openCV.Here is my trial (nothing shows up on the created windowimport cv2 im…