Pandas Grouper by weekday?

2024/10/6 20:34:26

I have a pandas dataframe where the index is the date, from year 2007 to 2017.

I'd like to calculate the mean of each weekday for each year. I am able to group by year:

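import pandas as pd

# TimeGrouper has since been removed from pandas; pd.Grouper(freq='A') is its replacement
groups = df.groupby(pd.Grouper(freq='A'))
years = pd.DataFrame()
for name, group in groups:
    years[name.year] = group.values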

This creates a new dataframe (years) in which each column holds one year of the time series. To see the statistics for each year (for example, the mean):

print(years.mean())

But now I would like to separate the days of the week within each year, in order to obtain the mean of each weekday for all of them.

The only thing I know is:

year = df[df.index.year == 2007]
day_week = df[df.index.weekday == 2]

The problem with this is that I would have to repeat it for each of the 7 weekdays in each of the 11 years (my time series begins in 2007 and ends in 2017), so I must do it 77 times!

Is there a way to group time by years and weekday in order to make this faster?

Answer

It seems you need to group by both DatetimeIndex.year and DatetimeIndex.weekday:

import pandas as pd

rng = pd.date_range('2017-04-03', periods=10, freq='10M')
df = pd.DataFrame({'a': range(10)}, index=rng)
print(df)

            a
2017-04-30  0
2018-02-28  1
2018-12-31  2
2019-10-31  3
2020-08-31  4
2021-06-30  5
2022-04-30  6
2023-02-28  7
2023-12-31  8
2024-10-31  9

df1 = df.groupby([df.index.year, df.index.weekday]).mean()
print(df1)

        a
2017 6  0
2018 0  2
     2  1
2019 3  3
2020 0  4
2021 2  5
2022 5  6
2023 1  7
     6  8
2024 3  9
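
The sample above has only ten rows, so most year/weekday pairs never appear. As a rough sketch closer to the data described in the question, assuming a daily index from 2007 to 2017 and a hypothetical column named `value` filled with random numbers (neither comes from the original post), the same groupby gives one mean per year and weekday:

```python
import numpy as np
import pandas as pd

# Hypothetical daily data covering 2007-2017 (random values, for illustration only)
idx = pd.date_range('2007-01-01', '2017-12-31', freq='D')
gen = np.random.default_rng(0)
daily = pd.DataFrame({'value': gen.normal(size=len(idx))}, index=idx)

# One group per (year, weekday) pair; weekday 0 is Monday and 6 is Sunday
means = daily.groupby([daily.index.year, daily.index.weekday]).mean()

# 11 years x 7 weekdays = 77 rows, replacing the 77 manual selections
print(means.shape)  # (77, 1)
```

A single entry can then be looked up with `means.loc[(2007, 2), 'value']`, which should match the mean of the manual selection for Wednesdays in 2007.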

df1 = df.groupby([df.index.year, df.index.weekday]).mean().reset_index()
df1 = df1.rename(columns={'level_0':'years','level_1':'weekdays'})
print(df1)

   years  weekdays  a
0   2017         6  0
1   2018         0  2
2   2018         2  1
3   2019         3  3
4   2020         0  4
5   2021         2  5
6   2022         5  6
7   2023         1  7
8   2023         6  8
9   2024         3  9
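
If a year-by-weekday table is easier to read than the long format, the grouped result can be pivoted. This is only a sketch: it rebuilds the ten sample dates explicitly, and naming the grouping keys with `rename` is my own variation rather than part of the answer above. It sidesteps the `level_0`/`level_1` renaming step.

```python
import pandas as pd

# The same ten month-end dates as in the sample data, written out explicitly
dates = pd.to_datetime([
    '2017-04-30', '2018-02-28', '2018-12-31', '2019-10-31', '2020-08-31',
    '2021-06-30', '2022-04-30', '2023-02-28', '2023-12-31', '2024-10-31',
])
df = pd.DataFrame({'a': range(10)}, index=dates)

# Name the grouping keys up front so the result's index levels are labelled
years = df.index.year.rename('years')
weekdays = df.index.weekday.rename('weekdays')

# Mean of column 'a' per (year, weekday), then move weekdays into columns
table = df.groupby([years, weekdays])['a'].mean().unstack('weekdays')
print(table)
```

Year/weekday combinations that do not occur in the data show up as NaN in the table, and weekdays that never occur (Friday, 4, in this sample) get no column at all.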
