Count unique dates in pandas dataframe

2024/10/12 20:25:25

I have a dataframe of surface weather observations (fzraHrObs) organized by a station identifier code and date. fzraHrObs has several columns of weather data. The station code and date (datetime objects) look like:

usaf      dat
716270    2014-11-23 12:00:00
          2015-12-20 08:00:00
          2015-12-20 09:00:00
          2015-12-21 04:00:00
          2015-12-28 03:00:00
716280    2015-12-19 08:00:00
          2015-12-19 08:00:00

I would like to get a count of the number of unique dates (days) per year for each station - i.e. the number of days of obs per year at each station. In my example above this would give me:

usaf      Year     Count
716270    2014     1
          2015     3
716280    2014     0
          2015     1

I've tried using groupby and grouping by station, year, and date: grouped = fzraHrObs['dat'].groupby([fzraHrObs['usaf'], fzraHrObs.dat.dt.year, fzraHrObs.dat.dt.date])

Calling count, size, nunique, etc. on this just gives me the number of obs on each date, not the number of distinct dates per year. Any suggestions on getting what I want here?

Answer

You could do something like this: convert each timestamp to a date, group the dates by usaf and year, and then count the number of unique values in each group:

import pandas as pd

# df is the observations dataframe (fzraHrObs in the question)
df.dat.apply(lambda dt: dt.date()).groupby([df.usaf, df.dat.apply(lambda dt: dt.year)]).nunique()

#   usaf   dat 
# 716270  2014    1
#         2015    3
# 716280  2015    1
# Name: dat, dtype: int64
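
Note that the result above has no row for station 716280 in 2014, whereas the question asks for an explicit count of 0 there. A minimal sketch of one way to get that, assuming the sample data from the question: it uses the vectorized .dt accessor instead of apply, then reindexes over every station/year combination. The names obs, days, counts, full_index and counts_with_zeros are illustrative and not from the original post.

```python
import pandas as pd

# Sample data rebuilt from the question; `obs` stands in for fzraHrObs.
obs = pd.DataFrame({
    "usaf": [716270] * 5 + [716280] * 2,
    "dat": pd.to_datetime([
        "2014-11-23 12:00:00",
        "2015-12-20 08:00:00",
        "2015-12-20 09:00:00",
        "2015-12-21 04:00:00",
        "2015-12-28 03:00:00",
        "2015-12-19 08:00:00",
        "2015-12-19 08:00:00",
    ]),
})

# Truncate each timestamp to midnight so observations on the same day compare equal.
days = obs["dat"].dt.normalize()

# Group the days by station and year, then count the distinct days in each group.
counts = (
    days.groupby([obs["usaf"], obs["dat"].dt.year.rename("Year")])
        .nunique()
        .rename("Count")
)

# Build every station/year combination and fill the missing ones with 0.
full_index = pd.MultiIndex.from_product(counts.index.levels, names=counts.index.names)
counts_with_zeros = counts.reindex(full_index, fill_value=0)

print(counts_with_zeros)
```

The key point is that the day is what gets counted, not a grouping key: grouping by station, year, and date (as in the question) makes every group a single date, so nunique can only report on observations within that date.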