I have a dataframe similar to the below mentioned database:
+------------+-----+--------+| time | id | status |+------------+-----+--------+| 1451606400 | id1 | Yes || 1451606400 | id1 | Yes || 1456790400 | id2 | No || 1456790400 | id2 | Yes || 1456790400 | id2 | No |+------------+-----+--------+
I'm grouping by all the columns mentioned above and i'm able to get the count in a different column named 'count'
successfully using the below command:
df.groupby(['time','id', 'status']).size().reset_index(name='count')
But I want the count in the above dataframe only in those rows with status = 'Yes'
and rest should be '0'
Desired Output:
+------------+-----+--------+---------+| time | id | status | count |+------------+-----+--------+---------+| 1451606400 | id1 | Yes | 2 || 1456790400 | id2 | Yes | 1 || 1456790400 | id2 | No | 0 |+------------+-----+--------+---------+
I tried to count for status = 'Yes'
with the below code:
df[df['status']== 'Yes'].groupby(['time','id','status']).size().reset_index(name='count')
which obviously gives me those rows with status = 'Yes'
and discarded the rest. I want the discarded ones with count = 0
Is there any way to get the result?
Thanks in advance!