Question 1

I have a dataframe similar to the below mentioned database:

+------------+-----+--------+| time | id | status |+------------+-----+--------+| 1451606400 | id1 | Yes || 1451606400 | id1 | Yes || 1456790400 | id2 | No || 1456790400 | id2 | Yes || 1456790400 | id2 | No |+------------+-----+--------+

I'm grouping by all the columns mentioned above and i'm able to get the count in a different column named 'count' successfully using the below command:

df.groupby(['time','id', 'status']).size().reset_index(name='count')

But I want the count in the above dataframe only in those rows with status = 'Yes' and rest should be '0'

Desired Output:

+------------+-----+--------+---------+| time | id | status | count |+------------+-----+--------+---------+| 1451606400 | id1 | Yes | 2 || 1456790400 | id2 | Yes | 1 || 1456790400 | id2 | No | 0 |+------------+-----+--------+---------+

I tried to count for status = 'Yes' with the below code:

df[df['status']== 'Yes'].groupby(['time','id','status']).size().reset_index(name='count')

which obviously gives me those rows with status = 'Yes' and discarded the rest. I want the discarded ones with count = 0

Is there any way to get the result?

Thanks in advance!

Question 2

Use lambda function with apply and for count sum boolena True values proccesses like 1:

df1 = (df.groupby(['time','id','status']).apply(lambda x: (x['status']== 'Yes').sum()).reset_index(name='count'))

Or create new column and aggregate sum:

df1 = (df.assign(A=df['status']=='Yes').groupby(['time','id','status'])['A'].sum().astype(int).reset_index(name='count'))

Very similar solution with no new column, but worse readable a bit:

df1 = ((df['status']=='Yes').groupby([df['time'],df['id'],df['status']]).sum().astype(int).reset_index(name='count'))print (df)time   id status  count
0  1451606400  id1    Yes      2
1  1456790400  id2     No      0
2  1456790400  id2    Yes      1

Groupby count only when a certain value is present in one of the column in pandas

Related Q&A

how to save tensorflow model to pickle file

PySide2 Qt3D mesh does not show up

Unable to import module lambda_function: No module named psycopg2._psycopg aws lambda function

RestrictedPython: Call other functions within user-specified code?

TypeError: object of type numpy.int64 has no len()

VS Code Pylance not highlighting variables and modules

How to compute Spearman correlation in Tensorflow

Pytorch loss is nan

How do you debug python code with kubernetes and skaffold?

Discrepancies between R optim vs Scipy optimize: Nelder-Mead