Groupby count only when a certain value is present in one of the columns in pandas

2024/9/27 23:27:06

I have a dataframe similar to the table below:

+------------+-----+--------+
| time       | id  | status |
+------------+-----+--------+
| 1451606400 | id1 | Yes    |
| 1451606400 | id1 | Yes    |
| 1456790400 | id2 | No     |
| 1456790400 | id2 | Yes    |
| 1456790400 | id2 | No     |
+------------+-----+--------+

I'm grouping by all the columns shown above, and I can get the count in a new column named 'count' with the following command:

df.groupby(['time','id', 'status']).size().reset_index(name='count')

But I want the count only in the rows with status = 'Yes'; the rest should be 0.

Desired Output:

+------------+-----+--------+---------+
| time       | id  | status | count   |
+------------+-----+--------+---------+
| 1451606400 | id1 | Yes    | 2       |
| 1456790400 | id2 | Yes    | 1       |
| 1456790400 | id2 | No     | 0       |
+------------+-----+--------+---------+

I tried to count for status = 'Yes' with the below code:

df[df['status']== 'Yes'].groupby(['time','id','status']).size().reset_index(name='count')

which, as expected, keeps only the rows with status = 'Yes' and discards the rest. I want the discarded rows as well, with count = 0.

Is there any way to get the result?

Thanks in advance!

Answer

Use a lambda function with apply; to get the count, sum the boolean values, since True is treated as 1:

df1 = (df.groupby(['time','id','status']).apply(lambda x: (x['status']== 'Yes').sum()).reset_index(name='count'))
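Why summing gives a count can be seen on a plain Series. This is only a minimal sketch; the names `status`, `is_yes` and `yes_count` are illustrative and not taken from the question:

```python
import pandas as pd

# Illustrative data: the status column from the question's sample table
status = pd.Series(["Yes", "Yes", "No", "Yes", "No"])

# Comparing to 'Yes' yields a boolean Series
is_yes = status == "Yes"

# Summing booleans counts the True values (True -> 1, False -> 0)
yes_count = int(is_yes.sum())
print(yes_count)  # 3
```

Inside each group the same comparison is applied to that group's rows only, so a 'No' group sums to 0.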

Or create new column and aggregate sum:

df1 = (df.assign(A=df['status']=='Yes').groupby(['time','id','status'])['A'].sum().astype(int).reset_index(name='count'))
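For a version that can be run as is, the sketch below rebuilds the sample data from the question and applies the helper-column approach. The DataFrame construction is an assumption based on the table in the question; the helper column name `A` follows the answer:

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed dtypes)
df = pd.DataFrame({
    "time": [1451606400, 1451606400, 1456790400, 1456790400, 1456790400],
    "id": ["id1", "id1", "id2", "id2", "id2"],
    "status": ["Yes", "Yes", "No", "Yes", "No"],
})

df1 = (
    df.assign(A=df["status"] == "Yes")        # helper column: True where status is 'Yes'
      .groupby(["time", "id", "status"])["A"]
      .sum()                                  # number of True values per group
      .astype(int)
      .reset_index(name="count")
)
print(df1)
```

This variant also avoids groupby.apply, which, as far as I know, emits a DeprecationWarning in pandas 2.2 when the lambda reads a grouping column such as 'status'.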

A very similar solution without a new column, though slightly less readable:

df1 = ((df['status']=='Yes').groupby([df['time'],df['id'],df['status']]).sum().astype(int).reset_index(name='count'))
print(df1)
         time   id status  count
0  1451606400  id1    Yes      2
1  1456790400  id2     No      0
2  1456790400  id2    Yes      1
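Another option, not part of the answer above, is to keep the original size() count from the question and zero out the rows whose status is not 'Yes' afterwards. This is a hedged sketch; the sample DataFrame is reconstructed from the question and the name `counts` is hypothetical:

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed dtypes)
df = pd.DataFrame({
    "time": [1451606400, 1451606400, 1456790400, 1456790400, 1456790400],
    "id": ["id1", "id1", "id2", "id2", "id2"],
    "status": ["Yes", "Yes", "No", "Yes", "No"],
})

# Count every group exactly as in the question
counts = df.groupby(["time", "id", "status"]).size().reset_index(name="count")

# Keep the count where status is 'Yes', replace it with 0 elsewhere
counts["count"] = counts["count"].where(counts["status"] == "Yes", 0)
print(counts)
```

Because status is one of the grouping keys, every group is either all 'Yes' or all 'No', so masking after counting should give the same result as summing the boolean column.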