Flag the first non-zero value in each group with 1 and the rest with 0, grouping on multiple columns

2024/7/8 7:25:38

Please help with the following.

import pandas as pd
df = pd.DataFrame({'Grp': [1,1,1,1,2,2,2,2,3,3,3,4,4,4], 'Org1': ['x','x','y','y','z','y','z','z','x','y','y','z','x','x'], 'Org2': ['a','a','b','b','c','b','c','c','a','b','b','c','a','a'], 'Value': [0,0,3,1,0,1,0,5,0,0,0,1,1,1]})
df

I need "FLAG" = 1 on the first non-zero "Value" in each group and "FLAG" = 0 on every other row.

Details:

For each unique combination of "Grp", "Org1" and "Org2", set "FLAG" based on "Value": 1 on the first non-zero row and 0 on all the others.

If all values in a group are 0, then FLAG = 0 for every row in that group.

If all values in a group are non-zero, then the first row gets FLAG = 1 and the others get 0.

I am expecting the output as below

+----+-----+------+------+-------+------+
|    | Grp | Org1 | Org2 | Value | FLAG |
+----+-----+------+------+-------+------+
|  0 |   1 | x    | a    |     0 |    0 |
|  1 |   1 | x    | a    |     0 |    0 |
|  2 |   1 | y    | b    |     3 |    1 |
|  3 |   1 | y    | b    |     1 |    0 |
|  4 |   2 | z    | c    |     0 |    0 |
|  5 |   2 | y    | b    |     1 |    1 |
|  6 |   2 | z    | c    |     0 |    0 |
|  7 |   2 | z    | c    |     5 |    1 |
|  8 |   3 | x    | a    |     0 |    0 |
|  9 |   3 | y    | b    |     0 |    0 |
| 10 |   3 | y    | b    |     0 |    0 |
| 11 |   4 | z    | c    |     1 |    1 |
| 12 |   4 | x    | a    |     1 |    1 |
| 13 |   4 | x    | a    |     1 |    0 |
+----+-----+------+------+-------+------+
Answer

Start with a simple 0/1 flag that marks whether the value is non-zero.

df = df.assign(FLAG=df.Value.where(df.Value == 0, 1))
df
#     Grp Org1 Org2  Value  FLAG
# 0     1    x    a      0     0
# 1     1    x    a      0     0
# 2     1    y    b      3     1
# 3     1    y    b      1     1
# 4     2    z    c      0     0
# 5     2    y    b      1     1
# 6     2    z    c      0     0
# 7     2    z    c      5     1
# 8     3    x    a      0     0
# 9     3    y    b      0     0
# 10    3    y    b      0     0
# 11    4    z    c      1     1
# 12    4    x    a      1     1
# 13    4    x    a      1     1

Then, using groupby to work independently per group, you can find the first flag that was set by using pd.Series.cummax followed by pd.Series.diff.

flag = df.groupby(['Grp', 'Org1', 'Org2'])['FLAG'].transform(lambda x: x.cummax().diff())
df['FLAG'] = flag.where(flag.notnull(), df['FLAG']).astype(int)
df
#     Grp Org1 Org2  Value  FLAG
# 0     1    x    a      0     0
# 1     1    x    a      0     0
# 2     1    y    b      3     1
# 3     1    y    b      1     0
# 4     2    z    c      0     0
# 5     2    y    b      1     1
# 6     2    z    c      0     0
# 7     2    z    c      5     1
# 8     3    x    a      0     0
# 9     3    y    b      0     0
# 10    3    y    b      0     0
# 11    4    z    c      1     1
# 12    4    x    a      1     1
# 13    4    x    a      1     0
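
The intermediate values are easier to see on a single group. Below is a minimal sketch on one hypothetical Series of 0/1 flags (the values are made up for illustration and are not taken from the question's data); it uses fillna as an assumed equivalent of the where/notnull line above.

```python
import pandas as pd

# One group's 0/1 flags in row order (hypothetical values for illustration).
flags = pd.Series([0, 0, 1, 1, 0, 1])

# Running maximum: stays 0 until the first 1, then stays 1.
running = flags.cummax()      # 0, 0, 1, 1, 1, 1

# Row-to-row difference: 1 only where the running maximum steps from 0 to 1.
step = running.diff()         # NaN, 0, 1, 0, 0, 0

# diff() has no previous row for the first element, so fall back to the
# original flag there.
first_only = step.fillna(flags).astype(int)
print(first_only.tolist())    # [0, 0, 1, 0, 0, 0]
```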

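Within each group, cummax turns every entry after the first 1 into a 1 as well, so diff is 0 everywhere except at the single step from 0 to 1. diff has no previous row to compare against for the first row of each group and returns NaN there, which is why the second line falls back to the original FLAG for those rows: if a group starts with a non-zero value, its first row keeps FLAG = 1.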
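
As an alternative sketch, and not the method used in the answer above, the same flags can probably be computed in one pass with a running count of non-zero rows per group. The names nonzero, seen and expected are introduced here for illustration; expected is copied from the FLAG column in the question's table.

```python
import pandas as pd

df = pd.DataFrame({
    'Grp':   [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'Org1':  ['x', 'x', 'y', 'y', 'z', 'y', 'z', 'z', 'x', 'y', 'y', 'z', 'x', 'x'],
    'Org2':  ['a', 'a', 'b', 'b', 'c', 'b', 'c', 'c', 'a', 'b', 'b', 'c', 'a', 'a'],
    'Value': [0, 0, 3, 1, 0, 1, 0, 5, 0, 0, 0, 1, 1, 1],
})

# 1 where Value is non-zero, 0 otherwise.
nonzero = df['Value'].ne(0).astype(int)

# Running count of non-zero rows within each (Grp, Org1, Org2) group.
seen = nonzero.groupby([df['Grp'], df['Org1'], df['Org2']]).cumsum()

# A row is the first non-zero row of its group when it is non-zero itself
# and the running count has just reached 1.
df['FLAG'] = (nonzero.eq(1) & seen.eq(1)).astype(int)

# FLAG column copied from the expected output in the question.
expected = [0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0]
assert df['FLAG'].tolist() == expected
```

This avoids the NaN handling that diff requires, at the cost of being a little less direct about the "first step from 0 to 1" idea.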
