I am trying to find unique groups in a column(here for letter column) from an excel file. The data looks like this:
id | letter |
---|---|
1 | A, B, D, E, F |
3 | B, C |
2 | B |
75 | T |
54 | K, M |
9 | D, B |
23 | B, D, A |
34 | X, Y, Z |
67 | X, Y |
12 | E, D |
15 | G |
10 | G |
11 | F |
Any element of a group should not be appeared in another groups' elements. According to previous table the output file should be like this:
id | letter |
---|---|
75 | T |
54 | K, M |
Because any elements of these groups havent been shared with another group.
The code I tried:
df: pd.DataFrame = pd.DataFrame([
["A, B, D, E, F"], ["B, C"], ["B"], ["T"], ["K, M"], ["D, B"], ["B, D, A"], ["X, Y, Z"], ["X, Y"],
["E, D"], ["G"], ["G"]], columns=["letters"])
if __name__ == "__main__":
sub_ids=[]
for i in range(len(df)):temp_sub_ids = []curr_letters_i = df.iloc[i]["letters"].replace(" ", "").split(",")for j in range(len(df)):if i == j:continuecurr_letters_j = df.iloc[j]["letters"].replace(" ", "").split(",")if not any([letter in curr_letters_i for letter in curr_letters_j]):temp_sub_ids.append(f"{df.iloc[j]['id']}")sub_ids.append(",".join(temp_sub_ids))
df["sub-ids"] = sub_ids
print(df)
With this code, it gives each ids as sub ids that dont have any shared letter. But I want to search for all letter groups and if there is not any shared letter with other groups, then it will be as unique.