Get unique groups from a set of groups

2024/10/10 12:19:36

I am trying to find the unique groups in a column (here, the letter column) of an Excel file. The data looks like this:

id letter
1 A, B, D, E, F
3 B, C
2 B
75 T
54 K, M
9 D, B
23 B, D, A
34 X, Y, Z
67 X, Y
12 E, D
15 G
10 G
11 F

No element of a unique group should appear among the elements of any other group. For the table above, the output file should look like this:

id letter
75 T
54 K, M

Because none of the elements of these groups is shared with another group.

The code I tried:

import pandas as pd

# Same rows as the table above; the id column is needed by the loop below.
df: pd.DataFrame = pd.DataFrame([
    [1, "A, B, D, E, F"], [3, "B, C"], [2, "B"], [75, "T"], [54, "K, M"],
    [9, "D, B"], [23, "B, D, A"], [34, "X, Y, Z"], [67, "X, Y"],
    [12, "E, D"], [15, "G"], [10, "G"], [11, "F"]], columns=["id", "letter"])

if __name__ == "__main__":
    sub_ids = []
    for i in range(len(df)):
        temp_sub_ids = []
        curr_letters_i = df.iloc[i]["letter"].replace(" ", "").split(",")
        for j in range(len(df)):
            if i == j:
                continue
            curr_letters_j = df.iloc[j]["letter"].replace(" ", "").split(",")
            # Collect the ids of the rows that share no letter with row i.
            if not any(letter in curr_letters_i for letter in curr_letters_j):
                temp_sub_ids.append(f"{df.iloc[j]['id']}")
        sub_ids.append(",".join(temp_sub_ids))
    df["sub-ids"] = sub_ids
    print(df)

This code gives, for each row, the ids of the rows that share no letter with it (as sub-ids). What I want instead is to check each letter group against all the others and, if it shares no letter with any other group, treat it as unique.

Answer

Algorithm:

  • add an auxiliary column letter_ by splitting the letter column on a regex separator, so that each group becomes a list of its elements
  • explode letter_ so that each element is placed on a separate row
  • map each letter_ value to its frequency (the number of times it occurs across all groups)
  • keep only the original letter groups whose elements each occur exactly once (the maximum count within the group is 1)

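# One row per letter, then replace each letter with its overall occurrence count.
df = df.assign(letter_=df['letter'].str.split(r'\s*,\s*')).explode('letter_')
df['letter_'] = df['letter_'].map(df['letter_'].value_counts())
# Keep only the groups whose letters all occur exactly once.
df = df.groupby('letter').filter(lambda x: x['letter_'].max() == 1)\
    .drop_duplicates().drop('letter_', axis=1).reset_index(drop=True)
print(df)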
print(df)

   id letter
0  75      T
1  54   K, M
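
For comparison, the same counting idea can be sketched without pandas. This is only an illustration, not part of the answer above: the helper names split_group and unique_groups are made up for this example, and the rows are assumed to be plain (id, letters) tuples copied from the question's table.

```python
from collections import Counter


def split_group(text):
    """Turn "K, M" into {"K", "M"}; a set ignores repeats within one row."""
    return {part.strip() for part in text.split(",")}


def unique_groups(rows):
    """Keep the (id, letters) rows whose elements appear in no other row."""
    groups = [split_group(letters) for _, letters in rows]
    # Count in how many rows each element appears.
    counts = Counter(item for group in groups for item in group)
    # A row is unique when each of its elements was counted exactly once.
    return [row for row, group in zip(rows, groups)
            if all(counts[item] == 1 for item in group)]


# Sample data copied from the question.
rows = [
    (1, "A, B, D, E, F"), (3, "B, C"), (2, "B"), (75, "T"), (54, "K, M"),
    (9, "D, B"), (23, "B, D, A"), (34, "X, Y, Z"), (67, "X, Y"),
    (12, "E, D"), (15, "G"), (10, "G"), (11, "F"),
]
result = unique_groups(rows)
print(result)  # [(75, 'T'), (54, 'K, M')]
```

Because each row is turned into a set first, a letter repeated inside a single row (for example "A, A") is counted once and does not disqualify that row. With a DataFrame, the tuples could be produced with df[["id", "letter"]].itertuples(index=False, name=None).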
