Question 1

I am trying to find unique groups in a column(here for letter column) from an excel file. The data looks like this:

id	letter
1	A, B, D, E, F
3	B, C
2	B
75	T
54	K, M
9	D, B
23	B, D, A
34	X, Y, Z
67	X, Y
12	E, D
15	G
10	G
11	F

Any element of a group should not be appeared in another groups' elements. According to previous table the output file should be like this:

id	letter
75	T
54	K, M

Because any elements of these groups havent been shared with another group.

The code I tried:

df: pd.DataFrame = pd.DataFrame([
["A, B, D, E, F"], ["B, C"], ["B"], ["T"],  ["K, M"], ["D, B"], ["B, D, A"], ["X, Y, Z"], ["X, Y"],
["E, D"], ["G"], ["G"]], columns=["letters"])
if __name__ == "__main__":
sub_ids=[]
for i in range(len(df)):temp_sub_ids = []curr_letters_i = df.iloc[i]["letters"].replace(" ", "").split(",")for j in range(len(df)):if i == j:continuecurr_letters_j = df.iloc[j]["letters"].replace(" ", "").split(",")if not any([letter in curr_letters_i for letter in curr_letters_j]):temp_sub_ids.append(f"{df.iloc[j]['id']}")sub_ids.append(",".join(temp_sub_ids))
df["sub-ids"] = sub_ids
print(df)

With this code, it gives each ids as sub ids that dont have any shared letter. But I want to search for all letter groups and if there is not any shared letter with other groups, then it will be as unique.

Question 2

Algorithm:

appending an auxiliary column letter_ based on splitting letter column on regex separator having each group as a list of values/elements
explode/expand letter_ so that each value is placed on a separate row
map each letter_ value to its frequency (count of its occurrences)
filter letter (initial groups) which only have items with a single occurrence (max count is 1)

df = df.assign(letter_=df['letter'].str.split(r'\s*,\s*')).explode('letter_')
df['letter_'] = df['letter_'].map(df['letter_'].value_counts())
df = df.groupby('letter').filter(lambda x: x['letter_'].max() == 1)\.drop_duplicates().drop('letter_', axis=1).reset_index(drop=True)
print(df)

   id letter
0  75      T
1  54   K, M

Get unique groups from a set of group

Related Q&A

Move and Rename files using Python

AttributeError: StringVar object has no attribute encode

Reading specific column from a csv file in python

Date Time Series wise grouping of data and distribution

Trying to run Python in html

Removing from a string all the characthers included between two specific characters in Python

Pyside6: Create QTabWidget with function rather than class

Pythons End Of Life

Gathering numerical data from a string input

Getting Turtle in Python to recognize click events [duplicate]