Problem Statement:
A pandas dataframe column series, same_group
needs to be created from booleans according to the values of two existing columns, row
and col
. The row needs to show True if both cells across a row have similar values (intersecting values) in a dictionary memberships
, and False otherwise (no intersecting values). How do I do this in a vectorized way (not using apply)?
Setup:
import pandas as pd
import numpy as np
n = np.nan
memberships = {'a':['vowel'],'b':['consonant'],'c':['consonant'],'d':['consonant'],'e':['vowel'],'y':['consonant', 'vowel']
}congruent = pd.DataFrame.from_dict( {'row': ['a','b','c','d','e','y'],'a': [ n, -.8,-.6,-.3, .8, .01],'b': [-.8, n, .5, .7,-.9, .01],'c': [-.6, .5, n, .3, .1, .01],'d': [-.3, .7, .3, n, .2, .01],'e': [ .8,-.9, .1, .2, n, .01],'y': [ .01, .01, .01, .01, .01, n],}).set_index('row')
congruent.columns.names = ['col']
cs = congruent.stack().to_frame()
cs.columns = ['score']
cs.reset_index(inplace=True)
cs.head(6)
The Desired Goal:
How do I accomplish creating this new column based on a lookup on a dictionary?
Note that I'm trying to find intersection, not equivalence. For example, row 4 should have a same_group
of 1, since a
and y
are both vowels (despite that y
is "sometimes a vowel" and thus belongs to groups consonant and vowel).