Vectorized Lookups of Pandas Series to a Dictionary

2024/10/3 4:35:55

Problem Statement:

A new pandas DataFrame column, same_group, needs to be created as booleans based on the values of two existing columns, row and col. A row should be True if its row and col values share at least one group (intersecting values) in the dictionary memberships, and False otherwise (no intersecting values). How do I do this in a vectorized way (not using apply)?

Setup:

import pandas as pd
import numpy as np

n = np.nan
memberships = {
    'a': ['vowel'],
    'b': ['consonant'],
    'c': ['consonant'],
    'd': ['consonant'],
    'e': ['vowel'],
    'y': ['consonant', 'vowel'],
}

congruent = pd.DataFrame.from_dict({
    'row': ['a', 'b', 'c', 'd', 'e', 'y'],
    'a': [n, -.8, -.6, -.3, .8, .01],
    'b': [-.8, n, .5, .7, -.9, .01],
    'c': [-.6, .5, n, .3, .1, .01],
    'd': [-.3, .7, .3, n, .2, .01],
    'e': [.8, -.9, .1, .2, n, .01],
    'y': [.01, .01, .01, .01, .01, n],
}).set_index('row')
congruent.columns.names = ['col']

[Image: snippet of dataframe cs]

cs = congruent.stack().to_frame()
cs.columns = ['score']
cs.reset_index(inplace=True)
cs.head(6)

[Image: snippet of dataframe cs stacked]

The Desired Goal:

[Image: drawing of the added pandas column]

How do I create this new column based on a dictionary lookup?

Note that I'm trying to find intersection, not equivalence. For example, row 4 should have a same_group of 1, since a and y are both vowels (even though y is "sometimes a vowel" and thus belongs to both groups, consonant and vowel).
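To make the intersection-versus-equivalence distinction concrete, here is a minimal sketch of the per-pair test. The helper name shares_group is hypothetical (it is not part of the question); it only illustrates the rule, not a vectorized solution.

```python
# Same memberships dictionary as in the setup above.
memberships = {
    'a': ['vowel'], 'b': ['consonant'], 'c': ['consonant'],
    'd': ['consonant'], 'e': ['vowel'], 'y': ['consonant', 'vowel'],
}

def shares_group(x, y):
    """Hypothetical helper: True if x and y have at least one group in common."""
    return bool(set(memberships[x]) & set(memberships[y]))

print(shares_group('a', 'y'))  # True: both can be vowels
print(shares_group('a', 'b'))  # False: vowel vs. consonant
print(shares_group('b', 'y'))  # True: both can be consonants
```

An equality test on the lists would give False for ('a', 'y'), which is why intersection is needed.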

Answer
# create a series to make it convenient to map
# make each member a set so I can intersect later
lkp = pd.Series(memberships).apply(set)

# get number of rows and columns
# map the sets to column and row indices
n, m = congruent.shape
c = congruent.columns.to_series().map(lkp).values
r = congruent.index.to_series().map(lkp).values

print(c)
[{'vowel'} {'consonant'} {'consonant'} {'consonant'} {'vowel'} {'consonant', 'vowel'}]

print(r)
[{'vowel'} {'consonant'} {'consonant'} {'consonant'} {'vowel'} {'consonant', 'vowel'}]

# use np.repeat, np.tile, zip to create cartesian product
# this should match index after stacking
# apply set intersection for each pair
# empty sets are False, otherwise True
same = [
    bool(set.intersection(*tup))
    for tup in zip(np.repeat(r, m), np.tile(c, n))
]

# use dropna=False to ensure we maintain the
# cartesian product I was expecting
# then slice with boolean list I created
# and dropna
congruent.stack(dropna=False)[same].dropna()

row  col
a    e      0.80
     y      0.01
b    c      0.50
     d      0.70
     y      0.01
c    b      0.50
     d      0.30
     y      0.01
d    b      0.70
     c      0.30
     y      0.01
e    a      0.80
     y      0.01
y    a      0.01
     b      0.01
     c      0.01
     d      0.01
     e      0.01
dtype: float64
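The np.repeat / np.tile pairing used for the cartesian product may be easier to see on a smaller case. The sketch below uses made-up labels (rows a, b, c and columns x, y are assumptions for illustration only) and shows that the pairs come out row by row, which is the order stacking produces.

```python
import numpy as np

# Toy labels, not taken from the question's data.
rows = np.array(['a', 'b', 'c'])
cols = np.array(['x', 'y'])
n, m = len(rows), len(cols)

# Repeat each row label m times, tile the column labels n times,
# then zip them to get every (row, col) pair in row-major order.
pairs = list(zip(np.repeat(rows, m).tolist(), np.tile(cols, n).tolist()))
print(pairs)
# [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y'), ('c', 'x'), ('c', 'y')]
```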

Produce the desired result:

congruent.stack(dropna=False).reset_index(name='Score') \
    .assign(same_group=np.array(same).astype(int)).dropna()
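The list comprehension above still loops over every pair in Python. One possible alternative, which is not part of the original answer, is to encode memberships as a 0/1 indicator matrix and let a matrix product count shared groups. The names labels, groups, indicator and shared are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

memberships = {
    'a': ['vowel'], 'b': ['consonant'], 'c': ['consonant'],
    'd': ['consonant'], 'e': ['vowel'], 'y': ['consonant', 'vowel'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'y']

# All distinct group names, in a fixed order.
groups = sorted({g for gs in memberships.values() for g in gs})

# indicator[i, j] is 1 when labels[i] belongs to groups[j].
indicator = np.array(
    [[g in memberships[k] for g in groups] for k in labels], dtype=int
)

# shared[i, j] counts the groups common to labels[i] and labels[j].
shared = indicator @ indicator.T

same_group = pd.DataFrame(
    (shared > 0).astype(int),
    index=pd.Index(labels, name='row'),
    columns=pd.Index(labels, name='col'),
)
print(same_group)
```

Note that the diagonal is 1 in this table as well; the answer above drops those pairs with dropna() because their scores are NaN.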
