I have a Pandas DataFrame with conditional column A and numeric column B.
A B
1 'foo' 1.2
2 'bar' 1.3
3 'foo' 2.2
I also have a Python dictionary that defines ranges of B which denote "success" given each value of A.
mydict = {'foo': [1, 2], 'bar': [2, 3]}
I want to make a new column, 'error', in the dataframe. It should describe how far outside of the acceptable bounds for A the value of B falls. If A is within the range, the value should be zero.
A B error
1 'foo' 1.2 0
2 'bar' 1.3 -0.7
3 'foo' 2.2 0.2
I'm not a complete Pandas/Numpy newbie, and I'm halfway decent at Python, but this proved somewhat difficult. I don't want to do it with iterrows(), since I understand that's computationally expensive and this is going to get called a lot.
I eventually figured out a solution by combining lambda functions, pandas.DataFrame.map(), and nested numpy.where()s with given values for the optional x and y inputs.
getmin = lambda x: mydict[x][0]
getmax = lambda x: mydict[x][1]
df['error'] = np.where(df.B < dtfr.A.map(getmin),df.B - df.A.map(getmin),np.where(df.B > df.A.map(getmax),df.B - df.A.map(getmax),0))
It works, but this can't possibly be the best way to do this, right? I feel like I'm abusing numpy.where() to get around not knowing how to map values from multiple columns of a dataframe to a lambda function in a non-iterative way. (Also to avoid writing mildly gnarly lambda functions).
Kind of three questions, I guess.
- Is it OK to nest numpy.where()s for triconditional array operations?
- How can I non-iteratively map from two dataframe columns to one function?
- If 2) is possible and 1) is acceptable, which is preferable?