Question 1

I have a Pandas DataFrame with conditional column A and numeric column B.

    A    B
1 'foo' 1.2
2 'bar' 1.3
3 'foo' 2.2

I also have a Python dictionary that defines ranges of B which denote "success" given each value of A.

mydict = {'foo': [1, 2], 'bar': [2, 3]}

I want to make a new column, 'error', in the dataframe. It should describe how far outside of the acceptable bounds for A the value of B falls. If A is within the range, the value should be zero.

    A    B   error
1 'foo' 1.2   0
2 'bar' 1.3  -0.7
3 'foo' 2.2   0.2

I'm not a complete Pandas/Numpy newbie, and I'm halfway decent at Python, but this proved somewhat difficult. I don't want to do it with iterrows(), since I understand that's computationally expensive and this is going to get called a lot.

I eventually figured out a solution by combining lambda functions, pandas.DataFrame.map(), and nested numpy.where()s with given values for the optional x and y inputs.

getmin = lambda x: mydict[x][0]
getmax = lambda x: mydict[x][1] 
df['error'] = np.where(df.B < dtfr.A.map(getmin),df.B - df.A.map(getmin),np.where(df.B > df.A.map(getmax),df.B - df.A.map(getmax),0))

It works, but this can't possibly be the best way to do this, right? I feel like I'm abusing numpy.where() to get around not knowing how to map values from multiple columns of a dataframe to a lambda function in a non-iterative way. (Also to avoid writing mildly gnarly lambda functions).

Kind of three questions, I guess.

Is it OK to nest numpy.where()s for triconditional array operations?
How can I non-iteratively map from two dataframe columns to one function?
If 2) is possible and 1) is acceptable, which is preferable?

Question 2

For your question about how to map multiple columns, you do it with

DataFrame.apply( , axis =1)

For your question I don't think you need this, but I think it's clearer if you do your calculation in a few steps:

df['low'] = df.A.map(lambda x: mydict[x][0])
df['high'] = df.A.map(lambda x: mydict[x][1])
df['error'] = np.maximum(df.B - df.high, 0) + np.minimum(df.B - df.low, 0)
dfA    B  low  high  error
1  foo  1.2    1     2    0.0
2  bar  1.3    2     3   -0.7
3  foo  2.2    1     2    0.2

Alternatives to nested numpy.where for multiconditional pandas operations?

Related Q&A

OpenCV findContours() just returning one external contour

Selenium/ChromeDriver Unknown policy Errors

Mako escaping issue within Pyramid

All arguments should have the same length plotly

Python curves intersection with fsolve() and function arguments using numpy

What is the Python freeze process?

Is there a way to get the top k values per row of a numpy array (Python)?

Python with tcpdump in a subprocess: how to close subprocess properly?

How to install GDB with Python support on Windows 7

Pip3 is unable to install requirements.txt during docker build