Alternatives to nested numpy.where for multiconditional pandas operations?

2024/7/27 15:02:51

I have a pandas DataFrame with a categorical column A and a numeric column B.

    A    B
1 'foo' 1.2
2 'bar' 1.3
3 'foo' 2.2

I also have a Python dictionary that defines ranges of B which denote "success" given each value of A.

mydict = {'foo': [1, 2], 'bar': [2, 3]}

I want to make a new column, 'error', in the dataframe. It should describe how far outside the acceptable bounds for its value of A the value of B falls. If B is within the range, the value should be zero.

    A    B   error
1 'foo' 1.2   0
2 'bar' 1.3  -0.7
3 'foo' 2.2   0.2

I'm not a complete Pandas/Numpy newbie, and I'm halfway decent at Python, but this proved somewhat difficult. I don't want to do it with iterrows(), since I understand that's computationally expensive and this is going to get called a lot.

I eventually figured out a solution by combining lambda functions, pandas.Series.map(), and nested numpy.where() calls with values given for the optional x and y inputs.

import numpy as np

getmin = lambda x: mydict[x][0]  # lower bound for a value of A
getmax = lambda x: mydict[x][1]  # upper bound for a value of A
df['error'] = np.where(df.B < df.A.map(getmin), df.B - df.A.map(getmin),
                       np.where(df.B > df.A.map(getmax), df.B - df.A.map(getmax), 0))

It works, but this can't possibly be the best way to do this, right? I feel like I'm abusing numpy.where() to get around not knowing how to map values from multiple columns of a dataframe to a lambda function in a non-iterative way. (Also to avoid writing mildly gnarly lambda functions).

Kind of three questions, I guess.

  1. Is it OK to nest numpy.where()s for triconditional array operations?
  2. How can I non-iteratively map from two dataframe columns to one function?
  3. If 2) is possible and 1) is acceptable, which is preferable?
Answer

To map multiple columns to one function, you can use

DataFrame.apply(func, axis=1)
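As a rough sketch of what that could look like for this problem: the function below receives one row at a time and reads both A and B from it. The name row_error is made up for illustration, and the sample frame just recreates the data from the question. Note that apply with axis=1 still calls the Python function once per row, so it is convenient rather than truly non-iterative.

```python
import pandas as pd

mydict = {'foo': [1, 2], 'bar': [2, 3]}
df = pd.DataFrame({'A': ['foo', 'bar', 'foo'], 'B': [1.2, 1.3, 2.2]},
                  index=[1, 2, 3])

def row_error(row):
    # Hypothetical helper: signed distance of B outside the range for A.
    low, high = mydict[row['A']]
    if row['B'] < low:
        return row['B'] - low   # negative: below the lower bound
    if row['B'] > high:
        return row['B'] - high  # positive: above the upper bound
    return 0.0                  # inside the range

# axis=1 passes each row (as a Series) to row_error
df['error'] = df.apply(row_error, axis=1)
```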

I don't think you need that for this problem, though. I think it's clearer if you do the calculation in a few steps:

df['low'] = df.A.map(lambda x: mydict[x][0])
df['high'] = df.A.map(lambda x: mydict[x][1])
df['error'] = np.maximum(df.B - df.high, 0) + np.minimum(df.B - df.low, 0)
df
     A    B  low  high  error
1  foo  1.2    1     2    0.0
2  bar  1.3    2     3   -0.7
3  foo  2.2    1     2    0.2
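For anyone who wants to run this end to end, here is a self-contained sketch of the same steps. It also shows one possible alternative, assuming you are happy to lean on Series.clip: subtracting the clipped value from B gives the same signed error. The names stepwise and clipped are only for illustration, and the dict comprehensions stand in for the lambdas above.

```python
import numpy as np
import pandas as pd

mydict = {'foo': [1, 2], 'bar': [2, 3]}
df = pd.DataFrame({'A': ['foo', 'bar', 'foo'], 'B': [1.2, 1.3, 2.2]},
                  index=[1, 2, 3])

# Look up the bounds for each row's value of A
df['low'] = df.A.map({k: v[0] for k, v in mydict.items()})
df['high'] = df.A.map({k: v[1] for k, v in mydict.items()})

# Step-wise version: the amount above 'high' plus the amount below 'low'
stepwise = np.maximum(df.B - df.high, 0) + np.minimum(df.B - df.low, 0)

# Alternative: clip B into [low, high] row by row and take the difference
clipped = df.B - df.B.clip(lower=df.low, upper=df.high)

df['error'] = stepwise
```

Both expressions are vectorized, so neither loops over rows in Python the way iterrows() or apply(axis=1) would.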