Conditional column arithmetic in pandas dataframe

2024/10/13 11:24:50

I have a pandas dataframe with the following structure:

import numpy as np
import pandas as pd
myData = pd.DataFrame({'x': [1.2, 2.4, 5.3, 2.3, 4.1],
                       'y': [6.7, 7.5, 8.1, 5.3, 8.3],
                       'condition': [1, 1, np.nan, np.nan, 1],
                       'calculation': [np.nan] * 5})
print(myData)

   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2          NaN        NaN  5.3  8.1
3          NaN        NaN  2.3  5.3
4          NaN          1  4.1  8.3

I want to fill the 'calculation' column with a value computed from 'x' and 'y' (e.g. x/y), but only in rows where the 'condition' column contains NaN (np.isnan(myData['condition'])). The final dataframe should look like this:

   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2        0.654        NaN  5.3  8.1
3        0.434        NaN  2.3  5.3
4          NaN          1  4.1  8.3
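The rows described above can be picked out with a boolean mask. A minimal sketch, rebuilding the question's myData; Series.isna() is used here as the pandas counterpart of np.isnan:

```python
import numpy as np
import pandas as pd

myData = pd.DataFrame({'x': [1.2, 2.4, 5.3, 2.3, 4.1],
                       'y': [6.7, 7.5, 8.1, 5.3, 8.3],
                       'condition': [1, 1, np.nan, np.nan, 1],
                       'calculation': [np.nan] * 5})

# Boolean Series: True exactly where 'condition' is NaN.
mask = myData['condition'].isna()

# np.isnan gives the same result here because the NaN values force
# pandas to store 'condition' as a float64 column.
assert (mask == np.isnan(myData['condition'])).all()

print(mask.tolist())  # [False, False, True, True, False]
```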

I'm happy with the idea of stepping through each row with a 'for' loop and using 'if' statements to make the calculations, but my actual dataframe is very large and I want to do the calculations in an array-based way. Is this possible? I could calculate the value for all rows and then delete the ones I don't want, but that seems like a lot of wasted effort (the NaNs are quite rare in the dataframe), and in some rows where 'condition' equals 1 the calculation cannot be made because of division by zero.
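For comparison, the row-by-row loop mentioned above might look like the sketch below. It is correct but runs Python code once per row, which is what makes it slow on a large frame. The y != 0 guard is an illustrative addition, not something taken from the question:

```python
import numpy as np
import pandas as pd

myData = pd.DataFrame({'x': [1.2, 2.4, 5.3, 2.3, 4.1],
                       'y': [6.7, 7.5, 8.1, 5.3, 8.3],
                       'condition': [1, 1, np.nan, np.nan, 1],
                       'calculation': [np.nan] * 5})

# One Python-level iteration per row: easy to read, slow on large frames.
for idx in myData.index:
    x = myData.at[idx, 'x']
    y = myData.at[idx, 'y']
    # Only compute where 'condition' is NaN; skip zero denominators (illustrative guard).
    if np.isnan(myData.at[idx, 'condition']) and y != 0:
        myData.at[idx, 'calculation'] = x / y

print(myData['calculation'].round(3).tolist())  # [nan, nan, 0.654, 0.434, nan]
```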

Thanks in advance.

Answer

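Use Series.where and pass it your condition. The result keeps x/y only in rows that meet the condition and fills every other row with NaN. (The division is still evaluated for every row, but it is vectorised, and dividing by zero in pandas gives inf rather than raising an error, so those rows are simply masked out.)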

In [117]:
myData['calculation'] = (myData['x'] / myData['y']).where(myData['condition'].isnull())
myData

Out[117]:
   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2     0.654321        NaN  5.3  8.1
3     0.433962        NaN  2.3  5.3
4          NaN          1  4.1  8.3
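If you would rather not evaluate x/y for rows you are going to discard, a boolean mask with .loc restricts the arithmetic to the selected rows. A minimal sketch, assuming the same myData as in the question; this is an alternative to where, not part of the answer above:

```python
import numpy as np
import pandas as pd

myData = pd.DataFrame({'x': [1.2, 2.4, 5.3, 2.3, 4.1],
                       'y': [6.7, 7.5, 8.1, 5.3, 8.3],
                       'condition': [1, 1, np.nan, np.nan, 1],
                       'calculation': [np.nan] * 5})

# True only in the (rare) rows where 'condition' is NaN.
mask = myData['condition'].isna()

# Divide only the selected rows; rows where condition == 1 are never computed.
# The right-hand side carries the masked index labels, so .loc aligns it
# with the same rows on assignment.
myData.loc[mask, 'calculation'] = myData.loc[mask, 'x'] / myData.loc[mask, 'y']
print(myData)

# Why computing everything and then masking is also safe: pandas float
# division by zero returns inf (or NaN for 0/0) instead of raising.
s = pd.Series([1.0, 0.0]) / pd.Series([0.0, 0.0])
print(s.tolist())  # [inf, nan]
```

Both versions fill 'calculation' identically; where is a one-liner, while the .loc form skips the unused divisions when, as here, only a few rows qualify.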
https://en.xdnf.cn/q/69542.html
