Question 1

I have a pandas dataframe with the following structure:

import numpy as np
import pandas as pd
myData = pd.DataFrame({'x': [1.2,2.4,5.3,2.3,4.1], 'y': [6.7,7.5,8.1,5.3,8.3], 'condition':[1,1,np.nan,np.nan,1],'calculation': [np.nan]*5})
print(myData)

   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2          NaN        NaN  5.3  8.1
3          NaN        NaN  2.3  5.3
4          NaN          1  4.1  8.3

I want to enter a value in the 'calculation' column based on the values in 'x' and 'y' (e.g. x/y), but only in those rows where the 'condition' column contains NaN (np.isnan(myData['condition'])). The final dataframe should look like this:

   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2        0.654        NaN  5.3  8.1
3        0.434        NaN  2.3  5.3
4          NaN          1  4.1  8.3

I'm happy with the idea of stepping through each row in turn using a 'for' loop and then using 'if' statements to make the calculations, but the actual dataframe I have is very large and I wanted to do the calculations in an array-based way. Is this possible? I guess I could calculate the value for all rows and then delete the ones I don't want, but this seems like a lot of wasted effort (the NaNs are quite rare in the dataframe). Also, in some rows where 'condition' equals 1, the calculation cannot be made due to division by zero.

Thanks in advance.

Answer

Use where and pass your condition to it; the result is then kept only in the rows that meet the condition, and every other row is set to NaN:

In [117]: myData['calculation'] = (myData['x']/myData['y']).where(myData['condition'].isnull())
myData
Out[117]:
   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2     0.654321        NaN  5.3  8.1
3     0.433962        NaN  2.3  5.3
4          NaN          1  4.1  8.3
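
If the concern is that x/y gets evaluated for rows whose result is then discarded, one possible variation is to select the rows with a boolean mask and assign through .loc, so the division only touches the rows where 'condition' is NaN. This is a minimal sketch, assuming the same myData as in the question; the name mask is introduced here for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question
myData = pd.DataFrame({
    'x': [1.2, 2.4, 5.3, 2.3, 4.1],
    'y': [6.7, 7.5, 8.1, 5.3, 8.3],
    'condition': [1, 1, np.nan, np.nan, 1],
    'calculation': [np.nan] * 5,
})

# Boolean mask (illustrative name): True only where 'condition' is NaN
mask = myData['condition'].isnull()

# Divide only the selected rows and write the result back into those rows;
# rows outside the mask keep their existing NaN in 'calculation'
myData.loc[mask, 'calculation'] = myData.loc[mask, 'x'] / myData.loc[mask, 'y']

print(myData)
```

Because only the masked rows are divided, rows where 'condition' equals 1 never take part in the calculation, whatever their 'y' value is.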

Conditional column arithmetic in pandas dataframe
