Using Pandas df.where on multiple columns produces unexpected NaN values

2024/9/20 19:24:08

Given the DataFrame

import pandas as pd

df = pd.DataFrame({
    'transformed': ['left', 'right', 'left', 'right'],
    'left_f': [1, 2, 3, 4],
    'right_f': [10, 20, 30, 40],
    'left_t': [-1, -2, -3, -4],
    'right_t': [-10, -20, -30, -40],
})

I want to create two new columns, picking from either left_* or right_* depending on the content of transformed:

df['transformed_f'] = df['right_f'].where(
    df['transformed'] == 'right',
    df['left_f']
)
df['transformed_t'] = df['right_t'].where(
    df['transformed'] == 'right',
    df['left_t']
)

And I get the expected result

df
#    transformed  left_f  right_f  left_t  right_t  transformed_f  transformed_t
# 0  left              1       10      -1      -10              1             -1
# 1  right             2       20      -2      -20             20            -20
# 2  left              3       30      -3      -30              3             -3
# 3  right             4       40      -4      -40             40            -40

However, when I try to do it in one operation, I get an unexpected result containing NaN values:

df[['transformed_f', 'transformed_t']] = df[['right_f', 'right_t']].where(
    df['transformed'] == 'right',
    df[['left_f', 'left_t']]
)
df
#    transformed  left_f  right_f  left_t  right_t  transformed_f  transformed_t
# 0  left              1       10      -1      -10            NaN            NaN
# 1  right             2       20      -2      -20           20.0          -20.0
# 2  left              3       30      -3      -30            NaN            NaN
# 3  right             4       40      -4      -40           40.0          -40.0

Is there a way to use df.where() on multiple columns at once?

Answer

You are close. Just call .values or .to_numpy() on the slice you pass as other, so that it becomes a NumPy ndarray.

Per docs:

other : scalar, NDFrame, or callable
    Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the NDFrame and should return scalar or NDFrame. The callable must not change input NDFrame (though pandas doesn't check it).

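When you pass the DataFrame slice directly, pandas aligns other with the calling frame by label. The column names of other (left_f, left_t) do not match those of the caller (right_f, right_t), so the aligned other holds only NaN, and NaN is what fills the rows where cond is False. When you pass .values, there are no labels to align, and the values are used by position.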
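The alignment effect described above can be made visible with a small sketch. The names right, left, aligned_other, relabelled and picked are illustrative only, and reindex is used here to imitate the label alignment rather than to reproduce the exact internal code path of where. The last two lines show one possible label-preserving alternative: relabelling the columns of other with set_axis so that they match the caller.

```python
import pandas as pd

df = pd.DataFrame({
    'transformed': ['left', 'right', 'left', 'right'],
    'left_f': [1, 2, 3, 4],
    'right_f': [10, 20, 30, 40],
    'left_t': [-1, -2, -3, -4],
    'right_t': [-10, -20, -30, -40],
})

right = df[['right_f', 'right_t']]
left = df[['left_f', 'left_t']]

# Imitate what label alignment does to `other`: reindexing `left` to the
# column labels of `right` finds no matching labels, so every entry is NaN.
aligned_other = left.reindex(columns=right.columns)
all_missing = bool(aligned_other.isna().all().all())
print(all_missing)

# Relabel `left` with the columns of `right` so the labels line up,
# then pick row by row without converting to a NumPy array.
relabelled = left.set_axis(right.columns, axis=1)
picked = right.where(df['transformed'] == 'right', relabelled)
print(picked)
```

Note that picked keeps the column names right_f and right_t, so it would still need to be assigned or renamed to transformed_f and transformed_t.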

df[['transformed_f', 'transformed_t']] = df[['right_f', 'right_t']].where(
    df['transformed'] == 'right',
    df[['left_f', 'left_t']].values
)
print(df)

  transformed  left_f  right_f  left_t  right_t  transformed_f  transformed_t
0        left       1       10      -1      -10              1             -1
1       right       2       20      -2      -20             20            -20
2        left       3       30      -3      -30              3             -3
3       right       4       40      -4      -40             40            -40
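Since the fix works by discarding labels, the same pick can also be done entirely in NumPy. The following is only a sketch of that alternative using numpy.where, not part of the original answer; the names cond, picked and result are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'transformed': ['left', 'right', 'left', 'right'],
    'left_f': [1, 2, 3, 4],
    'right_f': [10, 20, 30, 40],
    'left_t': [-1, -2, -3, -4],
    'right_t': [-10, -20, -30, -40],
})

# Boolean column vector of shape (4, 1); it broadcasts across both columns.
cond = (df['transformed'] == 'right').to_numpy()[:, None]

# Take the right_* value where cond is True, otherwise the left_* value.
picked = np.where(
    cond,
    df[['right_f', 'right_t']].to_numpy(),
    df[['left_f', 'left_t']].to_numpy(),
)

# Wrap the array in a labelled frame and attach it to the original columns.
result = df.join(
    pd.DataFrame(picked, index=df.index,
                 columns=['transformed_f', 'transformed_t'])
)
print(result)
```

Like .values, this relies on position only, so the column order of the two slices has to correspond.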