How to take a slice of a DataFrame and fillna in that slice using Python pandas?

2024/9/21 3:17:02

The problem: let us take the Titanic dataset from Kaggle. I have a DataFrame with the columns "Pclass", "Sex" and "Age". I need to fill the NaNs in the "Age" column with the median of a particular group. For example, if a passenger is a woman from 1st class, I would like to fill her age with the median age of 1st-class women, not with the median of the whole "Age" column.

The question is: how do I make this change in only a certain slice?

I tried:

data['Age'][(data['Sex'] == 'female')&(data['Pclass'] == 1)&(data['Age'].isnull())].fillna(median)

where "median" is my precomputed value, but nothing changes, and "inplace=True" didn't help either.

Thanks a lot!

Answer

I believe you need to filter by boolean masks and assign the result back:

import pandas as pd
import numpy as np

data = pd.DataFrame({'Age': [40, 20, 30, 20, np.nan, np.nan],
                     'Pclass': [1, 2, 1, 2, 1, 1],
                     'Sex': ['female', 'female', 'male', 'female', 'female', 'male'],
                     'a': list('aaaddd')})
print (data)
    Age  Pclass     Sex  a
0  40.0       1  female  a
1  20.0       2  female  a
2  30.0       1    male  a
3  20.0       2  female  d
4   NaN       1  female  d
5   NaN       1    male  d

# boolean mask
mask1 = (data['Sex'] == 'female') & (data['Pclass'] == 1)

# get the median by mask (NaNs are skipped)
med = data.loc[mask1, 'Age'].median()
print (med)
40.0

# replace NaNs
data.loc[mask1, 'Age'] = data.loc[mask1, 'Age'].fillna(med)
print (data)
    Age  Pclass     Sex  a
0  40.0       1  female  a
1  20.0       2  female  a
2  30.0       1    male  a
3  20.0       2  female  d
4  40.0       1  female  d
5   NaN       1    male  d

This is the same as:

mask2 = mask1 & data['Age'].isnull()
data.loc[mask2, 'Age'] = med
print (data)
    Age  Pclass     Sex  a
0  40.0       1  female  a
1  20.0       2  female  a
2  30.0       1    male  a
3  20.0       2  female  d
4  40.0       1  female  d
5   NaN       1    male  d
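For contrast, here is a minimal sketch of why the expression from the question leaves `data` unchanged, and a check that the two `.loc` variants above really agree. Boolean indexing returns a new Series and `fillna` (without `inplace`) returns yet another one, so nothing is written back to `data`; assigning through `.loc` writes into the frame itself. The frame is the same illustrative one used in the answer, and `v1`, `v2` and `chained` are hypothetical names used only for this comparison.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Age': [40, 20, 30, 20, np.nan, np.nan],
                     'Pclass': [1, 2, 1, 2, 1, 1],
                     'Sex': ['female', 'female', 'male', 'female', 'female', 'male'],
                     'a': list('aaaddd')})
mask1 = (data['Sex'] == 'female') & (data['Pclass'] == 1)
med = data.loc[mask1, 'Age'].median()

# The question's pattern: the filtered selection is a new Series, and
# fillna returns another new Series, so `data` itself is never modified
chained = data['Age'][mask1 & data['Age'].isnull()].fillna(med)
missing_after_chained = data['Age'].isnull().sum()
print(missing_after_chained)  # still 2 missing ages

# Variant 1 from the answer: fill the masked slice, then assign it back
v1 = data.copy()
v1.loc[mask1, 'Age'] = v1.loc[mask1, 'Age'].fillna(med)

# Variant 2 from the answer: assign the scalar only where Age is missing
v2 = data.copy()
mask2 = mask1 & v2['Age'].isnull()
v2.loc[mask2, 'Age'] = med

print(v1.equals(v2))  # True
```

Working on copies keeps the comparison fair: each variant starts from the same unfilled frame.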

EDIT:

If you need to replace the NaNs in every group with that group's own median:

# transform returns a Series aligned with data's index, so it can be assigned back directly
data['Age'] = data.groupby(["Sex", "Pclass"])["Age"].transform(lambda x: x.fillna(x.median()))
print (data)
    Age  Pclass     Sex  a
0  40.0       1  female  a
1  20.0       2  female  a
2  30.0       1    male  a
3  20.0       2  female  d
4  40.0       1  female  d
5  30.0       1    male  d
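The same per-group fill can also be written without a lambda. This is an alternative idiom, not the answer's original code: `groupby(...).transform('median')` gives every row its group's median, indexed like the original frame, and that Series can be passed straight to `fillna`, which only fills the missing entries. A minimal sketch on the same illustrative frame:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Age': [40, 20, 30, 20, np.nan, np.nan],
                     'Pclass': [1, 2, 1, 2, 1, 1],
                     'Sex': ['female', 'female', 'male', 'female', 'female', 'male'],
                     'a': list('aaaddd')})

# Per-row group median, aligned to data's index (NaNs are skipped when computing it)
group_median = data.groupby(['Sex', 'Pclass'])['Age'].transform('median')

# Fill only the missing ages; existing values are left untouched
data['Age'] = data['Age'].fillna(group_median)
print(data)
```

If every age in some group is missing, that group's median is itself NaN, so those rows remain NaN after the fill and need a separate fallback (for example, the overall median).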
