Pandas split name column into first and last name if contains one space

2024/9/20 16:58:41

Let's say I have a pandas DataFrame containing names like so:

name_df = pd.DataFrame({'name':['Jack Fine','Kim Q. Danger','Jane Smith', 'Juan de la Cruz']})

    name
0   Jack Fine
1   Kim Q. Danger
2   Jane Smith
3   Juan de la Cruz

and I want to split the name column into first_name and last_name IF there is one space in the name. Otherwise I want the full name to be shoved into first_name.

So the final DataFrame should look like:

  first_name     last_name
0 Jack           Fine
1 Kim Q. Danger
2 Jane           Smith
3 Juan de la Cruz

I've tried to accomplish this by first applying the following function to return names that can be split into first and last name:

def validate_single_space_name(name: str) -> str:
    pattern = re.compile(r'^.*( ){1}.*$')
    match_obj = re.match(pattern, name)
    if match_obj:
        return name
    else:
        return None

However, applying this function to my original name_df leads to an empty DataFrame, not one populated by names that can be split and Nones.

Help getting my current approach to work, or solutions involving a different approach, would be appreciated!

Answer

You can use str.split to split the strings, then test the number of splits using str.len and use this as a boolean mask to assign just those rows with the last component of the split:

In [33]:
df.loc[df['name'].str.split().str.len() == 2, 'last name'] = df['name'].str.split().str[-1]
df

Out[33]:
              name last name
0        Jack Fine      Fine
1    Kim Q. Danger       NaN
2       Jane Smith     Smith
3  Juan de la Cruz       NaN
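The mask in the one-liner above can be inspected step by step. The following is a minimal sketch rather than the answer's exact code: the variable names (parts, n_parts, is_two_part) are illustrative, the column is called last_name as in the question, and Series.where is used in place of the .loc assignment to build the same column.

```python
import pandas as pd

# Same sample data as in the question.
name_df = pd.DataFrame(
    {'name': ['Jack Fine', 'Kim Q. Danger', 'Jane Smith', 'Juan de la Cruz']}
)

# Split on whitespace: each row becomes a list of name parts.
parts = name_df['name'].str.split()

# Number of parts per row.
n_parts = parts.str.len()

# Boolean mask that is True only for names with exactly two parts.
is_two_part = n_parts == 2

# Keep the last part where the mask is True; other rows become missing.
# (Series.where is an assumption here, swapped in for the .loc assignment.)
name_df['last_name'] = parts.str[-1].where(is_two_part)

print(name_df)
```

Naming the intermediate Series also avoids calling str.split twice on the same column.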

EDIT

You can call split with the parameter expand=True. Restricting the split to the rows where the name has exactly two parts populates only those rows:

In [16]:
name_df[['first_name','last_name']] = name_df['name'].loc[name_df['name'].str.split().str.len() == 2].str.split(expand=True)
name_df

Out[16]:
              name first_name last_name
0        Jack Fine       Jack      Fine
1    Kim Q. Danger        NaN       NaN
2       Jane Smith       Jane     Smith
3  Juan de la Cruz        NaN       NaN
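The NaN rows appear because pandas aligns on the index when assigning: the filtered split result only carries the labels of the two-part names. A small sketch of that alignment, using reindex to make the implicit step visible (the names two_part, split_cols and aligned are illustrative, not from the answer):

```python
import pandas as pd

name_df = pd.DataFrame(
    {'name': ['Jack Fine', 'Kim Q. Danger', 'Jane Smith', 'Juan de la Cruz']}
)

# Keep only the two-part names, then expand them into two columns (0 and 1).
two_part = name_df['name'][name_df['name'].str.split().str.len() == 2]
split_cols = two_part.str.split(expand=True)

# Only the labels of the two-part rows survive the filter.
print(split_cols.index.tolist())

# Reindexing to the full index mimics what the assignment does implicitly:
# labels with no counterpart in split_cols are filled with missing values.
aligned = split_cols.reindex(name_df.index)
print(aligned)
```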

You can then replace the missing first names using fillna:

In [17]:
name_df['first_name'] = name_df['first_name'].fillna(name_df['name'])
name_df

Out[17]:
              name       first_name last_name
0        Jack Fine             Jack      Fine
1    Kim Q. Danger    Kim Q. Danger       NaN
2       Jane Smith             Jane     Smith
3  Juan de la Cruz  Juan de la Cruz       NaN
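For a regex-based route closer to the question's attempt, note that the pattern ^.*( ){1}.*$ matches any string containing at least one space, because .* can itself match spaces. A hedged alternative sketch, assuming "one space" means a single literal space character between two runs of non-space characters, uses str.extract with named groups (the pattern and the names extracted and result are illustrative, not from the answer):

```python
import pandas as pd

name_df = pd.DataFrame(
    {'name': ['Jack Fine', 'Kim Q. Danger', 'Jane Smith', 'Juan de la Cruz']}
)

# Two runs of non-space characters separated by exactly one space.
# Rows that do not match the whole pattern yield NaN in both groups.
pattern = r'^(?P<first_name>[^ ]+) (?P<last_name>[^ ]+)$'
extracted = name_df['name'].str.extract(pattern)

# Fall back to the full name where the pattern did not match.
extracted['first_name'] = extracted['first_name'].fillna(name_df['name'])

# Attach the two new columns to the original frame.
result = name_df.join(extracted)
print(result)
```

The named groups become the column names, so no separate renaming step is needed.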