Question 1

Let's say I have a pandas DataFrame containing names like so:

name_df = pd.DataFrame({'name':['Jack Fine','Kim Q. Danger','Jane Smith', 'Juan de la Cruz']})

    name
0   Jack Fine
1   Kim Q. Danger
2   Jane Smith
3   Juan de la Cruz

and I want to split the name column into first_name and last_name IF there is one space in the name. Otherwise I want the full name to be shoved into first_name.

So the final DataFrame should look like:

  first_name     last_name
0 Jack           Fine
1 Kim Q. Danger
2 Jane           Smith
3 Juan de la Cruz

I've tried to accomplish this by first applying the following function to return names that can be split into first and last name:

def validate_single_space_name(name: str) -> str:pattern = re.compile(r'^.*( ){1}.*$')match_obj = re.match(pattern, name)if match_obj:return nameelse:return None

However applying this function to my original name_df, leads to an empty DataFrame, not one populated by names that can be split and Nones.

Help getting my current approach to work, or solutions invovling a different approach would be appreciated!

Question 2

You can use str.split to split the strings, then test the number of splits using str.len and use this as a boolean mask to assign just those rows with the last component of the split:

In [33]:
df.loc[df['name'].str.split().str.len() == 2, 'last name'] = df['name'].str.split().str[-1]
dfOut[33]:name last name
0        Jack Fine      Fine
1    Kim Q. Danger       NaN
2       Jane Smith     Smith
3  Juan de la Cruz       NaN

EDIT

You can call split with param expand=True this will only populate where the name lengths are exactly 2 names:

In [16]:
name_df[['first_name','last_name']] = name_df['name'].loc[name_df['name'].str.split().str.len() == 2].str.split(expand=True)
name_dfOut[16]:name first_name last_name
0        Jack Fine       Jack      Fine
1    Kim Q. Danger        NaN       NaN
2       Jane Smith       Jane     Smith
3  Juan de la Cruz        NaN       NaN

You can then replace the missing first names using fillna:

In [17]:
name_df['first_name'].fillna(name_df['name'],inplace=True)
name_df

Out[17]:name       first_name last_name
0        Jack Fine             Jack      Fine
1    Kim Q. Danger    Kim Q. Danger       NaN
2       Jane Smith             Jane     Smith
3  Juan de la Cruz  Juan de la Cruz       NaN

Pandas split name column into first and last name if contains one space

Related Q&A

Docker. No such file or directory

How to recover original values after a model predict in keras?

Find closest line to each point on big dataset, possibly using shapely and rtree

Reading pretty print json files in Apache Spark

Visualize TFLite graph and get intermediate values of a particular node?

Why do I get a pymongo.cursor.Cursor when trying to query my mongodb db via pymongo?

using dropbox as a server for my django app

Proper overloading of json encoding and decoding with Flask

How to check a specific type of tuple or list?

Cannot import name BlockBlobService