Adding a calculated column to pandas dataframe

2024/9/23 4:34:04

I am completely new to Python, pandas and programming in general, and I cannot figure out the following:

I have accessed a database with the help of pandas and I have put the data from the query into a dataframe, df. One of the columns contains birthdays, which can have the following forms:

- 01/25/1980 (string)
- 01/25 (string)
- None (NoneType)

Now, I would like to add a new column to df, which stores the ages of the people in the database. So I have done the following:

def addAge(df):
    today = date.today()
    df["age"] = None
    for index, row in df.iterrows():
        if row["birthday"] != None:
            if len(row["birthday"]) == 10:
                birthday = df["birthday"]
                birthdayDate = datetime.date(int(birthday[6:]), int(birthday[:2]), int(birthday[3:5]))
                row["age"] = today.year - birthdayDate.year - ((today.month, today.day) < (birthdayDate.month, birthdayDate.day))
                print row["birthday"], row["age"]  #this is just for testing

addAge(df)
print df

The line print row["birthday"], row["age"] correctly prints the birthdays and the ages. But when I call print df, the column age always contains "None". Could you guys explain to me what I have been doing wrong? Thanks!

Answer

When you call iterrows(), each row you get is a copy, so assigning to it does not write back to the original dataframe. In general, you should try to use vectorized methods rather than iterating over the rows.
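To illustrate the point about copies, here is a minimal sketch on a made-up three-row frame. The `id` column and the sample values are assumptions for the demo, not taken from the question. With mixed column types, as is typical for data read from a database, the row handed out by iterrows() is a separate Series, so writing to it leaves df unchanged, while writing through df.at does change it.

```python
import pandas as pd

# Hypothetical sample data; the integer "id" column makes the dtypes mixed
df = pd.DataFrame({
    "id": [1, 2, 3],
    "birthday": ["01/25/1980", "01/25", None],
})
df["age"] = None

# Assigning to the row only changes the temporary Series for that iteration
for index, row in df.iterrows():
    row["age"] = 1

untouched = df["age"].isna().all()  # the "age" column still holds only None

# Writing through the dataframe itself does persist
for index in df.index:
    df.at[index, "age"] = 1
```

Writing through df.at works, but it is still a Python-level loop, so a vectorized approach is usually preferable for anything beyond a small frame.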

So, for example, to parse the 'birthday' column in this case, you could do something like the following. For rows whose string has a length of 10, the string will be parsed into a datetime; otherwise the value will be filled with a missing value.

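import numpy as np
import pandas as pd
# Keep only the full 10-character dates; everything else becomes missing (NaT)
full_dates = df['birthday'].where(df['birthday'].str.len() == 10)
df['birthday'] = pd.to_datetime(full_dates, format='%m/%d/%Y')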
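As an alternative sketch of the same parsing step, pd.to_datetime can be given an explicit format together with errors="coerce", so that anything not matching MM/DD/YYYY becomes a missing value (NaT). The sample values below are assumptions that mirror the three forms described in the question.

```python
import pandas as pd

# Hypothetical sample covering the three forms from the question
birthdays = pd.Series(["01/25/1980", "01/25", None])

# Values that do not match the format are coerced to NaT instead of raising
parsed = pd.to_datetime(birthdays, format="%m/%d/%Y", errors="coerce")
```

Only the first value is parsed into a timestamp here; the short string and the None both end up as NaT.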

To calculate the ages, you can use .apply, which applies a function to each element of a Series.
So if you wrapped your age calculation in a function:

def calculate_age(birthdayDate, today):
    if pd.isnull(birthdayDate):
        return np.nan
    else:
        return today.year - birthdayDate.year - ((today.month, today.day) < (birthdayDate.month, birthdayDate.day))

Then, you could calculate the age column like this:

from datetime import date

today = date.today()
df['age'] = df['birthday'].apply(lambda x: calculate_age(x, today))
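For a quick check of the age calculation, here is a self-contained sketch with a fixed reference date so the result is reproducible. The sample birthdays and the fixed `today` value are assumptions chosen for the demo; in real use you would pass date.today().

```python
from datetime import date

import numpy as np
import pandas as pd


def calculate_age(birthday_date, today):
    # Missing birthdays (NaT) give a missing age
    if pd.isnull(birthday_date):
        return np.nan
    # Subtract one if this year's birthday has not happened yet
    not_yet = (today.month, today.day) < (birthday_date.month, birthday_date.day)
    return today.year - birthday_date.year - not_yet


# Hypothetical, already-parsed birthday column with one missing value
df = pd.DataFrame({
    "birthday": pd.to_datetime(["1980-01-25", None, "1990-12-31"]),
})

today = date(2024, 9, 23)  # fixed date for the demo only
df["age"] = df["birthday"].apply(lambda x: calculate_age(x, today))
```

With this reference date the ages come out as 44, missing, and 33. Because of the missing value, the column is stored as floats.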