Rowwise min() and max() fail for columns with NaNs

2024/10/4 13:29:49

I am trying to take the rowwise max (and min) of two columns containing dates:

from datetime import date
import pandas as pd
import numpy as np

df = pd.DataFrame({'date_a': [date(2015, 1, 1), date(2012, 6, 1),
                              date(2013, 1, 1), date(2016, 6, 1)],
                   'date_b': [date(2012, 7, 1), date(2013, 1, 1),
                              date(2014, 3, 1), date(2013, 4, 1)]})

df[['date_a', 'date_b']].max(axis=1)
Out[46]: 
0    2015-01-01
1    2013-01-01
2    2014-03-01
3    2016-06-01

as expected. However, if the DataFrame contains even a single NaN value, the whole operation fails:

df_nan = pd.DataFrame({'date_a': [date(2015, 1, 1), date(2012, 6, 1),
                                  np.nan, date(2016, 6, 1)],
                       'date_b': [date(2012, 7, 1), date(2013, 1, 1),
                                  date(2014, 3, 1), date(2013, 4, 1)]})

df_nan[['date_a', 'date_b']].max(axis=1)
Out[49]: 
0   NaN 
1   NaN
2   NaN
3   NaN
dtype: float64

What is going on here? I was expecting this result:

0    2015-01-01
1    2013-01-01
2    NaN
3    2016-06-01

How can this be achieved?

Answer

I would say the best solution is to use the appropriate dtype. pandas provides a well-integrated datetime64 dtype. Note that you are currently storing the dates with object dtype:

>>> df
       date_a      date_b
0  2015-01-01  2012-07-01
1  2012-06-01  2013-01-01
2         NaN  2014-03-01
3  2016-06-01  2013-04-01
>>> df.dtypes
date_a    object
date_b    object
dtype: object
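
This is why the whole operation fails: with object dtype, any row containing a NaN mixes datetime.date objects with a float, and the two cannot be ordered against each other, so the row-wise reduction cannot produce a sensible result. A rough illustration in plain Python (pandas' internal code path differs, but it runs into the same incomparability):

>>> max(date(2015, 1, 1), np.nan)
Traceback (most recent call last):
  ...
TypeError: '>' not supported between instances of 'float' and 'datetime.date'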

But the problem disappears once you convert the columns to the datetime dtype:

>>> df2 = df.apply(pd.to_datetime)
>>> df2
      date_a     date_b
0 2015-01-01 2012-07-01
1 2012-06-01 2013-01-01
2        NaT 2014-03-01
3 2016-06-01 2013-04-01
>>> df2.min(axis=1)
0   2012-07-01
1   2012-06-01
2   2014-03-01
3   2013-04-01
dtype: datetime64[ns]
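
Also note that min() and max() on datetime64 columns skip missing values by default (skipna=True), which is why row 2 above still yields a date. If you want the missing value to propagate, as in your expected output, pass skipna=False; that should give:

>>> df2.max(axis=1, skipna=False)
0   2015-01-01
1   2013-01-01
2          NaT
3   2016-06-01
dtype: datetime64[ns]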