numpy and pandas timedelta error

2024/10/8 10:57:32

In Python I have an array of dates generated (or read from a CSV file) using pandas, and I want to add one year to each date. I can get this working with pandas but not with numpy. What am I doing wrong? Or is this a bug in either pandas or numpy?

Thanks!

import numpy as np
import pandas as pd
from pandas.tseries.offsets import DateOffset

# Generate range of dates using pandas.
dates = pd.date_range('1980-01-01', '2015-01-01')

# Add one year using pandas.
dates2 = dates + DateOffset(years=1)

# Convert result to numpy. THIS WORKS!
dates2_np = dates2.values

# Convert original dates to numpy array.
dates_np = dates.values

# Add one year using numpy. THIS FAILS!
dates3 = dates_np + np.timedelta64(1, 'Y')
# TypeError: Cannot get a common metadata divisor for NumPy datetime metadata [ns] and [Y] because they have incompatible nonlinear base time units

Answer

Adding np.timedelta64(1, 'Y') to an array of dtype datetime64[ns] does not work because a year does not correspond to a fixed number of nanoseconds. A year is sometimes 365 days and sometimes 366 days, and some years even contain an extra leap second. (Note that leap seconds, such as the one that occurred at 2015-06-30 23:59:60, are not representable as NumPy datetime64s.)
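As a minimal sketch of the unit mismatch described above (the array contents and variable names here are illustrative, not from the question), the following shows that a year-sized timedelta can be added once the datetimes themselves use year units, while a fixed-length unit such as days mixes with nanoseconds without complaint:

```python
import numpy as np

# Illustrative two-element array; 1980 is a leap year.
dates_ns = np.array(['1980-01-01', '1980-12-31'], dtype='datetime64[ns]')

# Mixing nanosecond datetimes with a year-sized timedelta raises TypeError,
# because NumPy cannot convert the nonlinear unit [Y] to the linear unit [ns].
try:
    dates_ns + np.timedelta64(1, 'Y')
    mixed_units_failed = False
except TypeError:
    mixed_units_failed = True

# Year arithmetic is well defined when the datetimes are truncated to years,
# at the cost of losing the month and day information.
years_only = dates_ns.astype('datetime64[Y]') + np.timedelta64(1, 'Y')

# A fixed number of days converts cleanly to nanoseconds, but 365 days is
# not the same thing as "one year" when a leap day falls in the interval.
plus_365_days = dates_ns + np.timedelta64(365, 'D')
```

Here 1980-01-01 plus 365 days lands on 1980-12-31 rather than 1981-01-01, which is why a fixed-length stand-in for a year is not a safe substitute.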

The easiest way I know to add a year to a NumPy datetime64[ns] array is to break it into constituent parts, such as years, months and days, do the computation on integer arrays, and then recompose the datetime64 array:

def year(dates):
    "Return an array of the years given an array of datetime64s"
    return dates.astype('M8[Y]').astype('i8') + 1970

def month(dates):
    "Return an array of the months given an array of datetime64s"
    return dates.astype('M8[M]').astype('i8') % 12 + 1

def day(dates):
    "Return an array of the days of the month given an array of datetime64s"
    return (dates - dates.astype('M8[M]')) / np.timedelta64(1, 'D') + 1

def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
              seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)

# break the datetime64 array into constituent parts
years, months, days = [f(dates_np) for f in (year, month, day)]
# recompose the datetime64 array after adding 1 to the years
dates3 = combine64(years+1, months, days)

yields

In [185]: dates3
Out[185]: 
array(['1981-01-01', '1981-01-02', '1981-01-03', ..., '2015-12-30',
       '2015-12-31', '2016-01-01'], dtype='datetime64[D]')

Despite appearing to be so much code, it is actually quicker than adding a DateOffset of 1 year:

In [206]: %timeit dates + DateOffset(years=1)
1 loops, best of 3: 285 ms per loop

In [207]: %%timeit
   .....: years, months, days = [f(dates_np) for f in (year, month, day)]
   .....: combine64(years+1, months, days)
   .....: 
100 loops, best of 3: 2.65 ms per loop
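One behavioral difference is worth checking before swapping one approach for the other: the two do not agree on leap days. The sketch below is an inline variant of the same decompose-and-recompose idea, not the functions above, and its variable names are illustrative. It compares the result with DateOffset on three dates around 1980-02-29:

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import DateOffset

# Illustrative dates around a leap day.
leap_dates = pd.DatetimeIndex(['1980-02-28', '1980-02-29', '1980-03-01'])
d = leap_dates.values

# Decompose: the year, the months elapsed within the year, and the days
# elapsed within the month.
years = d.astype('M8[Y]')
months = d.astype('M8[M]') - years.astype('M8[M]')             # timedelta64[M]
days = d.astype('M8[D]') - d.astype('M8[M]').astype('M8[D]')   # timedelta64[D]

# Recompose after adding one year. Day 29 of February overflows into March
# when the target year is not a leap year.
shifted_np = ((years + np.timedelta64(1, 'Y')).astype('M8[M]') + months).astype('M8[D]') + days

# pandas clamps to the last valid day of the month instead.
shifted_pd = (leap_dates + DateOffset(years=1)).values.astype('M8[D]')
```

With these inputs, the NumPy arithmetic maps 1980-02-29 to 1981-03-01, while DateOffset(years=1) maps it to 1981-02-28. Which result is wanted depends on the application.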

Of course, pd.tseries.offsets offers a whole panoply of offsets that have no easy counterpart when working with NumPy datetime64s.
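As a small illustration of the kind of offsets meant here (the dates are made up for the example), MonthEnd and QuarterEnd roll each date forward to a calendar boundary in one expression, something that would take several manual steps on raw datetime64 arrays:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd, QuarterEnd

# Illustrative dates, not taken from the question.
sample = pd.DatetimeIndex(['2015-01-30', '2015-02-14'])

# Roll each date forward to the last day of its month.
month_ends = sample + MonthEnd(1)

# Roll each date forward to the end of its calendar quarter
# (QuarterEnd defaults to quarters ending in March, June, September, December).
quarter_ends = sample + QuarterEnd(1)
```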

https://en.xdnf.cn/q/70128.html
