How to use statsmodels.tsa.seasonal.seasonal_decompose with a pandas dataframe

2024/11/14 15:28:45
from statsmodels.tsa.seasonal import seasonal_decompose

def seasonal_decomp(df, model="additive"):
    seasonal_df = None
    seasonal_df = seasonal_decompose(df, model='additive')
    return seasonal_df

seasonal_decomp(df)

Error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-93-00543113a58a> in <module>
----> 1 seasonal_decompose(df, model='additive')

e:\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    197                 else:
    198                     kwargs[new_arg_name] = new_arg_value
--> 199             return func(*args, **kwargs)
    200
    201         return cast(F, wrapper)

e:\Anaconda3\lib\site-packages\statsmodels\tsa\seasonal.py in seasonal_decompose(x, model, filt, period, two_sided, extrapolate_trend)
    185     for s, name in zip((seasonal, trend, resid, x),
    186                        ('seasonal', 'trend', 'resid', None)):
--> 187         results.append(pw.wrap(s.squeeze(), columns=name))
    188     return DecomposeResult(seasonal=results[0], trend=results[1],
    189                            resid=results[2], observed=results[3])

e:\Anaconda3\lib\site-packages\statsmodels\tools\validation\validation.py in wrap(self, obj, columns, append, trim_start, trim_end)
    216                     new.append(append if c is None else str(c) + '_' + append)
    217                 columns = new
--> 218             return pd.DataFrame(obj, columns=columns, index=index)
    219         else:
    220             raise ValueError('Can only wrap 1 or 2-d array_like')

e:\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    495                 mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
    496             else:
--> 497                 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
    498
    499         # For data is list-like, or Iterable (will consume into list)

e:\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_ndarray(values, index, columns, dtype, copy)
    201
    202     # _prep_ndarray ensures that values.ndim == 2 at this point
--> 203     index, columns = _get_axes(
    204         values.shape[0], values.shape[1], index=index, columns=columns
    205     )

e:\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in _get_axes(N, K, index, columns)
    460         columns = ibase.default_index(K)
    461     else:
--> 462         columns = ensure_index(columns)
    463     return index, columns
    464

e:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in ensure_index(index_like, copy)
   5612             index_like = copy_func(index_like)
   5613
-> 5614     return Index(index_like)
   5615
   5616

e:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __new__(cls, data, dtype, copy, name, tupleize_cols, **kwargs)
    409
    410         elif data is None or is_scalar(data):
--> 411             raise cls._scalar_data_error(data)
    412         elif hasattr(data, "__array__"):
    413             return Index(np.asarray(data), dtype=dtype, copy=copy, name=name, **kwargs)

TypeError: Index(...) must be called with a collection of some kind, 'seasonal' was passed

Test Data

data = {
    pd.Timestamp('2020-01-23 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-01-24 00:00:00'): {'LA': 1.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-01-25 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-01-26 00:00:00'): {'LA': 3.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-01-27 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-01-28 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-01-29 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-01-30 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 1.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-01-31 00:00:00'): {'LA': 2.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 2.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-01 00:00:00'): {'LA': 1.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-02 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 1.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-03 00:00:00'): {'LA': 3.0, 'NY': 0.0, 'Miami': 1.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-04 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-05 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-06 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-07 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-08 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-09 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-10 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-11 00:00:00'): {'LA': 1.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-12 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-13 00:00:00'): {'LA': 1.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-14 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-15 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-16 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-17 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-18 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-19 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-20 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-21 00:00:00'): {'LA': 2.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-22 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-23 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-24 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-25 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-26 00:00:00'): {'LA': 0.0, 'NY': 1.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-27 00:00:00'): {'LA': 1.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-28 00:00:00'): {'LA': 0.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-02-29 00:00:00'): {'LA': 8.0, 'NY': 1.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-03-01 00:00:00'): {'LA': 6.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-03-02 00:00:00'): {'LA': 23.0, 'NY': 0.0, 'Miami': 2.0, 'Seattle': 1.0, 'San Diego': 0.0},
    pd.Timestamp('2020-03-03 00:00:00'): {'LA': 20.0, 'NY': 0.0, 'Miami': 0.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-03-04 00:00:00'): {'LA': 31.0, 'NY': 2.0, 'Miami': 23.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-03-05 00:00:00'): {'LA': 70.0, 'NY': 0.0, 'Miami': 2.0, 'Seattle': 1.0, 'San Diego': 1.0},
    pd.Timestamp('2020-03-06 00:00:00'): {'LA': 48.0, 'NY': 9.0, 'Miami': 1.0, 'Seattle': 9.0, 'San Diego': 0.0},
    pd.Timestamp('2020-03-07 00:00:00'): {'LA': 115.0, 'NY': 0.0, 'Miami': 3.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-03-08 00:00:00'): {'LA': 114.0, 'NY': 7.0, 'Miami': 5.0, 'Seattle': 4.0, 'San Diego': 2.0},
    pd.Timestamp('2020-03-09 00:00:00'): {'LA': 68.0, 'NY': 5.0, 'Miami': 4.0, 'Seattle': 0.0, 'San Diego': 0.0},
    pd.Timestamp('2020-03-10 00:00:00'): {'LA': 192.0, 'NY': 6.0, 'Miami': 13.0, 'Seattle': 3.0, 'San Diego': 4.0},
    pd.Timestamp('2020-03-11 00:00:00'): {'LA': 398.0, 'NY': 7.0, 'Miami': 6.0, 'Seattle': 0.0, 'San Diego': 6.0},
    pd.Timestamp('2020-03-12 00:00:00'): {'LA': 452.0, 'NY': 14.0, 'Miami': 11.0, 'Seattle': 8.0, 'San Diego': 4.0},
    pd.Timestamp('2020-03-13 00:00:00'): {'LA': 596.0, 'NY': 99.0, 'Miami': 9.0, 'Seattle': 17.0, 'San Diego': 7.0},
    pd.Timestamp('2020-03-14 00:00:00'): {'LA': 713.0, 'NY': 0.0, 'Miami': 20.0, 'Seattle': 14.0, 'San Diego': 14.0},
    pd.Timestamp('2020-03-15 00:00:00'): {'LA': 98.0, 'NY': 11.0, 'Miami': 11.0, 'Seattle': 4.0, 'San Diego': 13.0},
    pd.Timestamp('2020-03-16 00:00:00'): {'LA': 1392.0, 'NY': 38.0, 'Miami': 6.0, 'Seattle': 27.0, 'San Diego': 11.0},
    pd.Timestamp('2020-03-17 00:00:00'): {'LA': 1781.0, 'NY': 121.0, 'Miami': 23.0, 'Seattle': 24.0, 'San Diego': 0.0},
    pd.Timestamp('2020-03-18 00:00:00'): {'LA': 2776.0, 'NY': 51.0, 'Miami': 14.0, 'Seattle': 33.0, 'San Diego': 54.0},
    pd.Timestamp('2020-03-19 00:00:00'): {'LA': 5240.0, 'NY': 249.0, 'Miami': 38.0, 'Seattle': 52.0, 'San Diego': 34.0},
    pd.Timestamp('2020-03-20 00:00:00'): {'LA': 5322.0, 'NY': 172.0, 'Miami': 50.0, 'Seattle': 54.0, 'San Diego': 52.0},
    pd.Timestamp('2020-03-21 00:00:00'): {'LA': 6346.0, 'NY': 228.0, 'Miami': 86.0, 'Seattle': 53.0, 'San Diego': 38.0},
    pd.Timestamp('2020-03-22 00:00:00'): {'LA': 7936.0, 'NY': 525.0, 'Miami': 66.0, 'Seattle': 61.0, 'San Diego': 34.0}}

df = pd.DataFrame.from_dict(data, orient='index')
Answer
  • The issue is in seasonal_decompose(df, model='additive'): the entire dataframe is passed to seasonal_decompose, but it expects a single time series (one column) with a datetime index.
  • The function below uses a list comprehension to compute the .trend component for each column, then combines the results into a single dataframe with pandas.concat.
from statsmodels.tsa.seasonal import seasonal_decompose
import pandas as pd

# dataframe from sample; in this case the index is already a datetime
df = pd.DataFrame.from_dict(data, orient='index')

# if the index is not in a datetime format
df.index = pd.to_datetime(df.index)

# perform seasonal_decompose on each column in a list comprehension; return a dataframe
def season_decom(df, model='additive'):
    return pd.concat([pd.DataFrame({col: seasonal_decompose(df[col], model=model).trend}) for col in df.columns], axis=1)

# call the function
df_seasonal = season_decom(df)

# df_seasonal.head(10)
                  LA   NY     Miami   Seattle  San Diego
2020-01-23       NaN  NaN       NaN       NaN        NaN
2020-01-24       NaN  NaN       NaN       NaN        NaN
2020-01-25       NaN  NaN       NaN       NaN        NaN
2020-01-26  0.571429  0.0  0.000000  0.000000        0.0
2020-01-27  0.571429  0.0  0.142857  0.000000        0.0
2020-01-28  0.714286  0.0  0.142857  0.285714        0.0
2020-01-29  0.857143  0.0  0.142857  0.285714        0.0
2020-01-30  0.428571  0.0  0.285714  0.285714        0.0
2020-01-31  0.857143  0.0  0.428571  0.285714        0.0
2020-02-01  0.857143  0.0  0.428571  0.285714        0.0

Simplified Version

  • Apply seasonal_decompose to each column with .apply:
df_seasonal = df.apply(lambda x: seasonal_decompose(x, model='additive').trend)
