Pandas: Add missing years in time series data with duplicate years

2024/10/15 15:25:13

I have a dataset like this, where data for some years are missing.

County Year Pop
12     1999 1.1
12     2001 1.2
13     1999 1.0
13     2000 1.1

I want something like this:

County Year Pop
12     1999 1.1
12     2000 NaN
12     2001 1.2
13     1999 1.0
13     2000 1.1
13     2001 NaN

I have tried setting the index to Year and then using reindex with another DataFrame containing just the years (the method mentioned in "Pandas: Add data for missing months"), but it gives me an error saying it can't reindex with duplicate values. I have also tried df.loc, but it has the same issue. I even tried a full outer join with a blank DataFrame of just years, but that didn't work either.

How can I solve this?

Answer

Make a MultiIndex so you don't have duplicates:

df.set_index(['County', 'Year'], inplace=True)

Then construct a full MultiIndex with all the combinations:

index = pd.MultiIndex.from_product(df.index.levels)

Then reindex (this returns a new DataFrame rather than modifying df in place):

df.reindex(index)

The construction of the MultiIndex is untested and may need a little tweaking: df.index.levels contains only the years that appear for at least one county, so a year that is entirely absent from all counties will not be added. But I think you get the idea.
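
For reference, here is a minimal end-to-end sketch of the steps above, using the sample data from the question. It is one possible variant, not a tested drop-in: it assumes the years should cover every integer from the smallest to the largest year in the data, which also handles a year missing from all counties. The names indexed, counties, years, full_index and result are made up for this example.

```python
import pandas as pd

# Sample data from the question: each county is missing one year.
df = pd.DataFrame({
    "County": [12, 12, 13, 13],
    "Year": [1999, 2001, 1999, 2000],
    "Pop": [1.1, 1.2, 1.0, 1.1],
})

# (County, Year) pairs are unique even though Year alone is duplicated.
indexed = df.set_index(["County", "Year"])

# Assumption: fill every year between the min and max year in the data,
# so a year absent from all counties is still included.
counties = indexed.index.get_level_values("County").unique()
years = range(df["Year"].min(), df["Year"].max() + 1)
full_index = pd.MultiIndex.from_product(
    [counties, years], names=["County", "Year"]
)

# reindex returns a new DataFrame; missing combinations get NaN.
result = indexed.reindex(full_index).reset_index()
print(result)
```

If only the years that already appear somewhere in the data are wanted, pd.MultiIndex.from_product(indexed.index.levels) as shown above can be used in place of the explicit range.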

