Reindex 2nd level in incomplete multi-level dataframe to be complete, inserting NANs on missing rows

2024/9/22 8:30:35

I need to reindex the second level of a pandas DataFrame so that it becomes the complete list 0, ..., N-1 for each first-level index value.

  • I tried Allan/Hayden's approach, but it only renumbers the existing rows, so the result has exactly as many rows as before.
  • What I want is for a new row (with NaN values) to be inserted for every missing index value.

Example:

import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6],
})
print(df)

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6

# Tried using Allan/Hayden's approach, but it is no good here: it doesn't add the missing rows
df['second'] = df.reset_index().groupby(['first']).cumcount()
print(df)

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       0      6

My desired result is:

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
4    two       2      nan <-- INSERTED
5  three       0      6
5  three       1      nan <-- INSERTED
5  three       2      nan <-- INSERTED

Answer

I think you can first set the columns first and second as a multi-level index, and then reindex against the complete index.

# your data
# ==========================
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6],
})
df

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6

# processing
# ============================
# build every (first, second) pair, with second running over 0..2
multi_index = pd.MultiIndex.from_product([df['first'].unique(), np.arange(3)], names=['first', 'second'])
# reindexing against the full index inserts NaN rows for the missing pairs
df.set_index(['first', 'second']).reindex(multi_index).reset_index()

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5    two       2    NaN
6  three       0    NaN
7  three       1      6
8  three       2    NaN
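
Note that the result above keeps the value 6 at ('three', 1), whereas the desired result in the question has it at ('three', 0). A minimal sketch that covers both cases follows. The helper name complete_second_level and its n and renumber parameters are hypothetical (they are not part of pandas), and inferring N from the largest existing second-level value is an assumption for when N is not given.

```python
import numpy as np
import pandas as pd


def complete_second_level(df, n=None, renumber=False):
    """Return a copy of df with one row for every (first, 0..n-1) pair.

    Hypothetical helper: assumes columns named 'first' and 'second'.
    """
    out = df.copy()
    if renumber:
        # Renumber the rows of each group as 0, 1, 2, ... before padding
        # (the cumcount approach tried in the question).
        out["second"] = out.groupby("first").cumcount()
    if n is None:
        # Assumption: N is one more than the largest second-level value.
        n = int(out["second"].max()) + 1
    full_index = pd.MultiIndex.from_product(
        [out["first"].unique(), np.arange(n)], names=["first", "second"]
    )
    # Pairs missing from the data become rows filled with NaN.
    return out.set_index(["first", "second"]).reindex(full_index).reset_index()


df = pd.DataFrame({
    "first": ["one", "one", "one", "two", "two", "three"],
    "second": [0, 1, 2, 0, 1, 1],
    "value": [1, 2, 3, 4, 5, 6],
})

kept = complete_second_level(df)                       # 6 stays at ('three', 1)
renumbered = complete_second_level(df, renumber=True)  # 6 moves to ('three', 0)
print(kept)
print(renumbered)
```

Both calls return nine rows. Because NaN is a float, the value column is upcast to float64 in either result. The reindex step also requires the (first, second) pairs to be unique, which renumber=True guarantees.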