NaN values in pivot_table index causes loss of data

2024/10/4 5:30:06

Here is a simple DataFrame:

> df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],'b': ['optional1', None, 'optional3'],'c': ['c1', 'c2', 'c3'],'d': [1, 2, 3]})
> dfa          b   c  d
0  a1  optional1  c1  1
1  a2       None  c2  2
2  a3  optional3  c3  3

Pivot method 1

The data can be pivoted to this:

> df.pivot_table(index=['a','b'], columns='c')d     
c              c1   c3
a  b                  
a1 optional1  1.0  NaN
a3 optional3  NaN  3.0

Downside: data in the 2nd row is lost because df['b'][1] == None.

Pivot method 2

> df.pivot_table(index=['a'], columns='c')d          
c    c1   c2   c3
a                
a1  1.0  NaN  NaN
a2  NaN  2.0  NaN
a3  NaN  NaN  3.0

Downside: column b is lost.

How can the two methods be combined so that columns b and the 2nd row are kept like so:

                d     
c              c1   c2   c3
a  b                  
a1 optional1  1.0  NaN  NaN
a2      None  NaN  2.0  NaN
a3 optional3  NaN  NaN  3.0

More generally: How can information from a row be retained during pivoting if a key has NaN value?

Answer

Use set_index and unstack to perform the pivot:

df = df.set_index(['a', 'b', 'c']).unstack('c')

This is essentially what pandas does under the hood for pivot. The stack and unstack methods are closely related to pivot, and can generally be used to perform pivot-like operations that don't quite conform with the built-in pivot functions.

The resulting output:

                d          
c              c1   c2   c3
a  b                       
a1 optional1  1.0  NaN  NaN
a2 NaN        NaN  2.0  NaN
a3 optional3  NaN  NaN  3.0
https://en.xdnf.cn/q/70640.html

Related Q&A

ModuleNotFoundError in Docker

I have imported my entire project into docker, and I am getting a ModuleNotFoundErrorfrom one of the modules I have created.FROM python:3.8 WORKDIR /workspace/ COPY . /workspace/ RUN pip install pipenv…

Can I use md5 authentication with psycopg2?

After two hours of reading documentation, source code and help-threads, Im giving up. I cant get psycopg2 to authenticate with a md5-string. According to this thread I dont have to anything besides ena…

Python checking __init__ parameter

Ive been trying to figuring this out for the last few hours, and Im about to give up.How do you make sure that in python only a matching specific criteria will create the object?For example, lets say …

Minidom getElementById not working

Minidoms getElementById function is returning None for any entry I pass to it.For example, this code:l = minidom.parseString(<node id="node">Node</node>) print(l.getElementById(&q…

Optimization on piecewise linear regression

I am trying to create a piecewise linear regression to minimize the MSE(minimum square errors) then using linear regression directly. The method should be using dynamic programming to calculate the dif…

Python: Check if list of named tuples contains particular attribute value

I have a list of named tuples:from collections import namedtupleT = namedtuple(T, [attr1, attr2, attr3, attr4]) t1 = T(T1, 1, 1234, XYZ) t2 = T(T2, 2, 1254, ABC) t3 = T(T2, 2, 1264, DEF) l = [t1, t2, t…

javascript error: arguments[0].scrollIntoView is not a function using selenium on python

Im using Selenium on python and I would like to scroll to an element to click on it. Everywhere I see that the rigth things to do to go directly to the element is to use :driver = webdriver.Chrome() dr…

Uploading a static project to google app engines

Disclaimer: I already asked here, but apparently off-topic. I want to set up a page using this bootstrap template and host it as a static website using the google appengine service. Inside the google_a…

Python cannot import DataFrame

I am trying to use Pandas in Python to import and manipulate some csv file.my code is like:import pandas as pd from pandas import dataframe data_df = pd.read_csv(highfrequency2.csv) print(data_df.col…

Sum of product of combinations in a list

What is the Pythonic way of summing the product of all combinations in a given list, such as:[1, 2, 3, 4] --> (1 * 2) + (1 * 3) + (1 * 4) + (2 * 3) + (2 * 4) + (3 * 4) = 35(For this example I have t…