Here is a simple DataFrame:
> df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],'b': ['optional1', None, 'optional3'],'c': ['c1', 'c2', 'c3'],'d': [1, 2, 3]})
> dfa b c d
0 a1 optional1 c1 1
1 a2 None c2 2
2 a3 optional3 c3 3
Pivot method 1
The data can be pivoted to this:
> df.pivot_table(index=['a','b'], columns='c')d
c c1 c3
a b
a1 optional1 1.0 NaN
a3 optional3 NaN 3.0
Downside: data in the 2nd row is lost because df['b'][1] == None
.
Pivot method 2
> df.pivot_table(index=['a'], columns='c')d
c c1 c2 c3
a
a1 1.0 NaN NaN
a2 NaN 2.0 NaN
a3 NaN NaN 3.0
Downside: column b
is lost.
How can the two methods be combined so that columns b
and the 2nd row are kept like so:
d
c c1 c2 c3
a b
a1 optional1 1.0 NaN NaN
a2 None NaN 2.0 NaN
a3 optional3 NaN NaN 3.0
More generally: How can information from a row be retained during pivoting if a key has NaN
value?