Pivot a dataframe with duplicate values in Index

2024/10/11 5:24:28

I have a pandas DataFrame like this:

    snapDate     instance   waitEvent                   AvgWaitInMs
0   2015-Jul-03  XX         gc cr block 3-way               1
1   2015-Jun-29  YY         gc current block 3-way          2
2   2015-Jul-03  YY         gc current block 3-way          1
3   2015-Jun-29  XX         gc current block 3-way          2
4   2015-Jul-01  XX         gc current block 3-way          2
5   2015-Jul-01  YY         gc current block 3-way          2
6   2015-Jul-03  XX         gc current block 3-way          2
7   2015-Jul-03  YY         log file sync                   9
8   2015-Jun-29  XX         log file sync                   8
9   2015-Jul-03  XX         log file sync                   8
10  2015-Jul-01  XX         log file sync                   8
11  2015-Jul-01  YY         log file sync                   9
12  2015-Jun-29  YY         log file sync                   8

I need to transform it into this:

snapDate     instance   gc cr block 3-way   gc current block 3-way   log file sync
2015-Jul-03  XX         1                   2                        8
2015-Jun-29  YY         NaN                 2                        8
2015-Jul-03  YY         NaN                 1                        9
...

I tried pivot, but it raises an error:

    dfWaits.pivot(index='snapDate', columns='waitEvent', values='AvgWaitInMs')

    ValueError: Index contains duplicate entries, cannot reshape

The result should be another DataFrame.
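A minimal sketch of why the pivot call in the question fails, with the question's table typed in by hand as dfWaits: pivot needs every (index, columns) pair to be unique, but every (snapDate, waitEvent) pair except (2015-Jul-03, gc cr block 3-way) occurs twice, once for XX and once for YY. The snippet reproduces the ValueError, lists the clashing rows with duplicated(keep=False), and checks that adding instance to the key removes every clash.

```python
import pandas as pd

# Rebuild the question's sample data (values copied from the table above)
dfWaits = pd.DataFrame({
    "snapDate": ["2015-Jul-03", "2015-Jun-29", "2015-Jul-03", "2015-Jun-29",
                 "2015-Jul-01", "2015-Jul-01", "2015-Jul-03", "2015-Jul-03",
                 "2015-Jun-29", "2015-Jul-03", "2015-Jul-01", "2015-Jul-01",
                 "2015-Jun-29"],
    "instance": ["XX", "YY", "YY", "XX", "XX", "YY", "XX",
                 "YY", "XX", "XX", "XX", "YY", "YY"],
    "waitEvent": ["gc cr block 3-way"]
                 + ["gc current block 3-way"] * 6
                 + ["log file sync"] * 6,
    "AvgWaitInMs": [1, 2, 1, 2, 2, 2, 2, 9, 8, 8, 8, 9, 8],
})

# 1) Reproduce the failure: pivot() cannot place two values in one cell
pivot_failed = False
try:
    dfWaits.pivot(index="snapDate", columns="waitEvent", values="AvgWaitInMs")
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# 2) Show the clashing rows; keep=False flags every member of a duplicate group
key = ["snapDate", "waitEvent"]
dupes = dfWaits[dfWaits.duplicated(subset=key, keep=False)]
print(dupes.sort_values(key))

# 3) With 'instance' added to the key, no combination repeats
print(dfWaits.duplicated(subset=["snapDate", "instance", "waitEvent"]).any())
```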

Answer

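pivot fails because snapDate alone does not identify a single value: most (snapDate, waitEvent) pairs occur once for XX and once for YY. Put both snapDate and instance in the index. You can also use pivot_table, which additionally aggregates any remaining duplicates (taking the mean by default):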

df.pivot_table(index=['snapDate','instance'], columns='waitEvent', values='AvgWaitInMs')

waitEvent             gc cr block 3-way  gc current block 3-way  log file sync
snapDate    instance
2015-Jul-01 XX                      NaN                       2              8
            YY                      NaN                       2              9
2015-Jul-03 XX                        1                       2              8
            YY                      NaN                       1              9
2015-Jun-29 XX                      NaN                       2              8
            YY                      NaN                       2              8
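A self-contained sketch of the pivot_table call above, with the question's table typed in by hand as dfWaits (the answer calls it df). It also shows that plain pivot works once instance joins the index (list-valued index arguments assume pandas 1.1 or later), and flattens the result with reset_index so that snapDate and instance become ordinary columns, matching the layout asked for in the question.

```python
import pandas as pd

# Question's sample data, typed in by hand
dfWaits = pd.DataFrame({
    "snapDate": ["2015-Jul-03", "2015-Jun-29", "2015-Jul-03", "2015-Jun-29",
                 "2015-Jul-01", "2015-Jul-01", "2015-Jul-03", "2015-Jul-03",
                 "2015-Jun-29", "2015-Jul-03", "2015-Jul-01", "2015-Jul-01",
                 "2015-Jun-29"],
    "instance": ["XX", "YY", "YY", "XX", "XX", "YY", "XX",
                 "YY", "XX", "XX", "XX", "YY", "YY"],
    "waitEvent": ["gc cr block 3-way"]
                 + ["gc current block 3-way"] * 6
                 + ["log file sync"] * 6,
    "AvgWaitInMs": [1, 2, 1, 2, 2, 2, 2, 9, 8, 8, 8, 9, 8],
})

# pivot_table groups by (snapDate, instance, waitEvent) and aggregates with
# aggfunc="mean" by default; each group here holds one row, so the "mean"
# is simply that row's value.
wide = dfWaits.pivot_table(index=["snapDate", "instance"],
                           columns="waitEvent",
                           values="AvgWaitInMs")

# Because the three-column key is unique, plain pivot also works when given
# both index columns (assumes pandas >= 1.1 for list arguments).
via_pivot = dfWaits.pivot(index=["snapDate", "instance"],
                          columns="waitEvent",
                          values="AvgWaitInMs")

# Move snapDate/instance back into ordinary columns and drop the leftover
# 'waitEvent' label on the column axis to get a flat DataFrame.
flat = wide.reset_index().rename_axis(columns=None)
print(flat)
```

If your real data can contain true duplicates and averaging them is wrong, pivot_table accepts a different aggfunc, for example "max" or "first".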

Data:

I used the following text file as input (loaded into a DataFrame with pandas read_csv and sep=';'):

snapDate;instance;waitEvent;AvgWaitInMs
0;2015-Jul-03;XX;gc cr block 3-way;1
1;2015-Jun-29;YY;gc current block 3-way;2
2;2015-Jul-03;YY;gc current block 3-way;1
3;2015-Jun-29;XX;gc current block 3-way;2
4;2015-Jul-01;XX;gc current block 3-way;2
5;2015-Jul-01;YY;gc current block 3-way;2
6;2015-Jul-03;XX;gc current block 3-way;2
7;2015-Jul-03;YY;log file sync;9
8;2015-Jun-29;XX;log file sync;8
9;2015-Jul-03;XX;log file sync;8
10;2015-Jul-01;XX;log file sync;8
11;2015-Jul-01;YY;log file sync;9
12;2015-Jun-29;YY;log file sync;8
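A sketch of loading that text with read_csv, holding it in a string via io.StringIO for the demo (with a real file you would pass its path instead). Note that the header has four names while each data row has five fields; in that case read_csv uses the extra leading field (0, 1, 2, ...) as the row index, which is why the numbering column does not show up as data.

```python
import io

import pandas as pd

# The semicolon-separated text from the answer, kept in a string for the demo
raw = """snapDate;instance;waitEvent;AvgWaitInMs
0;2015-Jul-03;XX;gc cr block 3-way;1
1;2015-Jun-29;YY;gc current block 3-way;2
2;2015-Jul-03;YY;gc current block 3-way;1
3;2015-Jun-29;XX;gc current block 3-way;2
4;2015-Jul-01;XX;gc current block 3-way;2
5;2015-Jul-01;YY;gc current block 3-way;2
6;2015-Jul-03;XX;gc current block 3-way;2
7;2015-Jul-03;YY;log file sync;9
8;2015-Jun-29;XX;log file sync;8
9;2015-Jul-03;XX;log file sync;8
10;2015-Jul-01;XX;log file sync;8
11;2015-Jul-01;YY;log file sync;9
12;2015-Jun-29;YY;log file sync;8
"""

# 4 header names vs 5 fields per row: the first field becomes the index
df = pd.read_csv(io.StringIO(raw), sep=";")
print(df.head())

# Same reshape as in the answer
wide = df.pivot_table(index=["snapDate", "instance"],
                      columns="waitEvent", values="AvgWaitInMs")
print(wide)
```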
https://en.xdnf.cn/q/118364.html
