Long to wide data. Pandas

2024/9/26 1:26:54

I'm trying to take my dataframe from a long format in which I have a column with a categorical variable, into a wide format in which each category has it's own price column. Currently, my data looks like this:

date-time            date       vendor    payment_type   price
03-10-15 10:00:00    03-10-15     A1            1          50
03-10-15 10:00:00    03-10-15     A1            2          60
03-10-15 10:00:00    03-11-15     A1            1          45
03-10-15 10:00:00    03-11-15     A1            2          70
03-10-15 10:00:00    03-12-15     B1            1          40
03-10-15 10:00:00    03-12-15     B1            2          45
03-10-15 10:00:00    03-10-15     C1            1          60
03-10-15 10:00:00    03-10-15     C1            1          65

My goal is to have a column for every vendor's price and for each payment type and one row per day. When there are multiple values per day, I want to use the maximum value. The end result should look something like this.

Date       A1_Pay1   A2_Pay2 ... C1_Pay1   C1_Pay2
03-10-15     50        60    ...   65        NaN
03-11-15     45        70    ...   NaN       NaN
03-12-15     NaN       NaN   ...   NaN       NaN

I tried using unstack and pivot, but I either wasn't getting what I was going for, or was getting an error about Date not being a unique index.

Any ideas?

Answer

You can use pivot_table:

#convert column payment_type to string
df['payment_type'] = df['payment_type'].astype(str)df = pd.pivot_table(df, index='date', columns=['vendor', 'payment_type'], aggfunc=max)#remove top level of multiindex
df.columns = df.columns.droplevel(0)#reset multicolumns
df.columns = ['_Pay'.join(col).strip() for col in df.columns.values]print dfA1_Pay1  A1_Pay2  B1_Pay1  B1_Pay2  C1_Pay1
date                                                   
2015-03-10       50       60      NaN      NaN       65
2015-03-11       45       70      NaN      NaN      NaN
2015-03-12      NaN      NaN       40       45      NaN

EDIT:

If you need other statistics, you can add them as list to aggfunc:

#convert column payment_type to string
df['payment_type'] = df['payment_type'].astype(str)
df = pd.pivot_table(df, index='date', columns=['vendor', 'payment_type'], aggfunc=[np.mean, np.max, np.median])
print dfmean                    amax                 median              \price                   price                  price               
vendor          A1      B1        C1    A1      B1      C1     A1      B1       
payment_type     1   2   1   2     1     1   2   1   2   1      1   2   1   2   
date                                                                            
2015-03-10      50  60 NaN NaN  62.5    50  60 NaN NaN  65     50  60 NaN NaN   
2015-03-11      45  70 NaN NaN   NaN    45  70 NaN NaN NaN     45  70 NaN NaN   
2015-03-12     NaN NaN  40  45   NaN   NaN NaN  40  45 NaN    NaN NaN  40  45   vendor          C1  
payment_type     1  
date                
2015-03-10    62.5  
2015-03-11     NaN  
2015-03-12     NaN 
#remove top level of multiindex
df.columns = df.columns.droplevel(1)
#reset multicolumns
df.columns = ['_Pay'.join(col).strip() for col in df.columns.values]
print dfmean_PayA1_Pay1  mean_PayA1_Pay2  mean_PayB1_Pay1  \
date                                                            
2015-03-10               50               60              NaN   
2015-03-11               45               70              NaN   
2015-03-12              NaN              NaN               40   mean_PayB1_Pay2  mean_PayC1_Pay1  amax_PayA1_Pay1  \
date                                                            
2015-03-10              NaN             62.5               50   
2015-03-11              NaN              NaN               45   
2015-03-12               45              NaN              NaN   amax_PayA1_Pay2  amax_PayB1_Pay1  amax_PayB1_Pay2  \
date                                                            
2015-03-10               60              NaN              NaN   
2015-03-11               70              NaN              NaN   
2015-03-12              NaN               40               45   amax_PayC1_Pay1  median_PayA1_Pay1  median_PayA1_Pay2  \
date                                                                
2015-03-10               65                 50                 60   
2015-03-11              NaN                 45                 70   
2015-03-12              NaN                NaN                NaN   median_PayB1_Pay1  median_PayB1_Pay2  median_PayC1_Pay1  
date                                                                 
2015-03-10                NaN                NaN               62.5  
2015-03-11                NaN                NaN                NaN  
2015-03-12                 40                 45                NaN 
https://en.xdnf.cn/q/71515.html

Related Q&A

How to wrap text in Django admin(set column width)

I have a model Itemclass Item(models.Model):id = models.IntegerField(primary_key=True)title = models.CharField(max_length=140, blank=True)description = models.TextField(blank=True)price = models.Decima…

Problems compiling mod_wsgi in virtualenv

Im trying to compile mod_wsgi (version 3.3), Python 2.6, on a CentOS server - but under virtualenv, with no success. Im getting the error:/usr/bin/ld:/home/python26/lib/libpython2.6.a(node.o):relocatio…

Python - Multiprocessing Error cannot start a process twice

I try to develop an algorithm using multiprocessing package in Python, i learn some tutorial from internet and try to develop an algorithm with this package. After looking around and try my hello world…

Printing unicode number of chars in a string (Python)

This should be simple, but I cant crack it. I have a string of Arabic symbols between u\u0600 - u\u06FF and u\uFB50 - u\uFEFF. For example غينيا واستمر العصبة ضرب قد. How do I pri…

Pandas report top-n in group and pivot

I am trying to summarise a dataframe by grouping along a single dimension d1 and reporting summary statistics for each element of d1. In particular I am interested in the top n (index and values) for …

virtualenv --no-site-packages is not working for me

virtualenv --no-site-packages v1cd v1\Scriptsactivate.batpython -c "import django" # - no problem hereWhy does it see the Django package??? It should give me an import error, right?

pandas: Group by splitting string value in all rows (a column) and aggregation function

If i have dataset like this:id person_name salary 0 [alexander, william, smith] 45000 1 [smith, robert, gates] 65000 2 [bob, alexander] …

Seaborn Title Position

The position of my graph title is terrible on this jointplot. Ive tried moving the loc = left, right, and center but it doesnt move from the position its in. Ive also tried something like ax.title.set_…

Expand/collapse ttk Treeview branch

I would like to know the command for collapsing and expanding a branch in ttk.Treeview.Here is a minimalistic example code:#! coding=utf-8 import tkinter as tk from tkinter import ttkroot = tk.Tk() tre…

Uploading images to s3 with meta = image/jpeg - python/boto3

How do I go about setting ContentType on images that I upload to AWS S3 using boto3 to content-type:image/jpeg?Currently, I upload images to S3 using buto3/python 2.7 using the following command:s3.up…