Transform a 3-column dataframe into a matrix

2024/10/6 22:27:20

I have a dataframe df, for example:

import pandas

A = [["John", "Sunday", 6], ["John", "Monday", 3], ["John", "Tuesday", 2], ["Mary", "Sunday", 6], ["Mary", "Monday", 4], ["Mary", "Tuesday", 7]]
df = pandas.DataFrame(A, columns=["names", "dates", "times"])

I want to reshape it so that, instead of three columns, I get a matrix where the first column indexes the rows, the second column indexes the columns, and the third column supplies the matrix values, something like:

B = [["John", 6, 3, 2], ["Mary", 6, 4, 7]]
df2 = pandas.DataFrame(B, columns=["names", "Sunday", "Monday", "Tuesday"])

or even better:

B = numpy.asarray(B)
B = pandas.DataFrame(B)

How do I transform A into B?

I have created a double for loop, but in my case df is very large and it takes a very long time. Is there a better way to do it?

This is not just a reshape, since A has 18 values and B has only 8.

Answer

You can use pivot_table(), e.g.:

In []:
df.pivot_table(columns='dates', index='names', values='times').reset_index()
Out[]:
dates names  Monday  Sunday  Tuesday
0      John       3       6        2
1      Mary       4       6        7
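
Note that pivot_table() aggregates by default (aggfunc='mean'), so repeated (names, dates) pairs would be averaged without warning. If every pair is known to be unique, as in the question's data, a variation using pivot() avoids the aggregation. The sketch below is not part of the original answer: it restores the original day order, since the pivoted columns come back sorted alphabetically, and extracts a plain NumPy matrix. The names wide, day_order and matrix are illustrative.

```python
import pandas as pd

A = [["John", "Sunday", 6], ["John", "Monday", 3], ["John", "Tuesday", 2],
     ["Mary", "Sunday", 6], ["Mary", "Monday", 4], ["Mary", "Tuesday", 7]]
df = pd.DataFrame(A, columns=["names", "dates", "times"])

# pivot() reshapes without aggregating; it raises ValueError if any
# (names, dates) pair occurs more than once.
wide = df.pivot(index="names", columns="dates", values="times")

# The pivoted columns are sorted alphabetically; reorder them to match
# the order in which the days first appear in the input.
day_order = df["dates"].unique().tolist()
wide = wide[day_order]

# Plain NumPy matrix of the values; rows follow wide.index.
matrix = wide.to_numpy()
```

Here wide.index holds the row labels (the names) and wide.columns the column labels (the days), so the labels stay available alongside the bare matrix.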