Pivoting a One-Hot-Encoded DataFrame

2024/11/18 10:42:30

I have a pandas dataframe that looks like this:

genres.head()
   Drama   Comedy  Action  Crime   Romance Thriller    Adventure   Horror  Mystery Fantasy ... History Music   War Documentary Sport   Musical Western Film-Noir   News    number_of_genres
tconst                                                                                  
tt0111161   1   0   0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   1
tt0468569   1   0   1   1   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   3
tt1375666   0   0   1   0   0   0   1   0   0   0   ... 0   0   0   0   0   0   0   0   0   3
tt0137523   1   0   0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   1
tt0110912   1   0   0   1   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   2

I want a table where the rows are the genres, the columns are the number of genre labels a movie has, and the values are the counts of movies. In other words, I want this:

number_of_genres     1     2      3  totals
Drama              451  1481   3574    5506
Comedy             333  1108   2248    3689
Action               9   230   1971    2210
Crime                1   284   1687    1972
Romance              1   646   1156    1803
Thriller            22   449   1153    1624
Adventure            1    98   1454    1553
Horror             137   324    765    1226
Mystery              0   108    792     900
Fantasy              1    74    642     717
Sci-Fi               0   129    551     680
Biography            0    95    532     627
Family               0    60    452     512
Animation            0     6    431     437
History              0    32    314     346
Music                1    87    223     311
War                  0    90    162     252
Documentary         70    82     78     230
Sport                0    78    142     220
Musical              0    13    131     144
Western             19    44     57     120
Film-Noir            0    11     50      61
News                 0     1      2       3
Total             1046  5530  18567   25143

What is the most Pythonic way to get that table? I solved the problem with the following code, but was wondering if there's a better way:

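import numpy as np
import pandas as pd

# Count how many genre flags are set for each movie.
genres['number_of_genres'] = genres.sum(axis=1)

# Build one single-row pivot per genre column, then stack them.
pivots = []
for column in genres.columns[0:-1]:
    column = pd.DataFrame(genres[column])
    columns = column.join(genres.number_of_genres)
    pivot = pd.pivot_table(columns, values=columns.columns[0], columns='number_of_genres', aggfunc=np.sum)
    pivots.append(pivot)
pivots_df = pd.concat(pivots)

# Add the row totals, then the column totals.
pivots_df['totals'] = pivots_df.sum(axis=1)
pivots_df.loc['Total'] = pivots_df.sum()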

[EDIT]: Added Jupyter output that should be compatible with pd.read_clipboard(). If the output can be formatted better, please let me know how.

Answer

Maybe I'm missing something, but doesn't this work for you? (Here df is your genres DataFrame, with the number_of_genres column already added.)

agg = df.groupby('number_of_genres').agg('sum').T
agg['totals'] = agg.sum(axis=1)
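
To make the idea concrete, here is a minimal sketch on a made-up five-movie frame. The IDs m1 to m5 and the flags are toy data, not the IMDb data from the question, and the sketch assumes the genre columns hold 0/1 integers. It also appends the Total row the question asks for, the same way the question's own code does.

```python
import pandas as pd

# Toy one-hot frame (hypothetical movies, not the real dataset).
genres = pd.DataFrame(
    {
        "Drama":     [1, 1, 0, 1, 1],
        "Action":    [0, 1, 1, 0, 0],
        "Crime":     [0, 1, 0, 0, 1],
        "Adventure": [0, 0, 1, 0, 0],
    },
    index=pd.Index(["m1", "m2", "m3", "m4", "m5"], name="tconst"),
)

# Number of genre labels per movie.
genres["number_of_genres"] = genres.sum(axis=1)

# Summing the 0/1 flags within each label-count group counts the movies;
# transposing puts genres on the rows and label counts on the columns.
agg = genres.groupby("number_of_genres").sum().T

# Row totals first, then a final row with the column totals.
agg["totals"] = agg.sum(axis=1)
agg.loc["Total"] = agg.sum()

print(agg)
```

Because each cell is a sum of 0/1 flags, the groupby replaces the per-column loop entirely.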

Edit: Solution via pivot_table

agg = df.pivot_table(columns='number_of_genres', aggfunc='sum')
agg['totals'] = agg.sum(axis=1)
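
As a rough check that the two approaches agree, the sketch below runs both on a tiny made-up frame (toy data with hypothetical IDs, 0/1 integer flags assumed). One difference to watch for: pivot_table may return the genre rows in sorted order rather than in the original column order, so the sketch reindexes the result before comparing.

```python
import pandas as pd

# Toy one-hot frame (hypothetical movies, not the real dataset).
genres = pd.DataFrame(
    {"Drama": [1, 1, 0], "Action": [0, 1, 1], "Crime": [0, 1, 0]},
    index=["m1", "m2", "m3"],
)
genres["number_of_genres"] = genres.sum(axis=1)

by_groupby = genres.groupby("number_of_genres").sum().T
by_pivot = genres.pivot_table(columns="number_of_genres", aggfunc="sum")

# Restore the original genre order so both tables line up row by row.
genre_order = genres.columns.drop("number_of_genres")
by_pivot = by_pivot.reindex(genre_order)

print(by_groupby)
print(by_pivot)
```

If the row order matters for presentation, the groupby version keeps the column order of the input, which is one reason to prefer it here.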
