I have a pandas dataframe that looks like this:
genres.head()
Drama Comedy Action Crime Romance Thriller Adventure Horror Mystery Fantasy ... History Music War Documentary Sport Musical Western Film-Noir News number_of_genres tconst tt0111161 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 tt0468569 1 0 1 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 3 tt1375666 0 0 1 0 0 0 1 0 0 0 ... 0 0 0 0 0 0 0 0 0 3 tt0137523 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 tt0110912 1 0 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 2
I want to be able to get a table where the rows are the genres, the columns are the number of labels for a given movie and the values are the counts. In other words, I want this:
number_of_genres 1 2 3 totals Drama 451 1481 3574 5506 Comedy 333 1108 2248 3689 Action 9 230 1971 2210 Crime 1 284 1687 1972 Romance 1 646 1156 1803 Thriller 22 449 1153 1624 Adventure 1 98 1454 1553 Horror 137 324 765 1226 Mystery 0 108 792 900 Fantasy 1 74 642 717 Sci-Fi 0 129 551 680 Biography 0 95 532 627 Family 0 60 452 512 Animation 0 6 431 437 History 0 32 314 346 Music 1 87 223 311 War 0 90 162 252 Documentary 70 82 78 230 Sport 0 78 142 220 Musical 0 13 131 144 Western 19 44 57 120 Film-Noir 0 11 50 61 News 0 1 2 3 Total 1046 5530 18567 25143
What is the best way of getting that table pythonistically? I solved the problem through the following code but was wondering if there's a better way:
genres['number_of_genres'] = genres.sum(axis=1)
pivots = []
for column in genres.columns[0:-1]:column = pd.DataFrame(genres[column])columns = column.join(genres.number_of_genres)pivot = pd.pivot_table(columns, values=columns.columns[0], columns='number_of_genres', aggfunc=np.sum)pivots.append(pivot)pivots_df = pd.concat(pivots)
pivots_df['totals'] = pivots_df.sum(axis=1)
pivots_df.loc['Total'] = pivots_df.sum()
[EDIT]: Added jupyter output that should be compatible with pd.read_clipboard(). If I can format the output better, please let me know how I can do so.