Data manipulation, kind of downsampling

2024/11/19 17:47:37

I have a large csv file, example of the data below. I will use an example of eight teams to illustrate.

home_team    away_team      home_score       away_score         year
belgium      france         2                2                  1990
brazil       uruguay        3                1                  1990
italy        belgium        1                2                  1990
sweden       mexico         3                1                  1990france       chile          3                1                  1991
brazil       england        2                1                  1991
italy        belgium        1                2                  1991
chile        switzerland    2                2                  1991

My data runs for many years. I would like to have total number of scores of each team every year, see example below,

team            total_scores          year
belgium         4                     1990
france          2                     1990
brazil          3                     1990
uruguay         1                     1990
italy           1                     1990
sweden          3                     1990
mexico          1                     1990france          3                     1991
chile           5                     1991
brazil          2                     1991
england         1                     1991
italy           1                     1991
belgium         2                     1991
switzerland     2                     1991

Thoughts?

Answer

Here is a solution using the tidyverse (dplyr and tidyr), in particular the pivot functions from tidyr...

library(tidyverse)df %>% pivot_longer(cols = -year,   #splits non-year columns into home/away and type columnsnames_to = c("homeaway", "type"), names_sep = "_", values_to = "value", values_ptypes = list(value = character())) %>% select(-homeaway) %>%             #remove home/awaypivot_wider(names_from = "type",  #restore team and score columns (as list columns)values_from = "value") %>% unnest(cols = c(team, score)) %>% #unnest the list columns to year, team, scoregroup_by(year, team) %>% summarise(total_goals = sum(as.numeric(score)))# A tibble: 14 x 3
# Groups:   year [2]year team        total_goals<int> <chr>             <dbl>1  1990 belgium               42  1990 brazil                33  1990 france                24  1990 italy                 15  1990 mexico                16  1990 sweden                37  1990 uruguay               18  1991 belgium               29  1991 brazil                2
10  1991 chile                 3
11  1991 england               1
12  1991 france                3
13  1991 italy                 1
14  1991 switzerland           2
https://en.xdnf.cn/q/118518.html

Related Q&A

Chrome Native Messaging throwing error when sending a base64 string to client

Using Chrome Native Messaging sample app as a template am able make a system call to bashos.system("<bash command>")The requirement is to return a base64 string from the python scriptos…

Exporting DataFrame to Excel using pandas without subscribe

How can I export DataFrame to excel without subscribe? For exemple: Im doing webscraping and there is a table with pagination, so I take the page 1 save it in DataFrame, export to excel e do it again …

Fraction of a real number in python giving complicated answer

Importing Fraction from fractions to give a fractional representation of a real number, but giving responses quite complicated which seems very simple by the paper-pen method. Fractions(.2) giving answ…

Scrape latitude and longitude (Google Maps) inside Script type=text/javascript

Im beginner in Web Scrapping. Im trying to get latitude and longitude from this web: https://urbania.pe/inmueble/proyecto/ememhvin-proyecto-mariscal-castilla-lima-santiago-de-surco-tale-inmobiliaria-65…

How to delete a button that is made by a loop

from tkinter import *class Main:def __init__(self, root):for i in range(0, 9):for k in range(0, 9):Button(root, text=" ").grid(row=i, column=k)root.mainloop()root = Tk()x = Main(root)How do I…

Invalid array shape with neural network using Keras?

Currently studying the Deep Learning with Python book by Francios Chollet. I am very new to this and I am getting this error code despite following his code verbatim. Can anyone interpret the error mes…

How to download PDF files from a list of URLs in Python?

I have a big list of links to PDF files that I need to download (500+) and I was trying to make a program to download them all because I dont want to manually do them. This is what I have and when I tr…

Training on GPU much slower than on CPU - why and how to speed it up?

I am training a Convolutional Neural Network using Google Colabs CPU and GPU. This is the architecture of the network: Model: "sequential" ____________________________________________________…

Check list item is present in Dictionary

Im trying to extend Python - Iterate thru month dates and print a custom output and add an addtional functionality to check if a date in the given date range is national holiday, print "NH" a…

a list of identical elements in the merge list

I need to merge the list and have a function that can be implemented, but when the number of merges is very slow and unbearable, I wonder if there is a more efficient way Consolidation conditions:Sub-…