script to get the max from column based on other column values

2024/10/10 21:23:35

I need a script to read in a csv file(orig.csv) and output a reformatted csv file(format.csv)

The orig csv file will look like this:

Time,Label,frame,slot,SSN,Board,BT,SRN,LabelFrame,SRNAME,LabelID,Integrity,MAX_val
2014-03-17,lableA,1,8,0,,SPUB,1,NNN,NNN,1,100%,60
2014-03-17,lableA,2,22,0,,GOUC,2,NNN02,NNN02,1,100%,
2014-03-17,lableB,2,8,0,,CCCB,2,NNN02,NNN02,1,100%,59
2014-03-17,lableB,1,2,4,,CCCB,1,NNN,NNN,1,100%,48
2014-03-17,lableB,1,0,6,,CCCB,1,NNN,NNN,1,100%,59
2014-03-17,lableC,2,6,0,,SCUA,2,NNN02,NNN02,1,100%,55
2014-03-17,lableD,2,4,1,,CCCB,2,NNN02,NNN02,1,100%,59
2014-03-17,lableD,0,2,7,,CCCB,0,MPS,MPS,1,100%,46
2014-03-17,lableD,1,4,3,,CCCA,1,NNN,NNN,1,100%,43
2014-03-17,lableE,2,2,7,,CCCB,2,NNN02,NNN02,1,100%,58

The reformatting will go through the orig csv file and get all the unique names from column2(Label) and the corresponding max of the values from column 13(MAX_val), see the example below.(E.G. lableA-E is of interest and for lableB the max of [59,48,59] is of interest) I also want to it to cater for a dynamic orig.csv file where possible.

the reported csv file will look like this:

Time,Label,frame,slot,SSN,Board,BT,SRN,LabelFrame,SRNAME,LabelID,Integrity,MAX_val
2014-03-17,lableA,1,8,0,,CCCB,1,NNN,NNN,1,100%,60
2014-03-17,lableB,2,8,0,,CCCB,2,NNN02,NNN02,1,100%,59
2014-03-17,lableC,2,6,0,,SCUA,2,NNN02,NNN02,1,100%,55
2014-03-17,lableD,2,4,1,,CCCB,2,NNN02,NNN02,1,100%,59
2014-03-17,lableE,2,2,7,,CCCB,2,NNN02,NNN02,1,100%,58

Note : i am new to scripting so not sure what is the best language to write this in but was thinking along the lines of bash, shell, perl but open to others.

EDIT:: this is how I would look to pull in my csv data

<!DOCTYPE html>
<html lang="en"><head><meta charset="utf-8"><title>D3: Loading data from a CSV file</title><script type="text/javascript" src="d3/d3.v3.js"></script></head><body><script type="text/javascript">d3.csv("XPU max load_format1(XPU load).csv", function(data) {console.log(data);});</script></body>
</html>
Answer

This is a Perl solution to your problem. It keeps a hash %data of the data for each label that has the highest value in for MAX_val. It also keeps a list of labels in @labels that keeps track of new labels as they are encountered, so as to keep the output in the same order as the input.

As I said in my comment, there is a line in your data that has an empty column 13. I have added code to treat this as zero, which is unnecessary if that is an error in your post.

use strict;
use warnings;open my $orig_fh,   '<', 'orig.csv'   or die $!;
open my $format_fh, '>', 'format.csv' or die $!;print $format_fh scalar <$orig_fh>; # Copy header linemy %data;
my @labels;while (<$orig_fh>) {chomp;my @fields = split /,/, $_, -1;my ($label, $max_val) = @fields[1,12];if ( exists $data{$label} ) {my $prev_max_val = $data{$label}[12] || 0;$data{$label} = \@fields if $max_val and $max_val > $prev_max_val;}else {$data{$label} = \@fields;push @labels, $label;}
}for my $label (@labels) {print $format_fh join(',', @{ $data{$label} }), "\n";
}

output

Time,Label,frame,slot,SSN,Board,BT,SRN,LabelFrame,SRNAME,LabelID,Integrity,MAX_val
2014-03-17,lableA,1,8,0,,SPUB,1,NNN,NNN,1,100%,60
2014-03-17,lableB,2,8,0,,CCCB,2,NNN02,NNN02,1,100%,59
2014-03-17,lableC,2,6,0,,SCUA,2,NNN02,NNN02,1,100%,55
2014-03-17,lableD,2,4,1,,CCCB,2,NNN02,NNN02,1,100%,59
2014-03-17,lableE,2,2,7,,CCCB,2,NNN02,NNN02,1,100%,58
https://en.xdnf.cn/q/118407.html

Related Q&A

Selenium code is not able to scrape ofashion.com.cn

I was building a web scraper by using python selenium. The script scraped sites like amazon, stack overflow and flipcart but wasnt able to scrape ofashion. It is always returning me a blank .csv file.H…

How can I access each estimater in scikit-learn pipelines? [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.This question was caused by a typo or a problem that can no longer be reproduced. While similar q…

How do I structure a repo with Cloud Run needing higher level code?

I have added code to a repo to build a Cloud Run service. The structure is like this:I want to run b.py in cr. Is there any way I can deploy cr without just copying b.py into the cr directory? (I dont…

Unable to build kivy image loaded .py file into exe using auto-py-to-exe

I have a simple kivy file in which i want to cover the entire canvas with an image bgi.jpg MainWidget: <MainWidget>:canvas.before:Rectangle:size:self.sizesource:bgi.jpgand the .py file code i…

Pandas Panel is deprecated,

This code snippet is from one of my script which works fine in current panda version (0.23) but Panel is deprecated and will be removed in a future version.panel = pd.Panel(dict(df1=dataframe1,df2=data…

Python - Why is this data being written to file incorrectly?

Only the first result is being written to a csv, with one letter of the url per row. This is instead of all urls being written, one per row.What am I not doing right in the last section of this code t…

How does Python interpreter look for types? [duplicate]

This question already has answers here:How does Python interpreter work in dynamic typing?(3 answers)Closed 10 months ago.If I write something like:>>> a = float()how does Python interpreter …

title() method in python writing functions when word like arent

using functiondef make_cap(sentence):return sentence.title()tryining outmake_cap("hello world") Hello World# it workd but when I have world like "arent" and isnt". how to write…

Creating a C++ Qt Gui for a Python logic

I was presented with a Python logic for which I need to create a GUI. I want to use Qt for that purpose and ideally I would like to program it in C++, without using the Qt Creator.What are recommended …

Pythons BaseHTTPServer returns junky responses

I use Pythons BaseHTTPServer and implement the following very simple BaseHTTPRequestHandler:class WorkerHandler(BaseHTTPRequestHandler):def do_GET(self):self.wfile.write({"status" : "rea…