Python - Transpose columns to rows within data operation and before writing to file

2024/10/10 16:22:16

I have developed a public, open-source App for Splunk (Nmon performance monitor for Unix and Linux Systems, see https://apps.splunk.com/app/1753/).

A central piece of the App is an old Perl script (recycled, modified and updated) automatically launched by the App to convert the Nmon data (which is a kind of custom csv): it reads from stdin and writes out properly formatted csv files, one per section (a section is a performance monitor).

I now want to fully rewrite this script in Python, and a first beta version is almost done... BUT I am facing difficulties transposing the data, and I'm afraid I won't be able to solve this myself.

This is why I am kindly asking for help today.

Here is the difficulty in detail:

Nmon generates performance monitors for various sections (cpu, memory, disks...). For many of them there is no big difficulty beyond extracting the right timestamp and so on. But every section that has a notion of "device" (such as DISKBUSY in the provided example, which represents the percentage of time disks were busy) has to be transformed and transposed to be exploitable later.
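To illustrate the relationship between these records, here is a hedged sketch (the two-line excerpt and device names are hypothetical, not taken from a real nmon file) of how a ZZZZ timestamp line can be matched to the DISKBUSY snapshot that carries the same Txxxx id:

```python
import re

# Hypothetical minimal excerpt: ZZZZ lines carry the snapshot timestamp,
# DISKBUSY lines carry per-device values keyed by the same Txxxx id.
lines = [
    "DISKBUSY,Disk %Busy,sda,sda1",
    "ZZZZ,T0001,11:10:44,26-JUL-2014",
    "DISKBUSY,T0001,4.4,0.0",
]

timestamps = {}  # Txxxx id -> "date time"
rows = []

for line in lines:
    zzzz = re.match(r'^ZZZZ,(T\d+),(.+),(.+)', line)
    if zzzz:
        snapshot, time_, date = zzzz.groups()
        timestamps[snapshot] = date + ' ' + time_
    perf = re.match(r'^DISKBUSY,(T\d+),(.+)', line)
    if perf:
        snapshot, values = perf.groups()
        # Emit the data line with the resolved timestamp prepended
        rows.append(timestamps[snapshot] + ',' + values)

print(rows)  # ['26-JUL-2014 11:10:44,4.4,0.0']
```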

Currently, I am able to generate the data as follows:

Example:

time,sda,sda1,sda2,sda3,sda5,sda6,sda7,sdb,sdb1,sdc,sdc1,sdc2,sdc3
26-JUL-2014 11:10:44,4.4,0.0,0.0,0.0,0.4,1.9,2.5,0.0,0.0,10.2,10.2,0.0,0.0
26-JUL-2014 11:10:54,4.8,0.0,0.0,0.0,0.3,2.0,2.6,0.0,0.0,5.4,5.4,0.0,0.0
26-JUL-2014 11:11:04,4.8,0.0,0.0,0.0,0.4,2.3,2.1,0.0,0.0,17.8,17.8,0.0,0.0
26-JUL-2014 11:11:14,2.1,0.0,0.0,0.0,0.2,0.5,1.5,0.0,0.0,28.2,28.2,0.0,0.0

The goal is to transpose the data so that the header becomes "time,device,value", for example:

time,device,value
26-JUL-2014 11:10:44,sda,4.4
26-JUL-2014 11:10:44,sda1,0.0
26-JUL-2014 11:10:44,sda2,0.0

And so on.
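A minimal sketch of this wide-to-long transposition (shown here in Python 3 syntax for brevity; under Python 2, as the script requires, `io.StringIO` would be `StringIO.StringIO` and `print` is a statement):

```python
import csv
import io

# Sample wide data: a shortened version of the example above
wide = """time,sda,sda1,sda2
26-JUL-2014 11:10:44,4.4,0.0,0.0
26-JUL-2014 11:10:54,4.8,0.0,0.0
"""

rows = []
for record in csv.DictReader(io.StringIO(wide)):
    time_ = record.pop('time')               # keep the timestamp aside
    for device, value in sorted(record.items()):
        rows.append((time_, device, value))  # one row per (time, device) pair

print(rows[0])  # ('26-JUL-2014 11:10:44', 'sda', '4.4')
```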

One month ago, I opened a question for almost the same need (for another app and not exactly the same data, but the same need to transpose columns to rows):

Python - CSV time oriented Transposing large number of columns to rows

I received a very good answer that perfectly did the trick, but I am unable to recycle that piece of code into this new context. One difference is that I want to include the data transposition within the code itself, so that the script works only in memory and avoids dealing with multiple temporary files.

Here is the piece of code:

Note: requires Python 2.x

###################
# Dynamic Sections : data requires to be transposed to be exploitable within Splunk
###################

dynamic_section = ["DISKBUSY"]

for section in dynamic_section:

    # Set output file
    currsection_output = DATA_DIR + HOSTNAME + '_' + day + '_' + month + '_' + year + '_' + hour + minute + second + '_' + section + '.csv'

    # Open output for writing
    with open(currsection_output, "w") as currsection:

        for line in data:

            # Extract sections, and write to output
            myregex = r'^' + section + '[0-9]*' + '|ZZZZ.+'
            find_section = re.match(myregex, line)

            if find_section:

                # csv header

                # Replace some symbols
                line = re.sub("%", '_PCT', line)
                line = re.sub(" ", '_', line)

                # Extract header excluding data that always has Txxxx for timestamp reference
                myregex = '(' + section + ')\,([^T].+)'
                fullheader_match = re.search(myregex, line)

                if fullheader_match:
                    fullheader = fullheader_match.group(2)

                    header_match = re.match(r'([a-zA-Z\-\/\_0-9]+,)([a-zA-Z\-\/\_0-9\,]*)', fullheader)

                    if header_match:
                        header = header_match.group(2)

                        # Write header
                        currsection.write('time' + ',' + header + '\n')

                # Extract timestamp

                # Nmon V9 and prior do not have date in ZZZZ
                # If unavailable, we'll use the global date (AAA,date)
                ZZZZ_DATE = '-1'
                ZZZZ_TIME = '-1'

                # For Nmon V10 and more
                timestamp_match = re.match(r'^ZZZZ\,(.+)\,(.+)\,(.+)\n', line)

                if timestamp_match:
                    ZZZZ_TIME = timestamp_match.group(2)
                    ZZZZ_DATE = timestamp_match.group(3)
                    ZZZZ_timestamp = ZZZZ_DATE + ' ' + ZZZZ_TIME

                # For Nmon V9 and less
                if ZZZZ_DATE == '-1':
                    ZZZZ_DATE = DATE
                    timestamp_match = re.match(r'^ZZZZ\,(.+)\,(.+)\n', line)

                    if timestamp_match:
                        ZZZZ_TIME = timestamp_match.group(2)
                        ZZZZ_timestamp = ZZZZ_DATE + ' ' + ZZZZ_TIME

                # Extract Data
                myregex = r'^' + section + '\,(T\d+)\,(.+)\n'
                perfdata_match = re.match(myregex, line)

                if perfdata_match:
                    perfdata = perfdata_match.group(2)

                    # Write perf data
                    currsection.write(ZZZZ_timestamp + ',' + perfdata + '\n')

        # End for

    # Open output for reading and show number of lines we extracted
    with open(currsection_output, "r") as currsection:
        num_lines = sum(1 for line in currsection)
        print (section + " section: Wrote", num_lines, "lines")

# End for

The line:

                currsection.write('time' + ',' + header + '\n')

will contain the header

And the line:

            currsection.write(ZZZZ_timestamp + ',' + perfdata + '\n')

contains the data line by line

Note: the final data (header and body) should in the end also contain other information; to simplify things I removed it from the code above.

For static sections, which do not require the data transposition, the same lines are:

                    currsection.write('type' + ',' + 'serialnum' + ',' + 'hostname' + ',' + 'time' + ',' + header + '\n')

And:

                currsection.write(section + ',' + SN + ',' + HOSTNAME + ',' + ZZZZ_timestamp + ',' + perfdata + '\n')

The main goal would be to transpose the data right after it has been extracted and before writing it out.

Also, performance and minimal system resource usage (such as working with temporary files instead of memory) are a requirement, to avoid generating too high a cpu load on the systems that run the script periodically.
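One way to reconcile the two constraints (low memory footprint, no stray files left on disk) is Python's tempfile module. A hedged sketch with inline sample data, using a spooled temporary file that stays in memory until it grows past a threshold:

```python
import csv
import tempfile

# Sketch: stage the extracted wide csv in a spooled temporary file (kept in
# memory below max_size, spilled to disk above it), then rewind and stream
# it through the transpose step, so the whole dataset never has to sit in
# a Python list.
with tempfile.SpooledTemporaryFile(mode='w+', max_size=1024 * 1024) as tempf:
    tempf.write('time,sda,sda1\n')
    tempf.write('26-JUL-2014 11:10:44,4.4,0.0\n')
    tempf.seek(0)  # rewind before reading the staged csv back

    long_rows = []
    for record in csv.DictReader(tempf):
        time_ = record.pop('time')
        for device, value in sorted(record.items()):
            long_rows.append([time_, device, value])

print(long_rows)
```

The file is deleted automatically when the `with` block exits, so nothing is left behind on disk.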

Could anyone help me achieve this? I have searched again and again; I am pretty sure there are multiple ways to achieve it (zip, map, dictionary, list, split...), but I have failed so far...

Please be indulgent, this is my first real Python script :-)

Thank you very much for any help !

More details:

  • testing nmon file

A small testing nmon file can be retrieved here: http://pastebin.com/xHLRbBU0

  • Current complete script

The current complete script can be retrieved here: http://pastebin.com/QEnXj6Yh

To test the script, it is required to:

  • export the SPLUNK_HOME variable to anything relevant for you, e.g.:

    mkdir /tmp/nmon2csv

--> place the script and the nmon file here, and make the script executable

export SPLUNK_HOME=/tmp/nmon2csv
mkdir -p etc/apps/nmon

And finally:

cat test.nmon | ./nmon2csv.py

Data will be generated in /tmp/nmon2csv/etc/apps/nmon/var/*

Update: working code using the csv module:

###################
# Dynamic Sections : data requires to be transposed to be exploitable within Splunk
###################

dynamic_section = ["DISKBUSY", "DISKBSIZE", "DISKREAD", "DISKWRITE", "DISKXFER", "DISKRIO", "DISKWRIO", "IOADAPT", "NETERROR", "NET", "NETPACKET", "JFSFILE", "JFSINODE"]

for section in dynamic_section:

    # Set output file (will be opened after transpose)
    currsection_output = DATA_DIR + HOSTNAME + '_' + day + '_' + month + '_' + year + '_' + hour + minute + second + '_' + section + '.csv'

    # Open Temp
    with TemporaryFile() as tempf:

        for line in data:

            # Extract sections, and write to output
            myregex = r'^' + section + '[0-9]*' + '|ZZZZ.+'
            find_section = re.match(myregex, line)

            if find_section:

                # csv header

                # Replace some symbols
                line = re.sub("%", '_PCT', line)
                line = re.sub(" ", '_', line)

                # Extract header excluding data that always has Txxxx for timestamp reference
                myregex = '(' + section + ')\,([^T].+)'
                fullheader_match = re.search(myregex, line)

                if fullheader_match:
                    fullheader = fullheader_match.group(2)

                    header_match = re.match(r'([a-zA-Z\-\/\_0-9]+,)([a-zA-Z\-\/\_0-9\,]*)', fullheader)

                    if header_match:
                        header = header_match.group(2)

                        # Write header
                        tempf.write('time' + ',' + header + '\n')

                # Extract timestamp

                # Nmon V9 and prior do not have date in ZZZZ
                # If unavailable, we'll use the global date (AAA,date)
                ZZZZ_DATE = '-1'
                ZZZZ_TIME = '-1'

                # For Nmon V10 and more
                timestamp_match = re.match(r'^ZZZZ\,(.+)\,(.+)\,(.+)\n', line)

                if timestamp_match:
                    ZZZZ_TIME = timestamp_match.group(2)
                    ZZZZ_DATE = timestamp_match.group(3)
                    ZZZZ_timestamp = ZZZZ_DATE + ' ' + ZZZZ_TIME

                # For Nmon V9 and less
                if ZZZZ_DATE == '-1':
                    ZZZZ_DATE = DATE
                    timestamp_match = re.match(r'^ZZZZ\,(.+)\,(.+)\n', line)

                    if timestamp_match:
                        ZZZZ_TIME = timestamp_match.group(2)
                        ZZZZ_timestamp = ZZZZ_DATE + ' ' + ZZZZ_TIME

                # Extract Data
                myregex = r'^' + section + '\,(T\d+)\,(.+)\n'
                perfdata_match = re.match(myregex, line)

                if perfdata_match:
                    perfdata = perfdata_match.group(2)

                    # Write perf data
                    tempf.write(ZZZZ_timestamp + ',' + perfdata + '\n')

        # Open final for writing
        with open(currsection_output, "w") as currsection:

            # Rewind temp
            tempf.seek(0)

            writer = csv.writer(currsection)
            writer.writerow(['type', 'serialnum', 'hostname', 'time', 'device', 'value'])

            for d in csv.DictReader(tempf):
                time = d.pop('time')
                for device, value in sorted(d.items()):
                    row = [section, SN, HOSTNAME, time, device, value]
                    writer.writerow(row)

    # End for

    # Open output for reading and show number of lines we extracted
    with open(currsection_output, "r") as currsection:
        num_lines = sum(1 for line in currsection)
        print (section + " section: Wrote", num_lines, "lines")

# End for
Answer

The goal is to transpose the data such as we will have in the header "time,device,value"

This rough transposition logic looks like this:

text = '''time,sda,sda1,sda2,sda3,sda5,sda6,sda7,sdb,sdb1,sdc,sdc1,sdc2,sdc3
26-JUL-2014 11:10:44,4.4,0.0,0.0,0.0,0.4,1.9,2.5,0.0,0.0,10.2,10.2,0.0,0.0
26-JUL-2014 11:10:54,4.8,0.0,0.0,0.0,0.3,2.0,2.6,0.0,0.0,5.4,5.4,0.0,0.0
26-JUL-2014 11:11:04,4.8,0.0,0.0,0.0,0.4,2.3,2.1,0.0,0.0,17.8,17.8,0.0,0.0
26-JUL-2014 11:11:14,2.1,0.0,0.0,0.0,0.2,0.5,1.5,0.0,0.0,28.2,28.2,0.0,0.0
'''

import csv

for d in csv.DictReader(text.splitlines()):
    time = d.pop('time')
    for device, value in sorted(d.items()):
        print time, device, value

Putting it all together into a complete script looks something like this:

import csv

with open('transposed.csv', 'wb') as destfile:
    writer = csv.writer(destfile)
    writer.writerow(['time', 'device', 'value'])
    with open('data.csv', 'rb') as sourcefile:
        for d in csv.DictReader(sourcefile):
            time = d.pop('time')
            for device, value in sorted(d.items()):
                row = [time, device, value]
                writer.writerow(row)