delete rows by date and add file name column for multiple csv

2024/7/7 6:19:49

I have multiple "," delimited csv files with recorded water pipe pressure sensor data, already sorted by date older-newer. For all original files, the first column always contains dates formated as YYYYMMDD. I have looked at similar discussion threads but couldn't find what I need.

  1. Python script to add a new column to every csv file in the directory, where each row of the new column titled as "Pipe" would have a file name, omitting file extension string.

  2. Have the option of specifying a cut off date as YYYYMMDD in order to delete rows in the orginal input file. For example, if some file has dates 20140101 to 20140630, I would like cut out rows of data if their date is < 20140401.

  3. Have the option of either to overwrite the original files after having made these modifications or save each file to a different directory, with file names same as the originals.

Input: PipeRed.csv; Headers: Date,Pressure1,Pressure2,Temperature1,Temperature2 etc,

Output: PipeRed.csv; Headers: Pipe,Date,Pressure1,Pressure2,Temperature1, Temperature2,etc,

I have found some code and modified it a little, but it doesn't delete rows like was described above and adds the file name column last rather than 1st.

import csv
import sys
import glob
import refor filename in glob.glob(sys.argv[1]):
#def process_file(filename):# Read the contents of the file into a list of lines.f = open(filename, 'r')contents = f.readlines()f.close()# Use a CSV reader to parse the contents.reader = csv.reader(contents)# Open the output and create a CSV writer for it.f = open(filename, 'wb')writer = csv.writer(f)# Process the header.writer = csv.writer(f)writer.writerow( ('Date','Pressure1','Pressure2','Pressure3','Pressure4','Pipe') )header = reader.next()header.append(filename.replace('.csv',""))writer.writerow(header)# Process each row of the body.for row in reader:row.append(filename.replace('.csv',""))writer.writerow(row)# Close the file and we're done.f.close()
Answer

This function should be very close to what you want. I've tested it in both Python 2.7.9 and 3.4.2. The initial version I posted had some problems because — as I mention then — it was untested. I'm not sure if you're using Python 2 or 3, but this worked properly in either one.

Another change from the previous version is that the optional keyword date argument's name had been changed from cutoff_date to start_date to better reflect what it is. A cutoff date usually means the last date on which it is possible to do something—the opposite of the way you used it in your question. Also note that any date provided should a string, i.e. start_date='20140401', not as an integer.

One enhancement is that it will now create the output directory if one is specified but doesn't already exist.

import csv
import os
import sysdef open_csv(filename, mode='r'):""" Open a csv file in proper mode depending on Python verion. """return (open(filename, mode=mode+'b') if sys.version_info[0] == 2 elseopen(filename, mode=mode, newline=''))def process_file(filename, start_date=None, new_dir=None):# Read the entire contents of the file into memory skipping rows before# any start_date given (assuming row[0] is a date column).with open_csv(filename, 'r') as f:reader = csv.reader(f)header = next(reader)  # Save first row.contents = [row for row in reader if start_date and row[0] >= start_dateor not start_date]# Create different output file path if new_dir was specified.basename = os.path.basename(filename)  # Remove dir name from filename.output_filename = os.path.join(new_dir, basename) if new_dir else filenameif new_dir and not os.path.isdir(new_dir):  # Create directory if necessary.os.makedirs(new_dir)# Open the output file and create a CSV writer for it.with open_csv(output_filename, 'w') as f:writer = csv.writer(f)# Add name of new column to header.header = ['Pipe'] + header  # Prepend new column name.writer.writerow(header)# Data for new column is the base filename without extension.new_column = [os.path.splitext( os.path.split(basename)[1] )[0]]# Process each row of the body by prepending data for new column to it.writer.writerows((new_column+row for row in contents))
https://en.xdnf.cn/q/120250.html

Related Q&A

X = Y = Lists vs Numbers [duplicate]

This question already has answers here:Immutable vs Mutable types(20 answers)How do I clone a list so that it doesnt change unexpectedly after assignment?(24 answers)Closed 4 years ago.In python : I h…

Python data text file grades program

Looking for help with my program. There is a text file with 5 first and last names and a number grade corresponding to each person. The task is to create a user name and change the number grade to a le…

how to fill NA with mean only for 2 or less consequective values of NA

I am new to python. please help me how I should proceed. The following dataframe contains large blocks of NaNs. # Fill the NAs with mean only for 2 or less consecutive values of NAs. # Refer to the d…

Build a new dictionary from the keys of one dictionary and the values of another dictionary

I have two dictionaries:dict_1 = ({a:1, b:2,c:3}) dict_2 = ({x:4,y:5,z:6})I want to take the keys from dict_1 and values from dict_2 and make a new dict_3dict_3 = ({a:4,b:5,c:6})

Python 2.7 - clean syntax for lvalue modification

It is very common to have struct-like types that are not expected to be modified by distant copyholders.A string is a basic example, but thats an easy case because its excusably immutable -- Python is …

Python - Global name date is not defined [closed]

Closed. This question needs debugging details. It is not currently accepting answers.Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to repro…

Python update user input with tkinter button

Im just starting with python and im having a problem. Ive tried various solutions, but i cant update the field that says 19. When i click on plus, i want it to be 20, then 21,... and when i click - it …

Python Unique DF in loop [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.Want to improve this question? Update the question so it focuses on one problem only by editing this post.Closed 5…

TypeError: unsupported operand type(s) for +=: NoneType and str

I am new to Python and Im sure that Im doing something wrong - I would like to ask a user for three numbers and print their sum, heres my current code:for i in range(0, 3):total = Nonenum = input(Pleas…

multiply list of ndarrays by list

I want to multiply a list of numbers, by a ndarray, that is, each number in the list multiply by the first ndarray. This is the list: list1 = [840,845,897]This is the list of ndarray list = df[Example]…