Given a directory containing CSV files named with the pattern Prefix-Year.csv, create a new set of CSV files named Prefix-aggregate.csv where each aggregate file is the combination of all CSV files with the same Prefix.
Explanation
I have a directory containing 5,500 CSV files named in this pattern: Prefix-Year.csv. Example:
18394-1999.csv
... //consecutive years
18394-2014.csv
18395-1999.csv //next location
I want to group and combine files with common Prefixes into files named Prefix-aggregate.csv.
Answer
The solution to your question is the find_filesets() function below, which groups the file names by prefix. I've included a CSV merge function as well, based on MaxNoe's answer.
#!/usr/bin/env python3
import glob
import os
import random
import shutil

import pandas


def rm_minus_rf(dirname):
    # Remove the directory tree if it exists (the equivalent of `rm -rf`).
    shutil.rmtree(dirname, ignore_errors=True)


def create_testfiles(path):
    # Build a fresh directory of sample Prefix-Year.csv files.
    rm_minus_rf(path)
    os.mkdir(path)
    random.seed()
    for i in range(10):
        n = random.randint(10000, 99999)
        for j in range(random.randint(0, 20)):
            # year may repeat, doesn't matter
            year = 2015 - random.randint(0, 20)
            with open("{}/{}-{}.csv".format(path, n, year), "w") as f:
                # one placeholder row, because pandas cannot parse an empty file
                f.write("{},{}\n".format(n, year))


def find_filesets(path="."):
    # Group the CSV files in `path` by the prefix before the first hyphen.
    csv_files = {}
    for name in glob.glob("{}/*-*.csv".format(path)):
        if name.endswith("-aggregate.csv"):
            continue  # skip output left over from an earlier run
        # there's almost certainly a better way to do this
        key = os.path.splitext(os.path.basename(name))[0].split('-')[0]
        csv_files.setdefault(key, []).append(name)
    for key, filelist in csv_files.items():
        print(key, filelist)
        # do something with filelist
        create_merged_csv(key, filelist)


def create_merged_csv(key, filelist):
    # Append every file in the group to Prefix-aggregate.csv in the current directory.
    with open('{}-aggregate.csv'.format(key), 'w', newline='') as outfile:
        for filename in filelist:
            # header=None: treat the first line as data, not as column names
            df = pandas.read_csv(filename, header=None)
            df.to_csv(outfile, index=False, header=False)


TEST_DIR_NAME = "testfiles"
create_testfiles(TEST_DIR_NAME)
find_filesets(TEST_DIR_NAME)
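If pandas is not available, the same group-then-merge idea can be sketched with only the standard library. This is a minimal sketch, not the answer's method: the names group_by_prefix, merge_group and out_dir are hypothetical, and it assumes the files have no header row and that the part after the last hyphen is always the year. Files are merged in year order, and anything whose suffix is not numeric (such as an earlier Prefix-aggregate.csv) is skipped.

```python
import csv
from collections import defaultdict
from pathlib import Path


def group_by_prefix(directory):
    """Map each prefix to its Prefix-Year.csv files, sorted by year."""
    groups = defaultdict(list)
    for path in Path(directory).glob("*-*.csv"):
        # Split on the last hyphen: "18394-1999" -> ("18394", "-", "1999").
        prefix, _, suffix = path.stem.rpartition("-")
        # Assumption: a numeric suffix is a year; skip aggregates and other files.
        if not suffix.isdigit():
            continue
        groups[prefix].append((int(suffix), path))
    return {prefix: [p for _, p in sorted(items)] for prefix, items in groups.items()}


def merge_group(prefix, paths, out_dir):
    """Write the rows of every file in `paths` to out_dir/Prefix-aggregate.csv."""
    out_path = Path(out_dir) / "{}-aggregate.csv".format(prefix)
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for path in paths:
            with open(path, newline="") as infile:
                # Assumption: no header row, so every row is copied as data.
                writer.writerows(csv.reader(infile))
    return out_path
```

To process a whole directory, loop over group_by_prefix(directory).items() and call merge_group(prefix, paths, out_dir) for each pair. Writing the aggregates to a separate out_dir keeps them from being picked up as input on a later run.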