Question 1

Synopsis

Given a directory containing CSV files named with the pattern Prefix-Year.csv, create a new set of CSV files named Prefix-aggregate.csv where each aggregate file is the combination of all CSV files with the same Prefix.

Explanation

I have a directory containing 5,500 CSV files named in this pattern: Prefix-Year.csv. Example:

18394-1999.csv
...            //consecutive years
18394-2014.csv
18395-1999.csv //next location

I want to group and combine files with common Prefixes into files named Prefix-aggregate.csv.
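To make the intended grouping concrete, here is a minimal sketch that maps file names to the aggregate file they would end up in. The helper name `group_by_prefix` and the sample list are illustrative assumptions, not part of the original question. The sketch also assumes the prefix is everything before the last hyphen and that the year part is purely numeric.

```python
import os
from collections import defaultdict


def group_by_prefix(filenames):
    """Map 'Prefix-aggregate.csv' to the sorted list of 'Prefix-Year.csv' names."""
    groups = defaultdict(list)
    for name in filenames:
        stem, ext = os.path.splitext(os.path.basename(name))
        prefix, sep, year = stem.rpartition("-")
        # Skip anything that does not look like Prefix-Year.csv
        # (this also skips existing Prefix-aggregate.csv files).
        if ext.lower() != ".csv" or not sep or not prefix or not year.isdigit():
            continue
        groups[prefix + "-aggregate.csv"].append(name)
    return {target: sorted(names) for target, names in groups.items()}


if __name__ == "__main__":
    sample = ["18394-1999.csv", "18394-2014.csv", "18395-1999.csv"]
    for target, sources in sorted(group_by_prefix(sample).items()):
        print(target, "<-", sources)
```

Requiring a numeric year means a second run over the same directory does not fold earlier aggregate files back into the groups.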

Question 2

The solution to your question is the find_filesets() method below. I've included a CSV merge method as well based on MaxNoe's answer.

#!/usr/bin/env python
import glob
import os
import random
import shutil

import pandas


def rm_minus_rf(dirname):
    # Remove the directory tree if it exists, like `rm -rf`.
    shutil.rmtree(dirname, ignore_errors=True)


def create_testfiles(path):
    rm_minus_rf(path)
    os.mkdir(path)
    random.seed()
    for i in range(10):
        n = random.randint(10000, 99999)
        for j in range(random.randint(0, 20)):
            # year may repeat, doesn't matter
            year = 2015 - random.randint(0, 20)
            with open("{}/{}-{}.csv".format(path, n, year), "w") as f:
                # one placeholder row; pandas.read_csv raises on an empty file
                f.write("{},{}\n".format(n, year))


def find_filesets(path="."):
    csv_files = {}
    for name in glob.glob("{}/*-*.csv".format(path)):
        # there's almost certainly a better way to do this
        key = os.path.splitext(os.path.basename(name))[0].split('-')[0]
        csv_files.setdefault(key, []).append(name)

    for key, filelist in csv_files.items():
        print(key, filelist)
        # do something with filelist
        create_merged_csv(key, filelist)


def create_merged_csv(key, filelist):
    with open('{}-aggregate.csv'.format(key), 'w', newline='') as outfile:
        for filename in filelist:
            # header=None assumes the input files have no header row
            df = pandas.read_csv(filename, header=None)
            df.to_csv(outfile, index=False, header=False)


TEST_DIR_NAME = "testfiles"
create_testfiles(TEST_DIR_NAME)
find_filesets(TEST_DIR_NAME)
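
If the yearly files share a header row, concatenating them row for row repeats that header in the aggregate. Below is a sketch using only the standard library `csv` module. It assumes, which the question does not state, that every input file starts with the same header line. `merge_with_single_header` is a hypothetical helper name, not part of the answer above.

```python
import csv


def merge_with_single_header(filelist, out_path):
    """Concatenate CSV files, writing the header of the first file only.

    Assumes every file in filelist begins with the same header row.
    Returns the number of data rows written.
    """
    rows_written = 0
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        header_written = False
        # Sorting puts Prefix-1999.csv before Prefix-2014.csv.
        for filename in sorted(filelist):
            with open(filename, newline="") as infile:
                reader = csv.reader(infile)
                header = next(reader, None)
                if header is None:
                    continue  # empty file, nothing to copy
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Each file list collected by find_filesets() could be passed to this function in place of the pandas-based merge. It streams rows one at a time, so memory use stays flat however large the yearly files are.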

Merging CSVs with similar name python [closed]
