Merging CSVs with similar names in Python [closed]

2024/11/15 18:01:40

Synopsis

Given a directory containing CSV files named with the pattern Prefix-Year.csv, create a new set of CSV files named Prefix-aggregate.csv where each aggregate file is the combination of all CSV files with the same Prefix.

Explanation

I have a directory containing 5,500 CSV files named with the pattern Prefix-Year.csv, for example:

18394-1999.csv
...               (consecutive years)
18394-2014.csv
18395-1999.csv    (next location)
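A stricter way to read the Prefix-Year.csv pattern is a regular expression that captures both parts. This is an illustrative sketch, not part of the original answer: the name parse_csv_name is hypothetical, and it assumes the year is always four digits. A name that does not fit, such as 18394-aggregate.csv, returns None, so earlier output is not mistaken for a yearly input file.

```python
import os
import re
from typing import Optional, Tuple

# Hypothetical helper: assumes names look like "<prefix>-<4-digit year>.csv".
NAME_RE = re.compile(r"^(?P<prefix>[^-]+)-(?P<year>\d{4})\.csv$")


def parse_csv_name(path: str) -> Optional[Tuple[str, int]]:
    """Return (prefix, year) for a Prefix-Year.csv path, or None if it doesn't match."""
    match = NAME_RE.match(os.path.basename(path))
    if match is None:
        return None
    return match.group("prefix"), int(match.group("year"))


print(parse_csv_name("data/18394-1999.csv"))  # ('18394', 1999)
print(parse_csv_name("18394-aggregate.csv"))  # None
```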

I want to group and combine files with common Prefixes into files named Prefix-aggregate.csv.
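If pandas is not needed, the combine step can be done with the standard csv module alone. This sketch assumes every input file has the same columns; merge_csv_files and its skip_header flag are hypothetical, the flag covering the case where each yearly file starts with a header row (only the first file's header is kept).

```python
import csv
from typing import Iterable


def merge_csv_files(paths: Iterable[str], out_path: str, skip_header: bool = False) -> int:
    """Append the rows of every file in `paths` to `out_path`.

    Hypothetical helper: assumes all inputs share the same columns. With
    skip_header=True, the first row of each input is treated as a header and
    only the first file's header is copied to the output.
    Returns the number of data rows written (headers excluded).
    """
    rows_written = 0
    header_written = False
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in paths:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                if skip_header:
                    header = next(reader, None)
                    if header is not None and not header_written:
                        writer.writerow(header)
                        header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

In the answer's terms, a call such as merge_csv_files(filelist, '{}-aggregate.csv'.format(key)) could stand in for create_merged_csv(key, filelist).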

Answer

The solution to your question is the find_filesets() function below. I've also included a CSV merge function, create_merged_csv(), based on MaxNoe's answer.

#!/usr/bin/env python3
import glob
import os
import random
import shutil

import pandas


def rm_minus_rf(dirname):
    # Remove the directory tree if it exists (like `rm -rf dirname`).
    shutil.rmtree(dirname, ignore_errors=True)


def create_testfiles(path):
    rm_minus_rf(path)
    os.mkdir(path)
    random.seed()
    for i in range(10):
        n = random.randint(10000, 99999)
        for j in range(random.randint(0, 20)):
            # year may repeat, doesn't matter
            year = 2015 - random.randint(0, 20)
            # Write one data row: pandas.read_csv() raises
            # pandas.errors.EmptyDataError on an empty file.
            with open(os.path.join(path, "{}-{}.csv".format(n, year)), "w") as f:
                f.write("{},{}\n".format(n, year))


def find_filesets(path="."):
    csv_files = {}
    # Note: if path is the current directory, the *-aggregate.csv outputs
    # also match this pattern on a rerun.
    for name in glob.glob(os.path.join(path, "*-*.csv")):
        # "dir/18394-1999.csv" -> "18394-1999" -> "18394"
        key = os.path.splitext(os.path.basename(name))[0].split('-')[0]
        csv_files.setdefault(key, []).append(name)

    for key, filelist in csv_files.items():
        print(key, filelist)
        # do something with filelist
        create_merged_csv(key, filelist)


def create_merged_csv(key, filelist):
    # The aggregate file is written to the current working directory.
    with open('{}-aggregate.csv'.format(key), 'w', newline='') as outfile:
        for filename in filelist:
            # header=None: the input files have no header row
            df = pandas.read_csv(filename, header=None)
            df.to_csv(outfile, index=False, header=False)


TEST_DIR_NAME = "testfiles"
create_testfiles(TEST_DIR_NAME)
find_filesets(TEST_DIR_NAME)
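The dictionary that find_filesets() builds with setdefault() can also be produced with itertools.groupby. groupby only merges adjacent items, so the paths must be sorted by prefix first; using the file name as a secondary sort key also puts the four-digit years in order, so the aggregate rows follow the years. This is an alternative sketch, not the answer's code: prefix_of and group_by_prefix are hypothetical names, and the Prefix-Year.csv naming from the question is assumed.

```python
import os
from itertools import groupby
from typing import Dict, Iterable, List


def prefix_of(path: str) -> str:
    """'dir/18394-1999.csv' -> '18394' (same rule as find_filesets())."""
    return os.path.splitext(os.path.basename(path))[0].split("-")[0]


def group_by_prefix(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group CSV paths by prefix; within a group, paths are sorted by file name."""
    # groupby() only merges *adjacent* items, so sort by prefix first;
    # the file name as a secondary key puts four-digit years in order.
    ordered = sorted(paths, key=lambda p: (prefix_of(p), os.path.basename(p)))
    return {key: list(group) for key, group in groupby(ordered, key=prefix_of)}


names = ["t/18395-1999.csv", "t/18394-2014.csv", "t/18394-1999.csv"]
print(group_by_prefix(names))
# {'18394': ['t/18394-1999.csv', 't/18394-2014.csv'], '18395': ['t/18395-1999.csv']}
```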