How can I merge CSV rows that have the same value in the first cell?

2024/9/20 6:47:42

This is the file: https://drive.google.com/file/d/0B5v-nJeoVouHc25wTGdqaDV1WW8/view?usp=sharing

As you can see, there are duplicates in the first column, but if I were to combine the duplicate rows, no data would get overridden in the other columns. Is there any way I can combine the rows with duplicate values in the first column?

For example, turn "1,A,A,," and "1,,,T,T" into "1,A,A,T,T".

Answer

Plain Python:

import csvreader = csv.Reader(open('combined.csv'))
result = {}for row in reader:idx = row[0]values = row[1:]if idx in result:result[idx] = [result[idx][i] or v for i, v in enumerate(values)]else:result[idx] = values

How this magic works:

  • iterate over rows in the CSV file
  • for every record, we check if there was a record with the same index before
  • if this is the first time we see this index, just copy the row values
  • if this is a duplicate, assign row values to empty cells.

The last step is done via or trick: None or value will return value. value or anything will return value. So, result[idx][i] or v will return existing value if it is not empty, or row value.

To output this without loosing the duplicated rows, we need to keep index, then iterate and output corresponding result entries:

indices = []
for row in reader:# ...indices.append(idx)writer = csv.writer(open('outfile.csv', 'w'))
for idx in indices:writer.writerow([idx] + result[idx])
https://en.xdnf.cn/q/119396.html

Related Q&A

i usually get this error : ValueError: invalid literal for int() with base 10

I have loaded a csv file and as i try to print it i get this error Traceback (most recent call last):File "C:\Users\FSTC\Downloads\spaceproject\main.py", line 389, in <module>world_data…

How to Draw a triangle shape in python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.Want to improve this question? Update the question so it focuses on one problem only by editing this post.Closed 1…

DataFrame from list of string dicts with array() values

So I have a list where each entry looks something like this: "{A: array([1]), B: array([2]), C: array([3])}"I am trying to get a dataframe that looks like thisA B C 0 1 2 3 1 4 …

Need Help Making Buttons to perform for loops when you input a number

I am trying to make a button in Maya using Python that when you type in a number the for loop would loop for that many times. For example, I would put 5 in the box so the for loop would loop 5 times re…

Combining multiple conditional expressions in a list comprehension

I utf-8 encode characters like \u2013 before inserting them into SQLite.When I pull them out with a SELECT, they are back in their unencoded form, so I need to re-encode them if I want to do anything w…

Arduino Live Serial Plotting with a MatplotlibAnimation gets slow

I am making a live plotter to show the analog changes from an Arduino Sensor. The Arduino prints a value to the serial with a Baudrate of 9600. The Python code looks as following: import matplotlib.pyp…

Hide lines on tree view - openerp 7

I want to hide all lines (not only there cointaner) in sequence tree view (the default view). I must hide all lines if code != foo but the attrs atribute dont work on tree views, so how can i filter/hi…

Python Append dataframe generated in nested loops

My program has two for loops. I generate a df in each looping. I want to append this result. For each iteration of inner loop, 1 row and 24 columns data is generated. For each iteration of outer loop, …

bError could not find or load main class caused by java.lang.classnotfoundation error

I am trying to read the executable jar file using python. That jar file doesnt have any java files. It contains only class and JSON files. So what I tried is from subprocess import Popen,PIPEjar_locati…

Invalid value after matching string using regex [duplicate]

This question already has answers here:Incrementing a number at the end of string(2 answers)Closed 3 years ago.I am trying to match strings with an addition of 1 at the end of it and my code gives me t…