This is the file: https://drive.google.com/file/d/0B5v-nJeoVouHc25wTGdqaDV1WW8/view?usp=sharing
As you can see, there are duplicates in the first column, but if I were to combine the duplicate rows, no data would get overridden in the other columns. Is there any way I can combine the rows with duplicate values in the first column?
For example, turn "1,A,A,," and "1,,,T,T" into "1,A,A,T,T".
Plain Python:
import csvreader = csv.Reader(open('combined.csv'))
result = {}for row in reader:idx = row[0]values = row[1:]if idx in result:result[idx] = [result[idx][i] or v for i, v in enumerate(values)]else:result[idx] = values
How this magic works:
- iterate over rows in the CSV file
- for every record, we check if there was a record with the same index before
- if this is the first time we see this index, just copy the row values
- if this is a duplicate, assign row values to empty cells.
The last step is done via or
trick: None or value
will return value
. value or anything
will return value
. So, result[idx][i] or v
will return existing value if it is not empty, or row value.
To output this without loosing the duplicated rows, we need to keep index, then iterate and output corresponding result
entries:
indices = []
for row in reader:# ...indices.append(idx)writer = csv.writer(open('outfile.csv', 'w'))
for idx in indices:writer.writerow([idx] + result[idx])