Question 1

I have a .csv file with several columns, one of them filled with random numbers, and I want to find duplicated values in that column. If there are any (an unlikely case, but it's what I want to check after all), I would like to display/store the complete rows in which those values appear.

To make it clear, I have something like this:

First, Whatever, 230, Whichever, etc
Second, Whatever, 11, Whichever, etc
Third, Whatever, 46, Whichever, etc
Fourth, Whatever, 18, Whichever, etc
Fifth, Whatever, 14, Whichever, etc
Sixth, Whatever, 48, Whichever, etc
Seventh, Whatever, 91, Whichever, etc
Eighth, Whatever, 18, Whichever, etc
Ninth, Whatever, 67, Whichever, etc

And I would like to have:

Fourth, Whatever, 18, Whichever, etc
Eighth, Whatever, 18, Whichever, etc

To find duplicated values, I store that column in a dictionary and count every key to discover how many times each one appears.

import csv
from collections import Counter, defaultdict, OrderedDict

with open(file, 'rt') as inputfile:
    data = csv.reader(inputfile)
    seen = defaultdict(set)
    counts = Counter(row[col_2] for row in data)

print("Numbers and times they appear: %s" % counts)

And I see

Counter({' 18 ': 2, ' 46 ': 1, ' 67 ': 1, ' 48 ': 1,...})

The problem comes now: I can't manage to link each key with its number of repetitions so that I can work with it later. If I do

for value in counts:
    if counts > 1:
        print(counts)

I would only be getting the keys, and all of them rather than just the repeated ones, which is not what I want (not to mention that I'm looking to print the whole line, not only the number).

Basically I'm looking for a way of doing

If there's a repeated number:
    print rows containing that number
else:
    print "No repetitions"

Thanks in advance.

Answer

Try this; it may work for you.

entries = []
duplicate_entries = []

# First pass: collect the values of the third column that appear more than once.
with open('in.txt', 'r') as my_file:
    for line in my_file:
        columns = line.strip().split(',')
        if columns[2] not in entries:
            entries.append(columns[2])
        else:
            duplicate_entries.append(columns[2])

# Second pass: print and save every row whose third column is a duplicated value.
if len(duplicate_entries) > 0:
    with open('out.txt', 'w') as out_file:
        with open('in.txt', 'r') as my_file:
            for line in my_file:
                columns = line.strip().split(',')
                if columns[2] in duplicate_entries:
                    print(line.strip())
                    out_file.write(line)
else:
    print("No repetitions")
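
A variation on the approach above, sketched under a few assumptions: it reads the file once, uses the csv module as in the question, and groups whole rows by the value in the third column so that each number stays linked to its rows. The names find_duplicate_rows and col are made up for this example, the sample data is the one from the question, and Python 3.7+ is assumed so that the groups come out in order of first appearance.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_rows(lines, col=2):
    """Return the rows whose value in column `col` occurs more than once.

    `lines` is any iterable of CSV text lines (an open file works too).
    Values are stripped so that ' 18' and '18' count as the same number.
    """
    rows_by_value = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) > col:  # skip blank or short lines
            rows_by_value[row[col].strip()].append(row)
    # Keep only the groups holding more than one row, grouped by value.
    return [row
            for rows in rows_by_value.values() if len(rows) > 1
            for row in rows]


# Sample data from the question, held in memory instead of a file.
sample = io.StringIO(
    "First, Whatever, 230, Whichever, etc\n"
    "Second, Whatever, 11, Whichever, etc\n"
    "Third, Whatever, 46, Whichever, etc\n"
    "Fourth, Whatever, 18, Whichever, etc\n"
    "Fifth, Whatever, 14, Whichever, etc\n"
    "Sixth, Whatever, 48, Whichever, etc\n"
    "Seventh, Whatever, 91, Whichever, etc\n"
    "Eighth, Whatever, 18, Whichever, etc\n"
    "Ninth, Whatever, 67, Whichever, etc\n"
)

duplicates = find_duplicate_rows(sample)
if duplicates:
    for row in duplicates:
        print(",".join(row))
else:
    print("No repetitions")
```

With the sample data this prints the Fourth and Eighth rows. To run it on a real file, pass the open file object instead of the in-memory sample.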

Python - Display rows with repeated values in csv files
