Python - Display rows with repeated values in csv files

2024/9/28 21:24:18

I have a .csv file with several columns, one of which is filled with random numbers, and I want to find duplicated values in that column. If there are any (an unlikely case, but it is exactly what I want to check), I would like to display or store the complete rows that contain those values.

To make it clear, I have something like this:

First, Whatever, 230, Whichever, etc
Second, Whatever, 11, Whichever, etc
Third, Whatever, 46, Whichever, etc
Fourth, Whatever, 18, Whichever, etc
Fifth, Whatever, 14, Whichever, etc
Sixth, Whatever, 48, Whichever, etc
Seventh, Whatever, 91, Whichever, etc
Eighth, Whatever, 18, Whichever, etc
Ninth, Whatever, 67, Whichever, etc

And I would like to have:

Fourth, Whatever, 18, Whichever, etc
Eighth, Whatever, 18, Whichever, etc

To find duplicated values, I store that column in a Counter (a dictionary subclass) and count how many times each key appears.

import csv
from collections import Counter, defaultdict, OrderedDict

with open(file, 'rt') as inputfile:
    data = csv.reader(inputfile)
    seen = defaultdict(set)
    counts = Counter(row[col_2] for row in data)

print "Numbers and times they appear: %s" % counts

And I see

Counter({' 18 ': 2, ' 46 ': 1, ' 67 ': 1, ' 48 ': 1,...})

The problem is that I can't manage to link each key with its number of repetitions and use that later. If I do

for value in counts:
    if counts > 1:
        print counts

I would only be getting the key, and for every value, which is not what I want (not to mention that I want to print the whole line, not just the number).
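
For reference, a minimal sketch of how each key can be paired with its count: iterating over a Counter directly yields only the keys, while `items()` yields (key, count) pairs that can be tested one at a time. The values are taken from the sample rows above, the name `repeated` is hypothetical, and the code assumes Python 3 (so `print` is a function).

```python
from collections import Counter

# Values from the third column of the sample rows, already stripped of spaces.
values = ["230", "11", "46", "18", "14", "48", "91", "18", "67"]
counts = Counter(values)

# items() yields (key, count) pairs, so each count can be tested on its own.
repeated = [value for value, count in counts.items() if count > 1]

if repeated:
    print("Repeated numbers:", repeated)
else:
    print("No repetitions")
```

This only identifies the repeated numbers; printing the full rows still requires keeping the rows themselves, not just the column.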

Basically I'm looking for a way of doing

if there's a repeated number:
    print rows containing that number
else:
    print "No repetitions"

Thanks in advance.

Answer

Try this; it may work for you.

entries = []
duplicate_entries = []
with open('in.txt', 'r') as my_file:
    for line in my_file:
        columns = line.strip().split(',')
        if columns[2] not in entries:
            entries.append(columns[2])
        else:
            duplicate_entries.append(columns[2])

if len(duplicate_entries) > 0:
    with open('out.txt', 'w') as out_file:
        with open('in.txt', 'r') as my_file:
            for line in my_file:
                columns = line.strip().split(',')
                if columns[2] in duplicate_entries:
                    print line.strip()
                    out_file.write(line)
else:
    print "No repetitions"
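
As an alternative, here is a hedged sketch of a single-pass variant that groups whole rows by the value in the third column, so the file is read only once. It is not the answerer's method: the function name `find_duplicate_rows` and its `column` parameter are hypothetical, the sample lines come from the question, and the code assumes Python 3.

```python
import csv
from collections import defaultdict


def find_duplicate_rows(lines, column=2):
    """Return the rows whose value in `column` occurs more than once.

    `lines` is any iterable of CSV text lines, for example an open file.
    """
    rows_by_value = defaultdict(list)
    # skipinitialspace drops the blank that follows each comma in the sample data.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) > column:  # ignore blank or short lines
            rows_by_value[row[column].strip()].append(row)
    # Keep only the groups that hold more than one row.
    return [row for rows in rows_by_value.values() if len(rows) > 1 for row in rows]


sample = [
    "Third, Whatever, 46, Whichever, etc",
    "Fourth, Whatever, 18, Whichever, etc",
    "Fifth, Whatever, 14, Whichever, etc",
    "Eighth, Whatever, 18, Whichever, etc",
]

duplicates = find_duplicate_rows(sample)
if duplicates:
    for row in duplicates:
        print(", ".join(row))
else:
    print("No repetitions")
```

To run it on a real file, pass the open file object instead of the list, for instance `with open('in.txt', newline='') as my_file: duplicates = find_duplicate_rows(my_file)`. Note that the result is grouped by repeated value rather than kept in strict file order.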