Question 1

Here is a snapshot of my csv:

alex    123f    1
harry   fwef    2
alex    sef     3
alex    gsdf    4
alex    wf35    6
harry   sdfsdf  3

I would like to get the subset of this data where the value in the first column (harry, alex) occurs at least 4 times. So I want the resulting data set to be:

alex    123f    1
alex    sef     3
alex    gsdf    4
alex    wf35    6

Answer

Clearly, you cannot decide which rows are interesting until you've seen all rows (since the very last row might be the one turning some count from three to four and thereby making some previously seen rows interesting, for example;-). So, unless your CSV file is horribly huge, suck it all into memory, first, as a list...:

import csv
with open('thefile.csv', 'rb') as f:
    data = list(csv.reader(f))

then, do the counting -- Python 2.7 has a better way, but assuming you're still on 2.6 like most of us...:

import collections
counter = collections.defaultdict(int)
for row in data:
    counter[row[0]] += 1

and finally do the selection loop...:

for row in data:
    if counter[row[0]] >= 4:
        print row

Of course, this prints each interesting row as a rough-hewn list (with square brackets and quotes around the items), but it is easy to format it any way you prefer.
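For readers on Python 2.7 or later, the same three steps can be sketched with collections.Counter, which is presumably the "better way" alluded to above (the answer does not name it, so treat that as an assumption). The sketch below is written for Python 3. The function name rows_with_frequent_key and the min_count parameter are made up for illustration, and the sample is assumed to be tab-delimited, which the question does not state.

```python
import csv
import io
from collections import Counter


def rows_with_frequent_key(rows, min_count=4):
    """Return the rows whose first-column value occurs at least min_count times."""
    # Materialize the rows (skipping blank lines) so they can be scanned twice.
    rows = [row for row in rows if row]
    # First pass: count how often each first-column value appears.
    counts = Counter(row[0] for row in rows)
    # Second pass: keep only the rows whose key is frequent enough.
    return [row for row in rows if counts[row[0]] >= min_count]


# Sample data from the question, assumed here to be tab-delimited.
sample = (
    "alex\t123f\t1\n"
    "harry\tfwef\t2\n"
    "alex\tsef\t3\n"
    "alex\tgsdf\t4\n"
    "alex\twf35\t6\n"
    "harry\tsdfsdf\t3\n"
)

reader = csv.reader(io.StringIO(sample), delimiter="\t")
selected = rows_with_frequent_key(reader)

# Join the fields back together instead of printing the raw list.
for row in selected:
    print("\t".join(row))
```

To read from a real file in Python 3, open it in text mode with newline='' (rather than 'rb') and pass the file object to csv.reader in place of the io.StringIO wrapper.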

python csv: getting subset