Python CSV: getting a subset

2024/10/7 10:24:30

Here is a snapshot of my CSV:

alex    123f    1
harry   fwef    2
alex    sef     3
alex    gsdf    4
alex    wf35    6
harry   sdfsdf  3

I would like to get the subset of this data where the value in the first column (harry, alex) occurs at least 4 times. So I want the resulting data set to be:

alex    123f    1
alex    sef     3
alex    gsdf    4
alex    wf35    6
Answer

Clearly, you cannot decide which rows are interesting until you've seen all rows (since the very last row might be the one turning some count from three to four and thereby making some previously seen rows interesting, for example;-). So, unless your CSV file is horribly huge, suck it all into memory, first, as a list...:

import csv
with open('thefile.csv', 'rb') as f:
    data = list(csv.reader(f))

then, do the counting -- Python 2.7 has a better way, but assuming you're still on 2.6 like most of us...:

import collections
counter = collections.defaultdict(int)
for row in data:
    counter[row[0]] += 1
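
For reference, the "better way" in Python 2.7 and later is presumably collections.Counter, which does the tallying in one call. A minimal sketch, with the sample rows from the question written out as an in-memory list (the `data` literal below is only illustrative and stands in for the list read from the file):

```python
import collections

# Sample rows from the question, standing in for list(csv.reader(f)).
data = [
    ["alex", "123f", "1"],
    ["harry", "fwef", "2"],
    ["alex", "sef", "3"],
    ["alex", "gsdf", "4"],
    ["alex", "wf35", "6"],
    ["harry", "sdfsdf", "3"],
]

# Counter tallies the first-column values in a single pass.
counter = collections.Counter(row[0] for row in data)
```

A Counter can be indexed exactly like the defaultdict above, so the rest of the code stays the same.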

and finally do the selection loop...:

for row in data:
    if counter[row[0]] >= 4:
        print row

Of course, this prints each interesting row as a rough-hewn list (with square brackets and quotes around the items), but it is easy to format it any way you prefer.
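
As one way to handle the formatting, here is a rough Python 3 sketch that filters the rows and writes them back out as delimited text with csv.writer. It assumes the file is tab-separated, which the snapshot in the question suggests but does not confirm; the helper names `frequent_rows` and `write_rows` are made up for this illustration.

```python
import collections
import csv
import io


def frequent_rows(rows, min_count=4):
    """Return rows whose first-column value occurs at least min_count times."""
    rows = list(rows)  # all rows must be seen before any can be selected
    counter = collections.Counter(row[0] for row in rows if row)
    return [row for row in rows if row and counter[row[0]] >= min_count]


def write_rows(rows, out, delimiter="\t"):
    """Write rows as delimited text (tab-separated by default, an assumption)."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)


# The question's sample data, assumed here to be tab-separated.
sample = (
    "alex\t123f\t1\n"
    "harry\tfwef\t2\n"
    "alex\tsef\t3\n"
    "alex\tgsdf\t4\n"
    "alex\twf35\t6\n"
    "harry\tsdfsdf\t3\n"
)

reader = csv.reader(io.StringIO(sample), delimiter="\t")
buffer = io.StringIO()
write_rows(frequent_rows(reader), buffer)
result = buffer.getvalue()
```

To work on a real file, the io.StringIO objects would be replaced by files opened in text mode with newline='', and sys.stdout can be passed as `out` to print the rows.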
