Counting word occurrences in a CSV and determining row appearances

2024/10/12 14:48:01

I have a single-column CSV file such as the following. The symbols and numbers are only there to show that the file does not contain just text. I have two objectives:

  1. count the number of occurrences of a word;
  2. determine how many rows a word appears in.

Stuff
I like apples. Sally likes apples.
Jim has 4 berries.  !@#
John has 2 apples.

Ideally, the code should return something like: {apples: 3} {# of rows: 2}

I've written some code to try to count occurrences, but it isn't working properly (presumably because of the punctuation). Also, I do not know how to determine the number of rows a word appears in; this could be as simple as counting the number of unique occurrences in each row, but I'm unsure how to proceed. Here is the code I have so far, written in Python 3.6.1:

import csv
my_reader = csv.reader(open('file.csv', encoding = 'utf-8'))
ctr = 0
for record in my_reader:
    if record[0] == 'apples':
        ctr += 1
print(ctr)

The code merely returns 0 as the answer. Help?

Answer

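You are testing whether the whole row equals 'apples' (record[0] == 'apples'), which is never true here. What you need is to test whether 'apples' occurs in the row ('apples' in record[0]). To count the occurrences you can use str.count(), for example: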

import csv
my_reader = csv.reader(open('file.csv', encoding = 'utf-8'))
ctr = 0
rows = 0
for record in my_reader:
    if 'apples' in record[0]:
        rows += 1
        ctr += record[0].count('apples')
print('apples: {}, rows: {}'.format(ctr, rows))

This checks whether the row contains 'apples'; if it does, rows is incremented by one and ctr is incremented by the number of times 'apples' occurs in that row.
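One caveat: both the in test and str.count() match substrings, so a row containing 'pineapples' would also be counted. If whole-word matching matters, a regular expression with word boundaries is one option. The following is only a sketch under some assumptions: the helper name count_word is made up for illustration, matching is case-insensitive, all fields of a row are joined so that sentences containing commas are not cut off, and the sample rows from the question are read from memory rather than from file.csv.

```python
import csv
import io
import re


def count_word(lines, word):
    """Return (total occurrences, number of rows containing the word)."""
    # \b matches a word boundary, so 'apples' does not match 'pineapples'.
    pattern = re.compile(r"\b{}\b".format(re.escape(word)), re.IGNORECASE)
    total = 0
    rows = 0
    for record in csv.reader(lines):
        # Join all fields in case a sentence with commas was split by csv.
        text = " ".join(record)
        hits = len(pattern.findall(text))
        if hits:
            rows += 1
            total += hits
    return total, rows


# Sample data from the question, held in memory instead of file.csv.
sample = io.StringIO(
    "Stuff\n"
    "I like apples. Sally likes apples.\n"
    "Jim has 4 berries.  !@#\n"
    "John has 2 apples.\n"
)
total, rows = count_word(sample, "apples")
print("apples: {}, rows: {}".format(total, rows))
```

To run this against the real file, pass an open file object instead of the in-memory sample, for example count_word(open('file.csv', encoding='utf-8', newline=''), 'apples').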

