Writing and saving a CSV file from scraped data using Python and BeautifulSoup4

2024/9/21 20:22:57

I am trying to scrape data from the PGA.com website to get a table of all of the golf courses in the United States. In my CSV table I want to include the name of the golf course, address, ownership, website, and phone number. I would like to geocode this data, place it on a map, and keep a local copy on my computer.

I used Python and Beautiful Soup 4 to extract my data. I have gotten as far as extracting the data from the website, but I am having difficulty writing the script to export the data into a CSV file with the fields I need.

I need help writing code that will transfer my extracted data into a CSV file and save it to my desktop.

Here is my script below:

import csv
import requests
from bs4 import BeautifulSoup

url = "http://www.pga.com/golf-courses/search?searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0"
r = requests.get(url)
# Naming the parser explicitly avoids Beautiful Soup's "no parser specified" warning.
soup = BeautifulSoup(r.content, "html.parser")

g_data1 = soup.find_all("div", {"class": "views-field-nothing-1"})
g_data2 = soup.find_all("div", {"class": "views-field-nothing"})

# First block: counter and course type
for item in g_data1:
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-counter"})[0].text)
    except (IndexError, AttributeError):
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-course-type"})[0].text)
    except (IndexError, AttributeError):
        pass

# Second block: course name and the two address lines
for item in g_data2:
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-title"})[0].text)
    except (IndexError, AttributeError):
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-address"})[0].text)
    except (IndexError, AttributeError):
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text)
    except (IndexError, AttributeError):
        pass

This is what I currently get when I run the script. I want to take this data and turn it into a CSV table for geocoding later.

1801 Merrimac Trl
Williamsburg, Virginia 23185-5905
12551 Glades Rd
Boca Raton, Florida 33498-6830
Preserve Golf Club 
13601 SW 115th Ave
Dunnellon, Florida 34432-5621
1000 Acres Ranch Resort 
465 Warrensburg Rd
Stony Creek, New York 12878-1613
1757 Golf Club 
45120 Waxpool Rd
Dulles, Virginia 20166-6923
27 Pines Golf Course 
5611 Silverdale Rd
Sturgeon Bay, Wisconsin 54235-8308
3 Creek Ranch Golf Club 
2625 S Park Loop Rd
Jackson, Wyoming 83001-9473
3 Lakes Golf Course 
6700 Saltsburg Rd
Pittsburgh, Pennsylvania 15235-2130
3 Par At Four Points 
8110 Aero Dr
San Diego, California 92123-1715
3 Parks Fairways 
3841 N Florence Blvd
Florence, Arizona 85132
3-30 Golf & Country Club 
101 Country Club Lane
Lowden, Iowa 52255
401 Par Golf 
5715 Fayetteville Rd
Raleigh, North Carolina 27603-4525
93 Golf Ranch 
406 E 200 S
Jerome, Idaho 83338-6731
A 1 Golf Center 
1805 East Highway 30
Rockwall, Texas 75087
A H Blank Municipal Course 
808 County Line Rd
Des Moines, Iowa 50320-6706
A-Bar-A Ranch Golf Course 
Highway 230
Encampment, Wyoming 82325
A-Ga-Ming Golf Resort, Sundance 
627 Ag A Ming Dr
Kewadin, Michigan 49648-9397
A-Ga-Ming Golf Resort, Torch 
627 Ag A Ming Dr
Kewadin, Michigan 49648-9397
A. C. Read Golf Club, Bayou 
Bldg 3495, Nas Pensacola
Pensacola, Florida 32508
A. C. Read Golf Club, Bayview 
Bldg 3495, Nas Pensacola
Pensacola, Florida 32508
Answer

All you really need to do here is put your output in a list and then use the csv library to export it. I'm not entirely clear on what you are getting out of views-field-nothing-1, but to focus just on views-field-nothing, you could do something like:

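courses_list = []
for item in g_data2:
    # Fall back to an empty string when a field is missing,
    # so every row keeps the same three columns.
    try:
        name = item.contents[1].find_all("div", {"class": "views-field-title"})[0].text
    except (IndexError, AttributeError):
        name = ''
    try:
        address1 = item.contents[1].find_all("div", {"class": "views-field-address"})[0].text
    except (IndexError, AttributeError):
        address1 = ''
    try:
        address2 = item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text
    except (IndexError, AttributeError):
        address2 = ''
    course = [name, address1, address2]
    courses_list.append(course)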

This will put the courses in a list. Next, you can write them to a CSV file like so:

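import csv

# On Python 3, open in text mode with newline='' (on Python 2, use 'wb' instead).
with open('filename.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for row in courses_list:
        writer.writerow(row)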
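
The question also asks how to save the file to the desktop and which columns to include. Here is a minimal sketch of one way to do that, assuming Python 3 and that the desktop folder lives at ~/Desktop (typical, but not guaranteed on every system). The function name save_courses, the default file name golf_courses.csv, and the header labels are hypothetical choices for illustration, not something from the original script.

```python
import csv
import os


def save_courses(courses_list, directory=None, filename="golf_courses.csv"):
    """Write [name, address, city_state_zip] rows to a CSV file and return its path."""
    if directory is None:
        # Assumption: the desktop is at ~/Desktop; pass `directory` if yours differs.
        directory = os.path.join(os.path.expanduser("~"), "Desktop")
    path = os.path.join(directory, filename)
    # newline="" lets the csv module manage line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header row (hypothetical labels) so the columns are easy to map when geocoding.
        writer.writerow(["Name", "Address", "City, State ZIP"])
        for row in courses_list:
            # Scraped .text values often carry stray whitespace; strip it per field.
            writer.writerow([field.strip() for field in row])
    return path
```

Calling save_courses(courses_list) after the loop that builds courses_list would write the file to the assumed desktop location; passing directory="some/other/folder" writes it elsewhere. The csv writer quotes fields that contain commas, so a value like "Dunnellon, Florida 34432-5621" stays in a single column.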