Question 1

I am very new to Python (and web scraping). Let me ask you a question.

Many websites do not show a page-specific URL in Firefox or other browsers. For example, the Social Security Administration lists popular baby names by rank (going back to 1880), but the URL does not change when I change the year from 1880 to 1881. It is always:

http://www.ssa.gov/cgi-bin/popularnames.cgi

Because I don't know the specific URL, I could not download the webpage using urllib.

The page source includes:

<input type="text" name="year" id="yob" size="4" value="1880">

So presumably, if I can control this "year" value (like, "1881" or "1991"), I can deal with this problem. Am I right? I still don't know how to do it.

Can anybody tell me the solution for this please?

If you know some websites that may help my study, please let me know.

THANKS!

Answer

You can still use urllib. The button performs a POST to the current url. Using Firefox's Firebug I took a look at the network traffic and found they're sending 3 parameters: member, top, and year. You can send the same arguments:

import urllib

url = 'http://www.ssa.gov/cgi-bin/popularnames.cgi'
year = 1881  # example value; drop this line when year comes from the loop below
post_params = {
    # member was blank, so I'm excluding it.
    'top': '25',
    'year': year,
}
post_args = urllib.urlencode(post_params)

Now, just send the url-encoded arguments:

urllib.urlopen(url, post_args)
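The code above is Python 2. On Python 3 the same functions live in urllib.parse and urllib.request, and the POST data has to be bytes. A minimal sketch of the two steps, assuming the form still accepts the top and year parameters observed above (build_post_body and fetch_year are made-up helper names, not part of urllib):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

URL = 'http://www.ssa.gov/cgi-bin/popularnames.cgi'


def build_post_body(year, top=25):
    """Return the url-encoded form body as bytes (hypothetical helper)."""
    params = {'top': str(top), 'year': str(year)}
    # In Python 3, urlopen() requires POST data as bytes, not str.
    return urlencode(params).encode('ascii')


def fetch_year(year):
    """POST the form for one year and return the raw response bytes."""
    response = urlopen(URL, build_post_body(year))
    try:
        return response.read()
    finally:
        response.close()
```

Calling fetch_year(1881) should return the HTML for that year, provided the site still answers this form the same way.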

If you need to send headers as well, build a urllib2.Request, because urllib.urlopen has no headers argument (its third positional argument is proxies):

import urllib2

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'keep-alive',
    'Host': 'www.ssa.gov',
    'Referer': 'http://www.ssa.gov/cgi-bin/popularnames.cgi',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0',
}
# With POST data:
request = urllib2.Request(url, post_args, headers)
urllib2.urlopen(request)
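On Python 3, headers are attached to a urllib.request.Request object, which is then passed to urlopen. The sketch below only builds the request so it can be inspected before anything is sent. build_request is a made-up helper name, and the header set is trimmed to three of the values listed above. Host is normally filled in by the library itself.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = 'http://www.ssa.gov/cgi-bin/popularnames.cgi'

# A subset of the browser headers shown above; adjust as needed.
HEADERS = {
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': URL,
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) '
                  'Gecko/20100101 Firefox/21.0',
}


def build_request(year, top=25):
    """Return a POST Request carrying the form data and headers (hypothetical helper)."""
    post_args = urlencode({'top': str(top), 'year': str(year)}).encode('ascii')
    # Supplying data makes urllib send a POST instead of a GET.
    return Request(URL, data=post_args, headers=HEADERS)
```

To send it, pass the result to urllib.request.urlopen, for example urlopen(build_request(1881)).read().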

Execute the code in a loop:

for year in xrange(1880, 2014):
    # The above code...
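Put together, the loop might look like the following Python 3 sketch. It saves each year's page to a file and pauses between requests. The save_years function, the file naming, and the one-second delay are illustrative choices, not anything the site requires. The fetch argument can be swapped out so the loop can be tried without touching the network.

```python
import os
import time
from urllib.parse import urlencode
from urllib.request import urlopen

URL = 'http://www.ssa.gov/cgi-bin/popularnames.cgi'


def fetch_year(year, top=25):
    """POST the form for one year and return the response body as bytes."""
    post_args = urlencode({'top': str(top), 'year': str(year)}).encode('ascii')
    response = urlopen(URL, post_args)
    try:
        return response.read()
    finally:
        response.close()


def save_years(years, out_dir, fetch=fetch_year, delay=1.0):
    """Fetch each year in turn and write it to out_dir/<year>.html.

    Returns the list of file paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for year in years:
        path = os.path.join(out_dir, '%d.html' % year)
        with open(path, 'wb') as f:
            f.write(fetch(year))
        written.append(path)
        time.sleep(delay)  # pause between requests to go easy on the server
    return written
```

A full run over the same range as the loop above would be save_years(range(1880, 2014), 'pages').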

a (presumably basic) web scraping of http://www.ssa.gov/cgi-bin/popularnames.cgi in urllib
