Question 1

I've been able to isolate a row in an HTML table using Beautiful Soup in Python 2.7. It's been a learning experience, but I'm happy to have gotten that far. Unfortunately, I'm a bit stuck on the next bit.

I need to get the link that follows the "Select document Remittance Report I format XLS" input. Because the inputs can appear in a different order, the lookup needs to be dynamic rather than based on position. I'm not sure how to find that input and then grab the link that follows it.

I've been trying some findAll and nextSibling methods, but my inexperience with Python and Beautiful Soup is holding me back. The Beautiful Soup documentation is great, but it's going a bit over my head.

<tr class="odd"><td header="c1">Report Download</td><td header="c2"><input aria-label="Select Report format PDF" id="documentChkBx0" name="documentChkBx" type="checkbox" value="5446"/><a href="/a/document.html?key=5446"><img alt="Portable Document Format" src="/img/icons/icon_PDF.gif"></img></a><input aria-label="Select Report format XLS" id="documentChkBx1" name="documentChkBx" type="checkbox" value="5447"/><a href="/a/document.html?key=5447"><img alt="Excel Spreadsheet Format" src="/img/icons/icon_XLS.gif"></img></a></td><td header="c4">04/27/2015</td><td header="c5">05/26/2015</td><td header="c6">05/26/2015 10:00AM EDT</td>
</tr>

Answer

Locate the input by checking its aria-label attribute, then get the <a> sibling element that follows it:

label = soup.find("input", {"aria-label": "Select Report format XLS"})
link = label.find_next_sibling("a", href=True)["href"]
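
For anyone who wants to try this end to end, here is a minimal sketch of the same lookup run against a trimmed copy of the row from the question. The helper name find_link_after_input and the ROW_HTML constant are made up for illustration, and the sketch assumes the bs4 package with the built-in "html.parser" backend. It also returns None instead of raising when the input or the link is missing, which is an addition not present in the two lines above.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question (two checkbox/link pairs).
ROW_HTML = """
<tr class="odd"><td header="c2">
<input aria-label="Select Report format PDF" name="documentChkBx" type="checkbox" value="5446"/>
<a href="/a/document.html?key=5446"><img alt="Portable Document Format" src="/img/icons/icon_PDF.gif"/></a>
<input aria-label="Select Report format XLS" name="documentChkBx" type="checkbox" value="5447"/>
<a href="/a/document.html?key=5447"><img alt="Excel Spreadsheet Format" src="/img/icons/icon_XLS.gif"/></a>
</td></tr>
"""


def find_link_after_input(soup, aria_label):
    """Hypothetical helper: href of the first <a> sibling after the matching input, or None."""
    # Match the checkbox by its aria-label, not by its position in the row.
    checkbox = soup.find("input", {"aria-label": aria_label})
    if checkbox is None:
        return None
    # find_next_sibling skips whitespace text nodes and other tags until an <a> with an href.
    anchor = checkbox.find_next_sibling("a", href=True)
    return anchor["href"] if anchor is not None else None


soup = BeautifulSoup(ROW_HTML, "html.parser")
xls_link = find_link_after_input(soup, "Select Report format XLS")
print(xls_link)
```

Because the match is on the attribute value, swapping the order of the PDF and XLS pairs in the row should not change which link comes back.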

Parsing table for a link
