Parsing table for a link

2024/10/12 11:18:59

I've managed to isolate a row in an HTML table using Beautiful Soup in Python 2.7. It's been a learning experience and I'm happy to have got that far, but I'm stuck on the next step.

I need to get the link that follows the "Select document Remittance Report I format XLS" input. Because the order in which the inputs appear can change, the lookup needs to be dynamic. I'm not sure how to find that input and then grab the link that follows it.

I've been trying the findAll and nextSibling methods, but my inexperience with Python and Beautiful Soup is holding me back. The Beautiful Soup documentation is great, but it's going a bit over my head.


<tr class="odd"><td header="c1">Report Download</td><td header="c2"><input aria-label="Select Report format PDF" id="documentChkBx0" name="documentChkBx" type="checkbox" value="5446"/><a href="/a/document.html?key=5446"><img alt="Portable Document Format" src="/img/icons/icon_PDF.gif"></img></a><input aria-label="Select Report format XLS" id="documentChkBx1" name="documentChkBx" type="checkbox" value="5447"/><a href="/a/document.html?key=5447"><img alt="Excel Spreadsheet Format" src="/img/icons/icon_XLS.gif"></img></a></td><td header="c4">04/27/2015</td><td header="c5">05/26/2015</td><td header="c6">05/26/2015 10:00AM EDT</td>
</tr>
Answer

Locate the input by its aria-label attribute, then get the <a> sibling element that follows it:

# Find the checkbox by its aria-label, then read the href of the <a> that follows it.
checkbox = soup.find("input", {"aria-label": "Select Report format XLS"})
link = checkbox.find_next_sibling("a", href=True)["href"]
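
For anyone who wants to try this end to end, here is a minimal sketch built on a trimmed copy of the row from the question. It assumes the bs4 package with the built-in "html.parser" backend. The helper name link_after_input is made up for illustration and is not part of Beautiful Soup, and the None checks are an addition for the case where the label or the link is missing.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question (date cells omitted).
HTML = (
    '<table><tr class="odd"><td header="c1">Report Download</td>'
    '<td header="c2">'
    '<input aria-label="Select Report format PDF" id="documentChkBx0" '
    'name="documentChkBx" type="checkbox" value="5446"/>'
    '<a href="/a/document.html?key=5446">'
    '<img alt="Portable Document Format" src="/img/icons/icon_PDF.gif"/></a>'
    '<input aria-label="Select Report format XLS" id="documentChkBx1" '
    'name="documentChkBx" type="checkbox" value="5447"/>'
    '<a href="/a/document.html?key=5447">'
    '<img alt="Excel Spreadsheet Format" src="/img/icons/icon_XLS.gif"/></a>'
    '</td></tr></table>'
)


def link_after_input(soup, aria_label):
    """Return the href of the first <a> sibling after the matching input.

    Hypothetical helper: returns None when either the input or the
    following link cannot be found, instead of raising an exception.
    """
    box = soup.find("input", {"aria-label": aria_label})
    if box is None:
        return None
    anchor = box.find_next_sibling("a", href=True)
    if anchor is None:
        return None
    return anchor["href"]


soup = BeautifulSoup(HTML, "html.parser")
xls_href = link_after_input(soup, "Select Report format XLS")
print(xls_href)  # /a/document.html?key=5447
```

Because the lookup keys on the aria-label rather than on position, it should keep working if the PDF and XLS inputs swap places within the cell.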
