Pro-Football-Reference Team Stats XPath

2024/10/14 14:18:20

I am using the Scrapy shell on the page "Pittsburgh Steelers at New England Patriots - September 10th, 2015" to pull individual team stats. For example, I want to pull the total yards for the away team (464). Inspecting that element and copying its XPath yields

//*[@id="team_stats"]/tbody/tr[5]/td[1]

but when I run

response.xpath('//*[@id="team_stats"]/tbody/tr[5]/td[1]')

nothing is returned. I noticed that this table is in a separate div from the initial data, so I'm not sure whether I need to start higher up. Even a search on just the

//*[@id="team_stats"]

XPath returns nothing. Any help would be greatly appreciated.

Answer

The problem (as in most cases like this) is that the website uses JavaScript to render the complete game information, so Scrapy does not see the page the way your browser shows it.

Scrapy does not run any JavaScript after loading the page, so the table with the ID team_stats never appears as an element in the response. The contents of the "Team Stats" table are present in the downloaded HTML, but they are commented out.

One solution is to extract the comment that contains the team statistics, parse the comment text as HTML, and extract the data from it.

response.xpath('//div[@id="all_team_stats"]//comment()').extract()

The expression above extracts the comment that contains the table you need.

For future analysis, I recommend Chrome's Developer Tools: disable JavaScript there and reload the site. This shows the page's content as Scrapy sees it.
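The same check can be done in code rather than in the browser. The following is only a minimal sketch using Python's standard html.parser module, not Scrapy itself. The CommentCollector class, the comments_containing helper, and the raw_html sample are made up for illustration, and the real page's markup may well differ. It lists the HTML comments in a raw page source that contain a given marker, such as the table ID.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # `data` is the comment body without the <!-- and --> delimiters.
        self.comments.append(data)


def comments_containing(html, needle):
    """Return the comments in `html` whose text contains `needle`."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return [c for c in collector.comments if needle in c]


# Made-up stand-in for the raw page source; the real markup may differ.
raw_html = """
<div id="all_team_stats">
  <!--
  <table id="team_stats">
    <tbody><tr><th>Total Yards</th><td>464</td></tr></tbody>
  </table>
  -->
</div>
"""

hits = comments_containing(raw_html, 'id="team_stats"')
print(len(hits))  # number of comments that hide the table
```

If an ID shows up inside a comment like this but an XPath query for it returns nothing, the element is most likely being uncommented by JavaScript in the browser.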

EDIT

After you extract the comment, you can feed it into a new selector, as Markus mentioned in his comment:

new_selector = Selector(text=extracted_text)

On this new selector you can call .xpath() again, just as you would on the response object.

Removing the comment delimiters is easy. HTML comments start with <!-- and end with -->, so strip those from the beginning and the end of the extracted string and feed the text between them to the new selector.
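Slicing off a fixed number of characters works only when the extracted string really begins with <!-- and ends with -->. As a slightly more defensive sketch, the helper below checks for the delimiters before removing them. The name strip_comment_delimiters and the sample string are hypothetical and not part of Scrapy.

```python
def strip_comment_delimiters(comment):
    """Return the text between <!-- and --> in a serialized HTML comment."""
    text = comment.strip()
    if not (text.startswith("<!--") and text.endswith("-->")):
        raise ValueError("not a serialized HTML comment")
    # Drop the 4-character opener and the 3-character closer.
    return text[4:-3].strip()


# Made-up example of what an extracted comment node might look like.
extracted_text = '<!--\n<table id="team_stats"><tr><td>464</td></tr></table>\n-->'
inner = strip_comment_delimiters(extracted_text)
print(inner)
```

The returned string is what you would pass as the text argument of the new selector.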

Extending the example from above:

from scrapy.selector import Selector

# The extracted comment still includes its <!-- and --> delimiters.
extracted_text = response.xpath('//div[@id="all_team_stats"]//comment()').extract()[0]
new_selector = Selector(text=extracted_text[4:-3].strip())
new_selector.xpath('//*[@id="team_stats"]/tbody/tr[5]/td[1]').extract()