Scraping data from a dynamic web database with Python [closed]

2024/11/15 23:29:51

I am new to Python and am currently trying to figure out how to scrape data from this website:

https://www.entsoe.eu/db-query/consumption/mhlv-a-specific-country-for-a-specific-month

I am not sure whether to use Scrapy, BeautifulSoup or Selenium. I need the data for a specific country (say DE, Germany) for each month and day from 2012 to 2014.

Any help is very much appreciated.

Answer

You can solve it with requests (to maintain a web-scraping session), BeautifulSoup (to parse the HTML), a regular expression (to extract the value of the JavaScript variable that holds the data inside a script tag) and ast.literal_eval() (to turn the JavaScript array into a Python list):

from ast import literal_eval
import re

from bs4 import BeautifulSoup
import requests

url = "https://www.entsoe.eu/db-query/consumption/mhlv-a-specific-country-for-a-specific-month"
payload = {
    'opt_period': '0',
    'opt_Country': '12',  # 12 stands for DE here
    'opt_Month': '1',
    'opt_Year': '2014',
    'opt_Response': '1',
    'send': 'send',
}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.111 Safari/537.36'}

with requests.Session() as session:
    session.headers.update(headers)
    session.get(url)  # load the page once so the session holds any cookies it sets
    response = session.post(url, data=payload)
    soup = BeautifulSoup(response.content, 'html.parser')
    # the data sits in the inline script that calls Ext.onReady
    script = soup.find('script', string=re.compile(r'Ext\.onReady')).string
    # grab the JS array assigned to myData (DOTALL in case it spans several lines)
    data = literal_eval(re.search(r"var myData = (.*?);", script, re.DOTALL).group(1))
    for row in data:
        print(row)

Prints:

['DE', '2014-01-01', 45424, 43537, 41773, 40716, 39945, 39014, 37282, 37573, 38225, 40639, 42884, 45332, 46285, 45671, 45293, 45840, 48863, 53721, 54607, 53691, 51219, 49701, 49099, 45850]
['DE', '2014-01-02', 42468, 40217, 39564, 39758, 41054, 43586, 48705, 54691, 58650, 61110, 62773, 64309, 64561, 63807, 62706, 61919, 63338, 66760, 66615, 64653, 60690, 57825, 55697, 51490]
['DE', '2014-01-03', 47538, 45125, 44358, 44748, 45815, 48024, 52151, 57564, 60767, 62425, 63654, 65152, 65273, 63591, 62195, 61722, 63311, 66785, 66668, 64317, 60460, 57727, 56084, 52332]
...
['DE', '2014-01-29', 57605, 55275, 54154, 54226, 55320, 58459, 66647, 73890, 75957, 75958, 76725, 77446, 76852, 76362, 75300, 74549, 73958, 77129, 78240, 76323, 71961, 68595, 66088, 61923]
['DE', '2014-01-30', 58207, 56235, 54953, 54873, 55861, 58952, 66756, 73747, 75479, 75507, 76249, 76763, 76013, 75291, 73975, 73267, 72717, 76181, 77765, 76038, 71807, 68369, 65580, 61414]
['DE', '2014-01-31', 57870, 55665, 54381, 54422, 55419, 58490, 65929, 72706, 74666, 74392, 74791, 74923, 73877, 72205, 70449, 69596, 69345, 73259, 74950, 72959, 68623, 65319, 63414, 59467]
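To see what the regex and ast.literal_eval() step does without hitting the site, here is a minimal offline sketch. The script string is a hypothetical, shortened stand-in for the page's inline script (only the first two hourly values of the first two rows above are kept); the real page's markup may differ.

```python
import re
from ast import literal_eval

# Hypothetical, shortened stand-in for the inline <script> on the results page
script = """
Ext.onReady(function() {
    var myData = [['DE', '2014-01-01', 45424, 43537], ['DE', '2014-01-02', 42468, 40217]];
});
"""

# Non-greedy match up to the first ';' after the assignment;
# DOTALL lets the array span several lines
match = re.search(r"var myData = (.*?);", script, re.DOTALL)

# The JS array literal is also a valid Python literal, so literal_eval
# turns it into a list of lists without executing any code
rows = literal_eval(match.group(1))

for row in rows:
    print(row)
```

literal_eval works here only because the array holds quoted strings and integers, which are valid Python literals as well; a JavaScript token such as null or true would make it raise ValueError.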

A Selenium-based approach would be less "magical", but I think this is more than enough to get you started (especially for a question with minimal research effort).
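The question asks for 2012 to 2014, while the answer fetches a single month. One way to cover the whole range is to loop over years and months and post the same form once per month, reusing one session. The sketch below assumes the form fields from the answer also accept 2012 and 2013 as opt_Year and that '12' still means DE; build_payload, month_range and fetch_rows are hypothetical helper names. The regex is applied to the whole response text instead of the Ext.onReady script tag, which assumes "var myData = " appears only once on the page.

```python
import re
from ast import literal_eval

URL = "https://www.entsoe.eu/db-query/consumption/mhlv-a-specific-country-for-a-specific-month"


def build_payload(year, month, country="12"):
    """Form data for one month (hypothetical helper; field names copied from the answer, '12' = DE)."""
    return {
        "opt_period": "0",
        "opt_Country": country,
        "opt_Month": str(month),
        "opt_Year": str(year),
        "opt_Response": "1",
        "send": "send",
    }


def month_range(first_year, last_year):
    """Yield (year, month) pairs from January of first_year to December of last_year."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            yield year, month


def fetch_rows(session, year, month):
    """POST the form for one month and return the parsed myData rows.

    `session` is expected to be a requests.Session (or anything with a
    compatible post() method) that has already loaded URL once.
    Returns an empty list if the page contains no myData array.
    """
    response = session.post(URL, data=build_payload(year, month))
    response.raise_for_status()
    match = re.search(r"var myData = (.*?);", response.text, re.DOTALL)
    return literal_eval(match.group(1)) if match else []
```

To use it, open a requests.Session with the headers from the answer, call session.get(URL) once, then collect everything with rows = [row for year, month in month_range(2012, 2014) for row in fetch_rows(session, year, month)]. Passing the session in keeps the helpers free of network setup, so the payload and loop logic can be checked without touching the site.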
