I am trying to get a large amount of status information that is encoded in websites, mainly inside the <head></head> element.
I know I can use wget, curl, or Python to get the whole page, but I don't want to put too much unnecessary stress on the servers (the pages themselves are rather large/complicated).
Is there any method of getting only the head-element?
I assume proxy servers do something like this besides checking the HTTP headers.
Just for clarification: I am not looking for HTTP headers, only for the HTML <head> element.
It is not possible to load only the data between the <head>
tags because the server would have to parse the requested page before sending it.
A possible workaround is to read a few bytes at a time until a </head> tag is found.
The following reads n
bytes from the source and checks if the string </head>
is included. If so, the bytes are converted to a string
and trimmed so that the result contains the tags <head>
and </head>
as well as the data between them. Otherwise it continues to read n
bytes until </head>
is found.
```python
import urllib.request


def get_head_tag_data(url, n=512):
    """Read n bytes at a time from the source until '</head>' is found.
    Trim the result to '<head> ... </head>' and return it as a string."""
    # open resource
    with urllib.request.urlopen(url) as site:
        # read n bytes until `data` includes "</head>"
        data = b''
        i = 1
        while True:
            buff = site.read(n)
            data += buff
            # search the accumulated data, not only the last chunk,
            # so a tag split across two reads is still found
            if b'</head>' in data:
                break
            elif buff == b'':
                raise ValueError('No head-tag found.')
            i += 1
        print('{} bytes read'.format(n * i))
    # decode to a string (str(data) would keep the b'...' repr)
    data = data.decode('utf-8', errors='replace')
    # detect tag positions
    start_tag = data.find('<head>')
    end_tag = data.find('</head>') + len('</head>')
    return data[start_tag:end_tag]


tag_data = get_head_tag_data('https://stackoverflow.com', n=256)
```
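Depending on the server, a variant that avoids the read loop is an HTTP Range request. This is only a sketch: the 2048-byte cutoff is an arbitrary guess at how far into the page `</head>` appears, and many dynamically generated pages ignore the Range header and send the full body anyway (a 206 response status means the range was honoured, a 200 means it was not).

```python
import urllib.request

# Assumption: the server honours HTTP Range requests. Ask for only
# the first 2048 bytes of the page instead of reading in a loop.
req = urllib.request.Request(
    'https://stackoverflow.com',
    headers={'Range': 'bytes=0-2047'},
)
print(req.get_header('Range'))  # bytes=0-2047
# with urllib.request.urlopen(req) as site:
#     partial = site.read()
#     print(site.status, len(partial))  # 206 -> range honoured
```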
Note that get_head_tag_data does only minimal error checking; for example, it will not find a <head> tag that carries attributes (such as <head lang="en">), because it searches for the literal string <head>.
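Once the trimmed string is in hand, the status information still has to be pulled out of the markup. Rather than more string searching, the standard library's html.parser can do this and also copes with attributes on the tags. A minimal sketch (the HeadMetaParser name and the sample markup are made up for illustration):

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collect the <title> text and all <meta> attributes that
    appear inside the <head> element."""

    def __init__(self):
        super().__init__()
        self._in_head = False
        self._in_title = False
        self.title = ''
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == 'head':
            self._in_head = True
        elif self._in_head and tag == 'meta':
            self.metas.append(dict(attrs))
        elif self._in_head and tag == 'title':
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'head':
            self._in_head = False
        elif tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


parser = HeadMetaParser()
parser.feed("<html><head><title>Demo</title>"
            "<meta name='status' content='ok'></head>"
            "<body>ignored</body></html>")
print(parser.title)  # Demo
print(parser.metas)  # [{'name': 'status', 'content': 'ok'}]
```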