Extracting variables from JavaScript inside HTML

2024/11/10 13:04:03

I need all the lines that contain the text '.mp4'. The HTML file has no tag!

My code:

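import urllib.request
import demjson

url = 'https://myurl'
# read() returns bytes; decode to str before searching the text (UTF-8 is assumed here)
content = urllib.request.urlopen(url).read().decode('utf-8')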

<script type="text/javascript">
/* <![CDATA[ */
function getEmbed(width, height) {
if (width && height) {
return '<iframe width="' + width + '" height="' + height + '" src="https://www.ptrex.com/embed/33247" frameborder="0" allowfullscreen webkitallowfullscreen mozallowfullscreen oallowfullscreen msallowfullscreen></iframe>';
}
return '<iframe width="768" height="432" src="https://www.ptrex.com/embed/33247" frameborder="0" allowfullscreen webkitallowfullscreen mozallowfullscreen oallowfullscreen msallowfullscreen></iframe>';
}
var flashvars = {
video_id: '33247',
license_code: '$535555517668914',
rnd: '1537972655',
video_url: 'https://www.ptrex.com/get_file/4/996a9088fdf801992d24457cd51469f3f7aaaee6a0/33000/33247/33247.mp4/',
postfix: '.mp4',
video_url_text: '480p',
video_alt_url: 'https://www.ptrex.com/get_file/4/774833c428771edee2cf401ef2264e746a06f9f370/33000/33247/33247_720p.mp4/',
video_alt_url_text: '720p HD',
video_alt_url_hd: '1',
timeline_screens_url: '//di-iu49il1z.leasewebultracdn.com/contents/videos_screenshots/33000/33247/timelines/timeline_mp4/200x116/{time}.jpg',
timeline_screens_interval: '10',
preview_url: '//di-iu49il1z.leasewebultracdn.com/contents/videos_screenshots/33000/33247/preview.mp4.jpg',
skin: 'youtube.css',
bt: '1',
volume: '1',
hide_controlbar: '1',
hide_style: 'fade',
related_src: 'https://www.ptrex.com/related_videos_html/33247/',
adv_pre_vast: 'https://pt.ptawe.com/vast/v3?psid=ed_pntrexvb1&utm_source=bf1&utm_medium=network&ms_notrack=1',
lrcv: '1556867449254522707330811',
adv_pre_replay_after: '2',
embed: '1'
};
var player_obj = kt_player('kt_player', 'https://www.ptrex.com/player/kt_player.swf?v=4.0.2', '100%', '100%', flashvars);
/* ]]> */
</script>

Answer

You could use BeautifulSoup to extract the <script> tag, but you would still need an alternative approach to extract the information inside.
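
As a rough sketch of that first step, the snippet below pulls out the text of each <script> element. It swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages. The ScriptCollector class and the shortened html sample are made up for illustration and are not part of the original page.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it.
        if self.in_script:
            self.scripts[-1] += data


# Shortened stand-in for the downloaded page (hypothetical sample).
html = """<html><body>
<script type="text/javascript">
var flashvars = { video_id: '33247', postfix: '.mp4' };
</script>
</body></html>"""

collector = ScriptCollector()
collector.feed(html)
collector.close()

# Keep only the script blocks that define flashvars.
flashvars_scripts = [s for s in collector.scripts if "var flashvars" in s]
print(len(flashvars_scripts))
```

This only isolates the script text. The values inside it still have to be parsed separately.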

A little Python can first cut the flashvars object literal out of the page and then pass it to demjson, which converts the JavaScript object into a Python dictionary. For example:

import demjson

content = """<script type="text/javascript">/* <![CDATA[ */
...
...
</script>"""

script_var = content.split('var flashvars = ')[1]
script_var = script_var[:script_var.find('};') + 1]
data = demjson.decode(script_var)

print(data['video_url'])
print(data['video_alt_url'])

This would then display:

https://www.ptrex.com/get_file/4/996a9088fdf801992d24457cd51469f3f7aaaee6a0/33000/33247/33247.mp4/
https://www.ptrex.com/get_file/4/774833c428771edee2cf401ef2264e746a06f9f370/33000/33247/33247_720p.mp4/

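demjson is an alternative JSON decoder that tolerates non-strict input such as the unquoted keys and single-quoted strings used here (the standard json module rejects both). It can be installed via pip: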

pip install demjson
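
If installing demjson is not an option, a regular expression may be enough for a flat object like this one. The following is only a sketch under some assumptions: every value is a single-quoted string with no escaped quotes inside it, and the content sample is a shortened stand-in with hypothetical example.com URLs rather than the real page. It keeps every entry whose value contains '.mp4', which is what the question asks for.

```python
import re

# Shortened stand-in for the page source (hypothetical sample values).
content = """<script type="text/javascript">
var flashvars = {
video_id: '33247', video_url: 'https://example.com/get_file/33247.mp4/', postfix: '.mp4', video_alt_url: 'https://example.com/get_file/33247_720p.mp4/', skin: 'youtube.css'
};
</script>"""

# Isolate the text between "var flashvars = " and the closing "};".
marker = 'var flashvars = '
start = content.index(marker) + len(marker)
end = content.index('};', start) + 1
literal = content[start:end]

# Match pairs of the form  key: 'value'  (assumes single-quoted values
# with no escaped quotes inside them).
pairs = dict(re.findall(r"(\w+)\s*:\s*'([^']*)'", literal))

# Keep only the entries whose value mentions ".mp4".
mp4_entries = {key: value for key, value in pairs.items() if '.mp4' in value}

for key, value in mp4_entries.items():
    print(key, value)
```

This is more fragile than a real parser: nested objects, double-quoted strings, or escaped quotes would need a different pattern or a tolerant decoder such as demjson.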