JavaScript variable with HTML code: regex email matching

2024/7/6 22:31:01

This Python script fails to output the email address [email protected] in this case.

It follows up on my previous post, "How can I use BeautifulSoup or Slimit on a site to output the email address from a javascript variable".

#!/usr/bin/env python
from bs4 import BeautifulSoup
import re

soup = '''
<script LANGUAGE="JavaScript">
function something()
{
var ptr;
ptr = "";
ptr += "<table><td class=france></td></table>";
ptr += "<table><td class=france><a href=mail";
ptr += "to:[email protected]>email</a></td></table>";
document.all.something.innerHTML = ptr;
}
</script>
'''

soup = BeautifulSoup(soup)

for script in soup.find_all('script'):
    reg = '(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)'
    reg2 = 'mailto:.*'
    secondHalf = re.search(reg, script.text)
    firstHalf = re.search(reg2, script.text)
    secondHalfEmail = secondHalf.group()
    firstHalfEmail = firstHalf.group()
    firstHalfEmail = firstHalfEmail.replace('mailto:', '')
    firstHalfEmail = firstHalfEmail.replace('";', '')
    if firstHalfEmail == secondHalfEmail:
        email = secondHalfEmail
    else:
        if ('>') not in firstHalfEmail:
            if ('>') not in secondHalfEmail:
                if firstHalfEmail != secondHalfEmail:
                    email = firstHalfEmail + secondHalfEmail
            else:
                email = firstHalfEmail
        else:
            email = secondHalfEmail
    print(email)

I would appreciate it if someone could help me.

Thank you

Answer

Here is a rather interesting (I think) approach.

Instead of parsing this JavaScript code, execute it!

Get the value of ptr, load it with BeautifulSoup, and read the href attribute of the a tag. Example using the V8 engine (through PyV8):

from bs4 import BeautifulSoup
from pyv8 import PyV8

data = """
<script LANGUAGE="JavaScript">
function something()
{
var ptr;
ptr = "";
ptr += "<table><td class=france></td></table>";
ptr += "<table><td class=france><a href=mail";
ptr += "to:[email protected]>email</a></td></table>";
document.all.something.innerHTML = ptr;
}
</script>
"""

soup = BeautifulSoup(data)

# prepare the function to return a value and add a function call
js_code = soup.script.text.strip().replace(
    'document.all.something.innerHTML = ptr;', 'return ptr;'
) + "; something()"

ctxt = PyV8.JSContext()
ctxt.enter()

# evaluate the script, then parse the HTML string it returns
soup = BeautifulSoup(ctxt.eval(str(js_code)))
print(soup.a['href'].split('mailto:')[1])

Prints:

[email protected]
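
If installing a JavaScript engine is not an option, the same idea (rebuild the value of ptr first, then parse it as HTML) can be approximated in pure Python. The following is only a minimal sketch, not part of the original answer: it assumes ptr is built solely from double-quoted string literals via = and +=, it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and the names extract_emails and MailtoCollector as well as the address user@example.com are hypothetical placeholders (the real address is obfuscated in the post).

```python
import re
from html.parser import HTMLParser

# Same shape as the script in the question; user@example.com is a
# placeholder, since the real address is obfuscated in the post.
JS_SOURCE = '''
function something()
{
var ptr;
ptr = "";
ptr += "<table><td class=france></td></table>";
ptr += "<table><td class=france><a href=mail";
ptr += "to:user@example.com>email</a></td></table>";
document.all.something.innerHTML = ptr;
}
'''


class MailtoCollector(HTMLParser):
    """Collect the addresses found in <a href=mailto:...> tags."""

    def __init__(self):
        super().__init__()
        self.emails = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith("mailto:"):
                self.emails.append(value[len("mailto:"):])


def extract_emails(js_source, var_name="ptr"):
    # Match `ptr = "..."` and `ptr += "..."` and capture each literal.
    # Assumption: only double-quoted literals; escapes are not decoded.
    pattern = re.compile(
        re.escape(var_name) + r'\s*\+?=\s*"((?:[^"\\]|\\.)*)"'
    )
    # Joining the literals in order mimics the JS string concatenation,
    # so the "mail" + "to:" split across two lines disappears.
    html = "".join(pattern.findall(js_source))
    collector = MailtoCollector()
    collector.feed(html)
    collector.close()
    return collector.emails


emails = extract_emails(JS_SOURCE)
print(emails)
```

This also shows why the regex in the question fails: the text mailto: never appears contiguously in the script source, because it is split across two string literals, so re.search('mailto:.*', ...) returns None there. Executing the script, as in the answer above, remains the more robust choice when the string is built with anything beyond simple literals.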
