I'd like to extract the id from URLs like this:
input:
x = "https://play.google.com/store/apps/details?id=com.alibaba.aliexpresshd&hl=en"
get_id(x)
output:
com.alibaba.aliexpresshd
What is the best way to do it with re in Python?
def get_id(toParse):
    return re.search('id=(WHAT TO WRITE HERE?)', toParse).groups()[0]
I have only found examples that handle exactly one dot.
You could try:
r'\?id=([a-zA-Z\.]+)'
Use it as your regex, like so:
import re

def get_id(toParse):
    # Match "?id=" followed by one or more letters or dots
    regex = r'\?id=([a-zA-Z\.]+)'
    x = re.findall(regex, toParse)[0]
    return x
Regex -
By adding r before the actual regex string, we specify that it is a raw string, so we don't have to double every backslash in the pattern.
? holds special meaning for the regex system, so to match a literal question mark, we precede it with a backslash: \?
id= matches the id= part of the URL.
([a-zA-Z\.]+) is the first capturing group of the regex (group 1; group 0 is the whole match). It matches one or more letters or dots, which is the id in the URL. Because the pattern has exactly one capturing group, re.findall returns a list of that group's text, so by saying [0] we are able to return the desired text.
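To make the difference between the whole match and the capturing group concrete, here is a small sketch using the URL from the question. The variable names (url, pattern, whole, package, found) are only illustrative:

```python
import re

url = "https://play.google.com/store/apps/details?id=com.alibaba.aliexpresshd&hl=en"
pattern = r'\?id=([a-zA-Z\.]+)'

match = re.search(pattern, url)
whole = match.group(0)    # entire matched text: '?id=com.alibaba.aliexpresshd'
package = match.group(1)  # only the parenthesised part: 'com.alibaba.aliexpresshd'

# With a single capturing group, findall returns that group's text for each match
found = re.findall(pattern, url)  # ['com.alibaba.aliexpresshd']
```

The match stops at & because that character is not in the [a-zA-Z\.] class.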
Note - I have used re.findall for this because it returns a list whose element at index 0 is the extracted text.
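One caveat: if nothing matches, re.findall returns an empty list and [0] raises IndexError. The pattern above also skips digits and underscores, and it requires id to be the first query parameter. A possible variant, offered only as a sketch, is shown below; get_id_or_none and ID_PATTERN are hypothetical names, and the assumption that ids consist of word characters and dots is mine, not something stated in the question:

```python
import re

# [?&] lets id appear anywhere in the query string;
# [\w.]+ accepts letters, digits, underscores and dots (an assumption about valid ids)
ID_PATTERN = re.compile(r'[?&]id=([\w.]+)')

def get_id_or_none(to_parse):
    """Return the id from the URL, or None when there is no id parameter."""
    match = ID_PATTERN.search(to_parse)
    return match.group(1) if match else None

print(get_id_or_none(
    "https://play.google.com/store/apps/details?id=com.alibaba.aliexpresshd&hl=en"
))  # com.alibaba.aliexpresshd
```

Returning None instead of raising leaves it to the caller to decide how to handle URLs without an id.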
I recommend you take a look at rexegg.com for a full list of regex syntax.