Separating tag attributes as a dictionary

2024/11/14 13:14:20

My input (the variable is a string):

<a href="https://wikipedia.org/" rel="nofollow ugc">wiki</a>

My expected output:

{
'href': 'https://wikipedia.org/',
'rel': 'nofollow ugc',
'text': 'wiki',
}

How can I do this in Python without using the BeautifulSoup library?
Please answer using the lxml library.

Answer

Solution with lxml (but without bs!):

from lxml import etree
xml = '<a href="https://wikipedia.org/" rel="nofollow ugc">wiki</a>'
root = etree.fromstring(xml)
print(root.attrib)
>>> {'href': 'https://wikipedia.org/', 'rel': 'nofollow ugc'}

The attributes do not include the link text, though. You can get it from the element's text property:

print(root.text)
>>> wiki

Putting it together:

from lxml import etree
xml = '<a href="https://wikipedia.org/" rel="nofollow ugc">wiki</a>'
root = etree.fromstring(xml)
dict_ = {}
dict_.update(root.attrib)
dict_.update({'text': root.text})
print(dict_)
>>> {'href': 'https://wikipedia.org/', 'rel': 'nofollow ugc', 'text': 'wiki'}
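
The same steps can be wrapped in a small helper. This is only a sketch: the name tag_to_dict is made up for illustration, the input is assumed to be a single well-formed tag, and the fallback import is there only so the snippet still runs where lxml is not installed (the standard library module offers the same calls used here). It uses itertext() instead of text so that text inside nested child tags is collected too.

```python
try:
    from lxml import etree  # preferred, as requested in the question
except ImportError:
    # Fallback for environments without lxml; the standard library module
    # provides the same fromstring / attrib / itertext calls used below.
    import xml.etree.ElementTree as etree


def tag_to_dict(markup):
    """Return the attributes of one well-formed tag plus its text (hypothetical helper)."""
    root = etree.fromstring(markup)
    result = dict(root.attrib)  # copy the attributes into a plain dict
    # itertext() also yields the text of nested child elements and their tails
    result["text"] = "".join(root.itertext())
    return result


print(tag_to_dict('<a href="https://wikipedia.org/" rel="nofollow ugc">wiki</a>'))
```

Note that an attribute that is itself named text would be overwritten by the 'text' key; pick a different key if that can happen in your data.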

EDIT

-------Parsing [X]HTML with regex is discouraged!-------

Solution with regex:

import re
pattern_text = r"[>](\w+)[<]"
pattern_href = r'href="(\w\S+)"'
# [A-Za-z ] instead of [A-z ]: the range A-z also matches [ \ ] ^ _ and `
pattern_rel = r'rel="([A-Za-z ]+)"'
xml = '<a href="https://wikipedia.org/" rel="nofollow ugc">wiki</a>'
dict_ = {
    'href': re.search(pattern_href, xml).group(1),
    'rel': re.search(pattern_rel, xml).group(1),
    'text': re.search(pattern_text, xml).group(1),
}
print(dict_)
>>> {'href': 'https://wikipedia.org/', 'rel': 'nofollow ugc', 'text': 'wiki'}

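This works when the input is a string that contains both an href and a rel attribute; if either is missing, re.search returns None and .group(1) raises an AttributeError.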
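
If the attribute names are not known in advance, the regex approach can be generalized roughly as follows. This is a sketch with the same caveats as any regex-based HTML handling: the helper name tag_to_dict_regex and both patterns are illustrative, and they assume double-quoted attribute values, no ">" inside a value, and no nested tags.

```python
import re

# Matches name="value" pairs; assumes double-quoted values without embedded quotes.
ATTR_PATTERN = re.compile(r'([\w-]+)="([^"]*)"')
# Matches the text between the opening and closing tag; assumes no nested tags.
TEXT_PATTERN = re.compile(r'>([^<]*)<')


def tag_to_dict_regex(markup):
    """Collect every attribute of a single tag plus its text (hypothetical helper)."""
    result = dict(ATTR_PATTERN.findall(markup))  # list of (name, value) tuples -> dict
    text_match = TEXT_PATTERN.search(markup)
    # Use None when no text is found instead of raising on .group(1)
    result["text"] = text_match.group(1) if text_match else None
    return result


print(tag_to_dict_regex('<a href="https://wikipedia.org/" rel="nofollow ugc">wiki</a>'))
```

For anything beyond simple, predictable input like this, the lxml solution remains the safer choice.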

