Exact string search in XML files?

2024/11/14 18:46:14

I need to search some XML files (all of them named pom.xml, including those in subfolders) for exactly the following text sequence, so that if somebody writes some text or even a blank inside it, I get an alert:

     <!--| Startsection|-->         <!-- | Endsection|-->

I'm running the following Python script, but it still doesn't match exactly; I also get an alert when only part of the text is present:

import re
import os
from os.path import join
comment=re.compile(r"<!--\s+| Startsection\s+|-->\s+<!--\s+| Endsection\s+|-->")
tag="<module>"
for root, dirs, files in os.walk("."):
    if "pom.xml" in files:
        p=join(root, "pom.xml")
        print("Checking",p)
        with open(p) as f:
            s=f.read()
        if tag in s and comment.search(s):
            print("Matched",p)

UPDATE #3

I expect the script to print the content of the <module> tag if it exists between |--> and <!-- in the searched text:

 <!--| Startsection|-->         <!-- | Endsection|-->

For instance, after printing Matched and the name of the file, it should also print "example.test1" in the case below:

     <!--| Startsection|-->         <module>example.test1</module><!-- | Endsection|-->

UPDATE #4

I should be using the following:

import re
import os
from os.path import join
comment=re.compile(r"<!--\s+\| Startsection\s+\|-->\s+<!--\s+\| Endsection\s+\|-->", re.MULTILINE)
tag="<module>"
for root, dirs, files in os.walk("/home/temp/test_folder/"):
    for skipped in ("test1", "test2", ".repotest"):
        if skipped in dirs: dirs.remove(skipped)
    if "pom.xml" in files:
        p=join(root, "pom.xml")
        print("Checking",p)
        with open(p) as f:
            s=f.read()
        if tag in s and comment.search(s):
            print("The following files are corrupted ",p)
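
The only difference between this pattern and the first one is the backslash before each pipe. In a regular expression an unescaped | means alternation, so the first pattern is really five separate alternatives and succeeds as soon as any one of them appears, for example a bare "-->". A minimal sketch of that difference follows; the three sample strings are made up for illustration and are not taken from a real pom.xml.

```python
import re

# Pattern from the first script: every unescaped "|" separates alternatives.
loose = re.compile(r"<!--\s+| Startsection\s+|-->\s+<!--\s+| Endsection\s+|-->")
# Pattern from UPDATE #4: "\|" matches a literal pipe character.
strict = re.compile(r"<!--\s+\| Startsection\s+\|-->\s+<!--\s+\| Endsection\s+\|-->")

# Hypothetical sample texts, laid out with the whitespace the pattern expects.
samples = {
    "empty_section": "<!--\n | Startsection\n |-->\n<!--\n | Endsection\n |-->",
    "filled_section": (
        "<!--\n | Startsection\n |-->\n"
        "<module>example.test1</module>\n"
        "<!--\n | Endsection\n |-->"
    ),
    "unrelated": "<project> <!-- some other comment --> </project>",
}

# For each sample, record (loose pattern matched, strict pattern matched).
results = {
    name: (bool(loose.search(text)), bool(strict.search(text)))
    for name, text in samples.items()
}

for name, outcome in results.items():
    print(name, outcome)
```

The loose pattern matches all three samples, while the escaped one matches only the sample where nothing but whitespace sits between the two comments. Note that re.MULTILINE only changes how ^ and $ behave, so it has no effect on this pattern.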

UPDATE #5

import re
import os
import xml.etree.ElementTree as etree 
from bs4 import BeautifulSoup 
from bs4 import Comment
from os.path import join
comment=re.compile(r"<!--\s+\| Startsection\s+\|-->\s+<!--\s+\| Endsection\s+\|-->", re.MULTILINE)
tag="<module>"
for root, dirs, files in os.walk("myfolder"):
    for skipped in ("model", "doc"):
        if skipped in dirs: dirs.remove(skipped)
    if "pom.xml" in files:
        p=join(root, "pom.xml")
        print("Checking",p)
        with open(p) as f:
            s=f.read()
        if tag in s and comment.search(s):
            print("ERROR: The following file are corrupted",p)
            bs = BeautifulSoup(open(p), "html.parser")
            # Extract all comments (the parsed document is named bs, not soup)
            comments=bs.find_all(string=lambda text:isinstance(text,Comment))
            for c in comments:
                # Check if it's the start of the code
                if "Start of user code" in c:
                    modules = [m for m in c.findNextSiblings(name='module')]
                    for mod in modules:
                        print(mod.text)

Answer

Don't parse an XML file with regular expressions. The best Stack Overflow answer ever explains why.

You can use BeautifulSoup to help with that task.

Look how simple it is to extract something from your content:

from bs4 import BeautifulSoup
content = """<!--| Start of user code (user defined modules)|--><!--| End of user code|-->
"""
bs = BeautifulSoup(content, "html.parser")
print(''.join(bs.contents))

Of course, you can use your XML file instead of the literal I'm using:

bs = BeautifulSoup(open("pom.xml"), "html.parser")

A small example using your expected input:

from bs4 import BeautifulSoup
from bs4 import Comment
# p is the path to one pom.xml, as in the loop from the question
bs = BeautifulSoup(open(p), "html.parser")
# Extract all comments
comments=bs.find_all(string=lambda text:isinstance(text,Comment))
for c in comments:
    # Check if it's the start of the code
    if "Start of user code" in c:
        modules = [m for m in c.find_next_siblings(name='module')]
        for mod in modules:
            print(mod.text)
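
If you would rather not install anything, the same idea can be sketched with only the standard library. This swaps BeautifulSoup for xml.etree.ElementTree, so it is a different technique from the snippet above. It assumes Python 3.8 or later (for TreeBuilder(insert_comments=True)) and well-formed XML. The sample content, the function name modules_between and the extra module outside.section are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; a real pom.xml also carries the Maven namespace.
content = """<modules>
<!--| Startsection|-->
<module>example.test1</module>
<!-- | Endsection|-->
<module>outside.section</module>
</modules>"""


def modules_between(xml_text, start="Startsection", end="Endsection"):
    """Return the text of <module> elements found between the marker comments."""
    # By default ElementTree drops comments; this parser keeps them in the tree.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    found = []
    for parent in root.iter():
        inside = False
        for child in parent:
            if child.tag is ET.Comment:
                text = child.text or ""
                if start in text:
                    inside = True
                elif end in text:
                    inside = False
            elif inside and child.tag.rsplit("}", 1)[-1] == "module":
                # rsplit strips a "{namespace}" prefix if one is present.
                found.append((child.text or "").strip())
    return found


modules = modules_between(content)
print(modules)
```

Only siblings that sit between the two marker comments are collected, so a module placed after the end marker is ignored.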

But if your code is always in a module tag, I don't see why you should care about the comments before and after it. You can just find the code inside the module tag directly.
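
A rough sketch of that direct lookup, again using only the standard library rather than BeautifulSoup. It assumes Python 3.8 or later for the {*} namespace wildcard. The sample content and the function name module_names are hypothetical; the xmlns value is the usual Maven POM namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical pom.xml content with the usual Maven namespace.
content = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modules>
    <!--| Startsection|-->
    <module>example.test1</module>
    <!-- | Endsection|-->
  </modules>
</project>"""


def module_names(xml_text):
    """Return the text of every <module> element, whatever its namespace."""
    root = ET.fromstring(xml_text)  # comments are dropped by the default parser
    return [(el.text or "").strip() for el in root.findall(".//{*}module")]


names = module_names(content)
print(names)
```

To run it over real files, ET.parse(p).getroot() can take the place of ET.fromstring, with p being the path built in the os.walk loop.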

