Question 1

I need to search some XML files (all named pom.xml, including those in subfolders) for exactly the following text sequence, so that if somebody writes any text, or even a blank, inside it, I get an alert:

     <!--| Startsection|-->         <!-- | Endsection|-->

I'm running the following Python script, but it still does not match exactly: I also get an alert when only part of the text is present:

import re
import os
from os.path import join
comment=re.compile(r"<!--\s+| Startsection\s+|-->\s+<!--\s+| Endsection\s+|-->")
tag="<module>"
for root, dirs, files in os.walk("."):
    if "pom.xml" in files:
        p=join(root, "pom.xml")
        print("Checking",p)
        with open(p) as f:
            s=f.read()
        if tag in s and comment.search(s):
            print("Matched",p)

UPDATE #3

I expect to print the content of the <module> tag if it exists between |--> and <!-- in the searched sequence:

 <!--| Startsection|-->         <!-- | Endsection|-->

For instance, after printing "Matched" and the name of the file, also print "example.test1" in the case below:

     <!--| Startsection|-->         <module>example.test1</module><!-- | Endsection|-->

UPDATE #4

I should be using the following:

import re
import os
from os.path import join
comment=re.compile(r"<!--\s+\| Startsection\s+\|-->\s+<!--\s+\| Endsection\s+\|-->", re.MULTILINE)
tag="<module>"
for root, dirs, files in os.walk("/home/temp/test_folder/"):
    for skipped in ("test1", "test2", ".repotest"):
        if skipped in dirs: dirs.remove(skipped)
    if "pom.xml" in files:
        p=join(root, "pom.xml")
        print("Checking",p)
        with open(p) as f:
            s=f.read()
        if tag in s and comment.search(s):
            print("The following files are corrupted ",p)
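
For reference, the escaping is what separates this pattern from the first attempt. In a regular expression an unescaped | means "or", so the first pattern is really five alternatives, and the bare --> alone is enough to match any file that contains a comment. The sketch below illustrates the difference on the two sample lines from the question. The names loose, strict, empty and filled are made up for the illustration, and strict uses \s* instead of \s+ as an assumption, because the sample line as shown has no whitespace between <!-- and the first pipe.

```python
import re

# First attempt: each unescaped "|" separates alternatives, so one fragment
# such as "-->" is enough for search() to succeed.
loose = re.compile(r"<!--\s+| Startsection\s+|-->\s+<!--\s+| Endsection\s+|-->")

# Escaped pipes are literal "|" characters. The \s* quantifiers are an
# assumption of this sketch (the question's own pattern uses \s+).
strict = re.compile(r"<!--\s*\| Startsection\s*\|-->\s*<!--\s*\| Endsection\s*\|-->")

# The two sample lines from the question.
empty = "<!--| Startsection|-->         <!-- | Endsection|-->"
filled = "<!--| Startsection|-->         <module>example.test1</module><!-- | Endsection|-->"

for name, text in (("empty", empty), ("filled", filled)):
    print(name, "loose:", bool(loose.search(text)), "strict:", bool(strict.search(text)))
```

Under these assumptions, strict only finds the marker pair when nothing but whitespace sits between the two comments, so a failed search on a file that should contain the pair is the signal that something was written there.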

UPDATE #5

import re
import os
import xml.etree.ElementTree as etree
from bs4 import BeautifulSoup
from bs4 import Comment
from os.path import join
comment=re.compile(r"<!--\s+\| Startsection\s+\|-->\s+<!--\s+\| Endsection\s+\|-->", re.MULTILINE)
tag="<module>"
for root, dirs, files in os.walk("myfolder"):
    for skipped in ("model", "doc"):
        if skipped in dirs: dirs.remove(skipped)
    if "pom.xml" in files:
        p=join(root, "pom.xml")
        print("Checking",p)
        with open(p) as f:
            s=f.read()
        if tag in s and comment.search(s):
            print("ERROR: The following file are corrupted",p)
            bs = BeautifulSoup(open(p), "html.parser")
            # Extract all comments
            comments=bs.find_all(string=lambda text:isinstance(text,Comment))
            for c in comments:
                # Check if it's the start of the code
                if "Start of user code" in c:
                    modules = [m for m in c.findNextSiblings(name='module')]
                    for mod in modules:
                        print(mod.text)

Question 2

Don't parse an XML file with regular expressions. The best Stack Overflow answer ever explains why.

You can use BeautifulSoup to help with that task.

Look how simple it would be to extract something from your code:

from bs4 import BeautifulSoup
content = """<!--| Start of user code (user defined modules)|--><!--| End of user code|-->
"""
bs = BeautifulSoup(content, "html.parser")
print(''.join(bs.contents))

Of course, you can use your XML file instead of the literal I'm using:

bs = BeautifulSoup(open("pom.xml"), "html.parser")

A small example using your expected input:

from bs4 import BeautifulSoup
from bs4 import Comment
# p is the path to one pom.xml, as in your script
bs = BeautifulSoup(open(p), "html.parser")
# Extract all comments
comments = bs.find_all(string=lambda text: isinstance(text, Comment))
for c in comments:
    # Check if it's the start of the code
    if "Start of user code" in c:
        modules = [m for m in c.find_next_siblings(name='module')]
        for mod in modules:
            print(mod.text)
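
If the goal is only the modules that sit between the two marker comments, the same idea can be sketched with the standard library instead of BeautifulSoup. This is an illustration under assumptions, not the code above: the function name modules_between and the sample document are hypothetical, the markers are taken from the Startsection and Endsection comments in the question, and TreeBuilder(insert_comments=True) needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def modules_between(xml_text, start="Startsection", end="Endsection"):
    """Return the text of <module> elements located between the marker comments."""
    # The default parser drops comments; insert_comments=True keeps them
    # in the tree as elements whose tag is ET.Comment.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)

    found = []
    for parent in root.iter():
        inside = False  # True while we are between the start and end markers
        for child in parent:
            if child.tag is ET.Comment:
                text = child.text or ""
                if start in text:
                    inside = True
                elif end in text:
                    inside = False
            elif inside and isinstance(child.tag, str):
                # Strip a "{namespace}" prefix, if any, before comparing.
                if child.tag.rsplit("}", 1)[-1] == "module":
                    found.append((child.text or "").strip())
    return found


# Hypothetical sample, shaped like the snippet in the question.
sample = """<project>
  <modules>
    <!--| Startsection|-->
    <module>example.test1</module>
    <!-- | Endsection|-->
    <module>outside.section</module>
  </modules>
</project>"""

print(modules_between(sample))
```

An empty list would mean nothing was written between the markers, which is the "clean" case the question is checking for.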

But if your code is always in a module tag, I don't see why you should care about the comments before and after it; you can just find the content of the module tag directly.
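
A minimal sketch of that direct lookup, using xml.etree.ElementTree from the standard library rather than BeautifulSoup. The function name list_modules is hypothetical, and the {*} namespace wildcard assumes Python 3.8 or later.

```python
import os
import xml.etree.ElementTree as ET


def list_modules(top):
    """Yield (path, module_text) for each <module> in every pom.xml under top."""
    for root, dirs, files in os.walk(top):
        if "pom.xml" not in files:
            continue
        path = os.path.join(root, "pom.xml")
        tree = ET.parse(path)
        # "{*}module" matches <module> in any namespace or in none, so it
        # also covers POM files that declare the Maven xmlns.
        for module in tree.findall(".//{*}module"):
            yield path, (module.text or "").strip()
```

Calling list(list_modules(".")) would collect every module entry below the current folder. ET.parse raises ET.ParseError on a malformed file, so a real script may want to catch that and report the path.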

Exact string search in XML files?
