I have a weird list of items and lists like this, with | as the delimiter and [[ ]] as parentheses. It looks like this:
| item1 | item2 | item3 | Ulist1[[ | item4 | item5 | Ulist2[[ | item6 | item7 ]] | item8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] | item13 | item14
I want to match the items in lists called Ulist* (items 4-8) using a RegEx and replace them with Uitem*. The result should look like this:
| item1 | item2 | item3 | Ulist1[[ | Uitem4 | Uitem5 | Ulist2[[ | Uitem6 | Uitem7 ]] | Uitem8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] | item13 | item14
I have tried almost everything I know about RegEx, but I haven't found a RegEx that matches each item inside the Ulists. My current RegEx:
/Ulist(\d+)\[\[(\s*(\|\s*[^\s\|]*)*\s*)*\]\]/i
What is wrong? I am a beginner with RegEx.
This is in Python 2.7; specifically, my code is:
def fixDirtyLists(self, text):
    text = textlib.replaceExcept(text, r'Ulist(\d+)\[\[(\s*(\|\s*[^\s\|]*)*\s*)*\]\]', r'Ulist\1[[ U\3 ]]', '', site=self.site)
    return text
text gets that weird list, and textlib replaces the RegEx match with the replacement pattern. Not complicated at all.
If you install the PyPI regex module (with Python 2.7.9+ this can be done with a mere pip install regex when in the \Python27\Scripts\ folder), you will be able to match nested square brackets. You can then match the substrings you need and replace item with Uitem only inside those substrings.
The pattern (see demo, note that PyPi regex recursion resembles that of PCRE):
(Ulist\d+)(\[\[(?>[^][]|](?!])|\[(?!\[)|(?2))*]])
^-Group1-^^-----------Group2--------------------^
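A short explanation: (Ulist\d+) is Group 1, which matches the literal word Ulist followed by 1 or more digits. It is followed by Group 2, (\[\[(?>[^][]|](?!])|\[(?!\[)|(?2))*]]), which matches a substring starting with [[ up to the corresponding ]]. Inside Group 2, the (?2) subroutine call recurses into Group 2 itself, which is what lets nested [[ ... ]] pairs stay balanced.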
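To make "up to the corresponding ]]" concrete, here is a minimal hand-written sketch of what Group 2 ends up consuming on well-formed input. It uses a plain depth counter instead of regex recursion, and the function name matching_close is purely illustrative; it is not part of the regex module or of textlib.

```python
def matching_close(text, start):
    """Return the index just past the ']]' that balances the '[[' at `start`.

    Returns -1 if the brackets never balance.
    """
    if not text.startswith('[[', start):
        raise ValueError("expected '[[' at position %d" % start)
    depth = 0
    i = start
    while i < len(text):
        if text.startswith('[[', i):
            depth += 1          # entering a (possibly nested) list
            i += 2
        elif text.startswith(']]', i):
            depth -= 1          # leaving the innermost open list
            i += 2
            if depth == 0:
                return i        # the opener at `start` is now closed
        else:
            i += 1
    return -1


s = "Ulist1[[ | a | Ulist2[[ | b ]] | c ]] | d"
start = s.index('[[')
print(s[start:matching_close(s, start)])
```

The printed span runs from the first [[ through the second ]], so the nested Ulist2[[ ... ]] is included as a whole rather than cutting the match short at the first ]].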
And the Python code:
>>> import regex
>>> s = "| item1 | item2 | item3 | Ulist1[[ | item4 | item5 | Ulist2[[ | item6 | item7 ]] | item8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] | item13 | item14"
>>> pat = r'(Ulist\d+)(\[\[(?>[^][]|](?!])|\[(?!\[)|(?2))*]])'
>>> res = regex.sub(pat, lambda m: m.group(1) + m.group(2).replace("item", "Uitem"), s)
>>> print(res)
| item1 | item2 | item3 | Ulist1[[ | Uitem4 | Uitem5 | Ulist2[[ | Uitem6 | Uitem7 ]] | Uitem8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] | item13 | item14
To avoid modifying lists inside a Ulist, use
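def repl(m):
    # The capturing group makes regex.split keep the nested list substrings;
    # without it they would be dropped from the result.
    parts = regex.split(r'\b(list\d*\[{2}[^\]]*(?:](?!])[^\]]*)*]])', m.group(0))
    return "".join([x if x.startswith("list") else x.replace("item", "Uitem") for x in parts])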
and replace the regex.sub call with
res = regex.sub(pat, repl, s)
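If installing the regex module is not an option, roughly the same result can be sketched with the standard re module and a stack of open list names. This is a different technique from the recursive pattern above, and it rests on one assumption: an item is renamed only when its innermost enclosing list is a Ulist. The names TOKEN and rename_items are illustrative only.

```python
import re

# One token per list opener ("name[["), list closer ("]]"), or item ("item<N>").
TOKEN = re.compile(r'(?P<open>\w+)\[\[|(?P<close>\]\])|(?P<item>\bitem\d+)')


def rename_items(text):
    """Prefix items with "U" when their innermost enclosing list is a Ulist."""
    stack = []  # names of the lists currently open, outermost first
    out = []    # pieces of the rewritten string
    pos = 0     # end of the last token already copied to `out`
    for m in TOKEN.finditer(text):
        out.append(text[pos:m.start()])  # copy the text between tokens
        token = m.group(0)
        if m.group('open'):
            stack.append(m.group('open'))
        elif m.group('close'):
            if stack:
                stack.pop()
        elif stack and stack[-1].startswith('Ulist'):
            token = 'U' + token          # an item directly inside a Ulist
        out.append(token)
        pos = m.end()
    out.append(text[pos:])
    return ''.join(out)


s = "| item1 | Ulist1[[ | item4 | list2[[ | item5 ]] | item6 ]] | item9"
print(rename_items(s))
```

Because the item token requires a word boundary before item, running the function a second time on its own output leaves the already renamed Uitem entries alone.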