How does one reorder information in an XML document in Python 3?

2024/10/9 16:27:50

Let's suppose I have the following XML structure:

<?xml version="1.0" encoding="utf-8" ?>
<Document>
    <CstmrCdtTrfInitn>
        <GrpHdr>
            <other_tags>a</other_tags> <!--here there might be other nested tags inside <other_tags></other_tags>-->
            <other_tags>b</other_tags> <!--here there might be other nested tags inside <other_tags></other_tags>-->
            <other_tags>c</other_tags> <!--here there might be other nested tags inside <other_tags></other_tags>-->
        </GrpHdr>
        <PmtInf>
            <things>d</things> <!--here there might be other nested tags inside <things></things>-->
            <things>e</things> <!--here there might be other nested tags inside <things></things>-->
            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
        </PmtInf>
        <PmtInf>
            <things>f</things> <!--here there might be other nested tags inside <things></things>-->
            <things>g</things> <!--here there might be other nested tags inside <things></things>-->
            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
        </PmtInf>
        <PmtInf>
            <things>f</things> <!--here there might be other nested tags inside <things></things>-->
            <things>g</things> <!--here there might be other nested tags inside <things></things>-->
            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
        </PmtInf>
    </CstmrCdtTrfInitn>
</Document>

Now, given this structure, I want to manipulate the sections as follows:

If there are two or more <PmtInf> tags that have the same:

<things>d</things> <!--here there might be other nested tags inside <things></things>-->
<things>e</things> <!--here there might be other nested tags inside <things></things>-->

I would like to move the whole <CdtTrfTxInf></CdtTrfTxInf> into the first matching <PmtInf></PmtInf> and remove the whole <PmtInf></PmtInf> that I took the <CdtTrfTxInf></CdtTrfTxInf> from. A bit fuzzy, right? Here is an example of the desired result:

<Document>
    <CstmrCdtTrfInitn>
        <GrpHdr>
            <other_tags>a</other_tags> <!--here there might be other nested tags inside <other_tags></other_tags>-->
            <other_tags>b</other_tags> <!--here there might be other nested tags inside <other_tags></other_tags>-->
            <other_tags>c</other_tags> <!--here there might be other nested tags inside <other_tags></other_tags>-->
        </GrpHdr>
        <PmtInf>
            <things>d</things> <!--here there might be other nested tags inside <things></things>-->
            <things>e</things> <!--here there might be other nested tags inside <things></things>-->
            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
        </PmtInf>
        <PmtInf>
            <things>f</things> <!--here there might be other nested tags inside <things></things>-->
            <things>g</things> <!--here there might be other nested tags inside <things></things>-->
            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
        </PmtInf>
    </CstmrCdtTrfInitn>
</Document>

As you can see, the last two <PmtInf></PmtInf> tags have now become a single one (because their <things></things> matched) and the <CdtTrfTxInf></CdtTrfTxInf> was copied into it.

Now, I would like to do this in any possible way (lxml, xml.etree, XSLT, etc.). At first, I thought about using a regex, but that might become a bit ugly. Then I thought I might be able to use some string manipulation, but I can't figure out how I would do this.

Can somebody tell me which method would be the most elegant / efficient one if the average size of an XML file is about 2k lines? An example would also be appreciated.

For the sake of completeness, I'll define a function which returns the entire XML content as a string:

def get_xml_from(some_file):
    with open(some_file) as xml_file:
        content = xml_file.read()
    return content


def modify_xml(some_file):
    content_of_xml = get_xml_from(some_file)
    # here I should be able to process the XML file
    return processed_xml

I'm not looking for somebody doing this for me, but asking for ideas on what are the best ways of achieving this.

Answer

I'm not going to give you the code you want. Instead I'll say how you can go about doing what you want.

First things first: you want to read your XML, so I'll be using xml.etree.ElementTree.

import xml.etree.ElementTree as ET
# content_of_xml is the XML string returned by get_xml_from() in the question
root = ET.fromstring(content_of_xml)

After this I'd ignore the parts of the tree that you don't use, and just find CstmrCdtTrfInitn. Since you only want to work with the PmtInf elements, use findall to get all of them.

pmt_infs = root.find('.//CstmrCdtTrfInitn').findall('PmtInf')
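One thing to keep in mind with the line above: find returns None when nothing matches, so chaining .findall directly onto it raises an AttributeError on a document without CstmrCdtTrfInitn. A minimal guarded sketch follows; the sample string is made up for illustration and assumes, like the question's XML, that no namespace is declared.

```python
import xml.etree.ElementTree as ET

# Tiny made-up document, only for illustration.
sample = "<Document><CstmrCdtTrfInitn><PmtInf/><PmtInf/></CstmrCdtTrfInitn></Document>"
root = ET.fromstring(sample)

# find() returns the first matching descendant, or None if there is none.
parent = root.find(".//CstmrCdtTrfInitn")

# findall("PmtInf") only looks at direct children of parent.
pmt_infs = parent.findall("PmtInf") if parent is not None else []

# On a document without the element, find() gives None instead of raising.
missing = ET.fromstring("<Document/>").find(".//CstmrCdtTrfInitn")
```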

After this you want to perform your algorithm* to move items around in your data. As a stand-in, I'll just remove the first child of each node, if it has one.

nodes = []
for node in pmt_infs:
    children = list(node)
    if children:
        # detach the first child and remember it for later
        node.remove(children[0])
        nodes.append(children[0])

Now that we have all the nodes, add them to the first element of pmt_infs.

pmt_infs[0].extend(nodes)

* You'll want to change the third code block to how you want to move your nodes, as you changed your algorithm from v1 to v3 of your question.
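For reference, here is one way the steps above could be adapted to the merge described in the question. It is only a sketch under a few assumptions: the helper names element_key and merge_pmt_infs are hypothetical, two PmtInf elements count as "the same" when all of their children other than CdtTrfTxInf match in tag, attributes, stripped text and nested children, and the document declares no XML namespace (as in the question's sample).

```python
import xml.etree.ElementTree as ET


def element_key(elem):
    """Hashable fingerprint of an element and everything nested in it."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(element_key(child) for child in elem),
    )


def merge_pmt_infs(xml_text):
    """Move CdtTrfTxInf children of duplicate PmtInf elements into the first match."""
    root = ET.fromstring(xml_text)
    parent = root.find(".//CstmrCdtTrfInitn")
    if parent is None:
        return xml_text  # nothing to merge

    first_seen = {}
    # findall() returns a list, so removing from parent inside the loop is safe.
    for pmt_inf in parent.findall("PmtInf"):
        key = tuple(element_key(c) for c in pmt_inf if c.tag != "CdtTrfTxInf")
        keeper = first_seen.setdefault(key, pmt_inf)
        if keeper is not pmt_inf:
            # Duplicate: hand its transactions to the first PmtInf, then drop it.
            keeper.extend(pmt_inf.findall("CdtTrfTxInf"))
            parent.remove(pmt_inf)

    return ET.tostring(root, encoding="unicode")
```

The default ElementTree parser discards comments, and the XML declaration and the original whitespace are not reproduced exactly by ET.tostring, so the result should be treated as equivalent in structure rather than identical in layout.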

