Duplicating an XML element and inserting it at a specific position in an XML file using Python

2024/11/15 3:44:25

I have an XML file whose content looks like this:

xml_content_to_search =

<Document ProviderID="TD" DecimalMarker="comma" Website="https://erc-viewer.sap.com/">
<available_substances><substance ID="0004" DD="14" MM="10" YYYY="2010"><SubName>0004</SubName><url>./UN/0004.xml</url><group>ADR0004_0101</group><group>THP0004Y0101</group><group>THC0004Y0101</group><group>TRP0004Y0101</group><group>TRC0004Y0101</group><group>TIP0004Y0101</group><group>TIC0004Y0101</group><group>CTR0004Y0102</group><group>CRP0004Y0102</group><group>CRC0004Y0102</group></substance><substance ID="ADR0004_0101" DD="26" MM="10" YYYY="2022"><SubName>asa</SubName><url>ADR/ADR0004_0101.xml</url></substance><substance ID="THP0004Y0101" DD="26" MM="10" YYYY="2020"><SubName>asd)</SubName><url>THP/THP0004Y0101.xml</url></substance><substance ID="THC0004Y0101" DD="26" MM="10" YYYY="2020"><SubName>asd</SubName><url>THC/THC0004Y0101.xml</url></substance><substance ID="TRP0004Y0101" DD="26" MM="10" YYYY="2020"><SubName>asd</SubName><url>TRP/TRP0004Y0101.xml</url></substance><substance ID="TRC0004Y0101" DD="26" MM="10" YYYY="2020"><SubName>asd</SubName><url>TRC/TRC0004Y0101.xml</url></substance></available_substances></Document>

I want to search the XML file for a specific substance ID, duplicate that element, and modify the copy. That part works. What I cannot get working is inserting the duplicated element directly after the substance element it was copied from.

This is my code:

# Use the os.listdir() method to list all files in the specified folder and filter for XML files
for filename in os.listdir(IAC_files_path):
    if filename.endswith(".xml"):
        # Remove the ".xml" extension before adding to the list
        xml_file_names.append(os.path.splitext(filename)[0])

# Parse the XML content to search for <substance> elements with matching IDs
tree = ET.ElementTree(ET.fromstring(xml_content_to_search))
root = tree.getroot()

# Initialize a flag to check if at least one match is found
match_found = False

# Create a list to store duplicated <substance> elements
duplicated_substance_elements = []

# Iterate through the <substance> elements and search for matching IDs
for substance_element in root.findall(".//substance"):
    substance_id = substance_element.get("ID")
    print(f"Processing substance_id: {substance_id}")

    # Check if the ID without the extension is in the list
    base_substance_id = os.path.splitext(substance_id)[0]
    if base_substance_id in xml_file_names:
        # Print the XML file name found in the <substance> element's ID attribute
        print(f"Found XML file name '{substance_id}' in the other XML file.")
        match_found = True

        # Create a new <substance> element with modified attributes for IUC
        duplicate_substance_element_iuc = ET.Element("substance")
        duplicate_substance_element_iuc.set("ID", base_substance_id.replace("IAC", "IUC"))
        duplicate_substance_element_iuc.set("DD", substance_element.get("DD"))
        duplicate_substance_element_iuc.set("MM", substance_element.get("MM"))
        duplicate_substance_element_iuc.set("YYYY", substance_element.get("YYYY"))

        # Duplicate and modify the <SubName> element for IUC
        subname_element = substance_element.find("SubName")
        duplicate_subname_element_iuc = ET.Element("SubName")
        duplicate_subname_element_iuc.text = subname_element.text.replace("IAC", "IUC")
        duplicate_substance_element_iuc.append(duplicate_subname_element_iuc)

        # Duplicate and modify the <url> element for IUC
        url_element = substance_element.find("url")
        duplicate_url_element_iuc = ET.Element("url")
        duplicate_url_element_iuc.text = url_element.text.replace("IAC", "IUC")
        duplicate_substance_element_iuc.append(duplicate_url_element_iuc)

        # Insert the duplicated IUC <substance> element immediately after the original IAC element
        substance_element_index = list(root).index(substance_element)
        root.insert(substance_element_index + 1, duplicate_substance_element_iuc)

        # Create a new <substance> element with modified attributes for IEC
        duplicate_substance_element_iec = ET.Element("substance")
        duplicate_substance_element_iec.set("ID", base_substance_id.replace("IAC", "IEC"))
        duplicate_substance_element_iec.set("DD", substance_element.get("DD"))
        duplicate_substance_element_iec.set("MM", substance_element.get("MM"))
        duplicate_substance_element_iec.set("YYYY", substance_element.get("YYYY"))

        # Duplicate and modify the <SubName> element for IEC
        duplicate_subname_element_iec = ET.Element("SubName")
        duplicate_subname_element_iec.text = subname_element.text.replace("IAC", "IEC")
        duplicate_substance_element_iec.append(duplicate_subname_element_iec)

        # Duplicate and modify the <url> element for IEC
        duplicate_url_element_iec = ET.Element("url")
        duplicate_url_element_iec.text = url_element.text.replace("IAC", "IEC")
        duplicate_substance_element_iec.append(duplicate_url_element_iec)

        # Insert the duplicated IUC <substance> element immediately after the original IAC element
        substance_element_index = list(root).index(substance_element)
        root.insert(substance_element_index + 2, duplicate_substance_element_iec)

        # Append the duplicated IEC <substance> element to the list
        #duplicated_substance_elements.append(duplicate_substance_element_iec)

# Check if no matches were found and print "Not found" message
if not match_found:
    print("No XML file names were found in the other XML file.")

# # Append the duplicated IEC <substance> elements to the end
# for duplicate_element in duplicated_substance_elements:
#     root.append(duplicate_element)

# Print the modified XML content
modified_xml_content = ET.tostring(root, encoding="unicode")
print(modified_xml_content)

I am getting this error:

<Element 'substance' at 0x000002BF2DFE8720> is not in list

at this line of code:

substance_element_index = list(root).index(substance_element)

My desired output is something like this:

<Document ProviderID="TD" DecimalMarker="comma" Website="https://erc-viewer.sap.com/">
<available_substances><substance ID="0004" DD="14" MM="10" YYYY="2010"><SubName>0004</SubName><url>./UN/0004.xml</url><group>ADR0004_0101</group><group>THP0004Y0101</group><group>THC0004Y0101</group><group>TRP0004Y0101</group><group>TRC0004Y0101</group><group>TIP0004Y0101</group><group>TIC0004Y0101</group><group>CTR0004Y0102</group><group>CRP0004Y0102</group><group>CRC0004Y0102</group></substance><substance ID="ADR0004_0101" DD="26" MM="10" YYYY="2022"><SubName>asa</SubName><url>ADR/ADR0004_0101.xml</url></substance><substance ID="THP0004Y0101" DD="26" MM="10" YYYY="2020"><SubName>asd)</SubName><url>THP/THP0004Y0101.xml</url></substance><substance ID="THC0004Y0101" DD="26" MM="10" YYYY="2020"><SubName>asd</SubName><url>THC/THC0004Y0101.xml</url></substance><substance ID="TRP0004Y0101" DD="26" MM="10" YYYY="2020"><SubName>asd</SubName><url>TRP/TRP0004Y0101.xml</url></substance><substance ID="TRC0004Y0101" DD="26" MM="10" YYYY="2020"><SubName>asd</SubName><url>TRC/TRC0004Y0101.xml</url></substance>**<substance ID="IEC0004Y0101" DD="26" MM="10" YYYY="2020"><SubName>asd</SubName><url>TRC/TRC0004Y0101.xml</url></substance>**</available_substances></Document>

Answer

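The error occurs because root is the <Document> element, while the <substance> elements are children of <available_substances>, so list(root) does not contain them. Insert into the <available_substances> element instead. You can copy an element, change its content, and insert it: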

import xml.etree.ElementTree as ET
from copy import deepcopy

tree = ET.parse('substance.xml')
root = tree.getroot()

sub = root.findall('.//substance')
print(len(sub))

co = deepcopy(sub[3])
for elem in co.iter():
    if elem.tag == 'substance':
        elem.set('ID', 'THC0004Y0101_insert')
        elem.set('DD', '27')
        elem.set('MM', '11')
        elem.set('YYYY', '1998')
    if elem.tag == 'SubName':
        elem.text = 'iso'
    if elem.tag == 'url':
        elem.text = 'ISO/ADR0004_010x.xml'

root.find('.//available_substances').insert(4, co)
ET.dump(root)

Output:


<Document ProviderID="TD" DecimalMarker="comma" Website="https://erc-viewer.sap.com/">
<available_substances><substance ID="0004" DD="14" MM="10" YYYY="2010"><SubName>0004</SubName><url>./UN/0004.xml</url><group>ADR0004_0101</group><group>THP0004Y0101</group><group>THC0004Y0101</group><group>TRP0004Y0101</group><group>TRC0004Y0101</group><group>TIP0004Y0101</group><group>TIC0004Y0101</group><group>CTR0004Y0102</group><group>CRP0004Y0102</group><group>CRC0004Y0102</group></substance><substance ID="ADR0004_0101" DD="26" MM="10" YYYY="2022"><SubName>asa</SubName><url>ADR/ADR0004_0101.xml</url></substance><substance ID="THP0004Y0101" DD="26" MM="10" YYYY="2020"><SubName>asd)</SubName><url>THP/THP0004Y0101.xml</url></substance><substance ID="THC0004Y0101" DD="26" MM="10" YYYY="2020"><SubName>asd</SubName><url>THC/THC0004Y0101.xml</url></substance><substance ID="THC0004Y0101_insert" DD="27" MM="11" YYYY="1998"><SubName>iso</SubName><url>ISO/ADR0004_010x.xml</url></substance><substance ID="TRP0004Y0101" DD="26" MM="10" YYYY="2020"><SubName>asd</SubName><url>TRP/TRP0004Y0101.xml</url></substance><substance ID="TRC0004Y0101" DD="26" MM="10" YYYY="2020"><SubName>asd</SubName><url>TRC/TRC0004Y0101.xml</url></substance></available_substances></Document>
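
The answer above hard-codes the position (index 4). If the position should follow from the matched element instead, one possible approach is to look up the element's real parent and insert at its index plus one. The following is only a sketch: the helper name duplicate_after, the shortened sample XML, and the "THC" to "IEC" replacement are assumptions made for illustration and do not come from the question's actual files.

```python
import xml.etree.ElementTree as ET
from copy import deepcopy

# Shortened sample (assumption): two <substance> entries taken from the question's XML.
XML = (
    '<Document ProviderID="TD"><available_substances>'
    '<substance ID="THC0004Y0101" DD="26" MM="10" YYYY="2020">'
    '<SubName>asd</SubName><url>THC/THC0004Y0101.xml</url></substance>'
    '<substance ID="TRC0004Y0101" DD="26" MM="10" YYYY="2020">'
    '<SubName>asd</SubName><url>TRC/TRC0004Y0101.xml</url></substance>'
    '</available_substances></Document>'
)


def duplicate_after(root, substance_id, old, new):
    """Hypothetical helper: copy the <substance> with the given ID, replace
    `old` with `new` in its ID, SubName and url, and insert the copy
    directly after the original. Returns the copy, or None if no match."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for original in root.findall(".//substance"):
        if original.get("ID") != substance_id:
            continue
        parent = parent_of[original]
        duplicate = deepcopy(original)
        duplicate.set("ID", original.get("ID").replace(old, new))
        for tag in ("SubName", "url"):
            node = duplicate.find(tag)
            if node is not None and node.text:
                node.text = node.text.replace(old, new)
        # The index is taken in the real parent, not in the document root.
        parent.insert(list(parent).index(original) + 1, duplicate)
        return duplicate
    return None


root = ET.fromstring(XML)
duplicate_after(root, "THC0004Y0101", "THC", "IEC")
ids = [s.get("ID") for s in root.iter("substance")]
print(ids)  # ['THC0004Y0101', 'IEC0004Y0101', 'TRC0004Y0101']
```

Because findall() returns a plain list, inserting into the parent while looping does not disturb the iteration. To add a second copy (for example both IUC and IEC variants), calling the helper twice places the later copy closest to the original, so the calls would need to be made in reverse of the desired order.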