How to extract multiple grandchildren/children from XML where one child is a specific value?

2024/9/22 10:37:25

I'm working with an XML file that stores all "versions" of the chatbot we create. Currently we have 18 versions and I only care about the most recent one. I'm trying to find a way to extract all botDialogGroup elements along with their associated label element for this "v18". There's a one-to-many relationship between 'botDialogGroup' and 'label'.
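The one-to-many shape can be pictured as a dict of lists, one list of labels per botDialogGroup. This is a minimal sketch. Only the Transfer pair comes from the snippet below; the other pairs are made-up placeholders.

```python
from collections import defaultdict

# Hypothetical (botDialogGroup, label) pairs; only the first is from the question's snippet
pairs = [
    ("Transfer", "Transfer with a question"),
    ("Transfer", "Direct transfer"),
    ("Main", "Main menu"),
]

# One group -> many labels
labels_by_group = defaultdict(list)
for group, label in pairs:
    labels_by_group[group].append(label)

print(dict(labels_by_group))
# {'Transfer': ['Transfer with a question', 'Direct transfer'], 'Main': ['Main menu']}
```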

Here's a snippet of XML where I have the botDialogGroup called "Transfer" and the label called "Transfer with a question". Note that this is only one version of the bot; there are 18 in total.

Link to a sample XML file. https://pastebin.com/aaDfBPUm

Also note that fullName is a child of botVersions, whereas botDialogGroup and label are grandchildren of botVersions; their parent is botDialogs.
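Because fullName sits one level below botVersions and botDialogGroup/label sit two levels below, paths relative to a single botVersions element can reach both levels. A minimal sketch on a trimmed-down, hypothetical sample that keeps only those elements:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample with the structure described above
SAMPLE = """<Bot>
  <botVersions>
    <fullName>v18</fullName>
    <botDialogs>
      <botDialogGroup>Transfer</botDialogGroup>
      <label>Transfer with a question</label>
    </botDialogs>
  </botVersions>
</Bot>"""

root = ET.fromstring(SAMPLE)
version = root.find("botVersions")

# Child of botVersions
print(version.findtext("fullName"))                   # v18
# Grandchildren of botVersions, reached through their parent botDialogs
print(version.findtext("botDialogs/botDialogGroup"))  # Transfer
print(version.findtext("botDialogs/label"))           # Transfer with a question
```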

<Bot>
  <botVersions>
    <fullName>v18</fullName>
    <botDialogs>
      <botDialogGroup>Transfer</botDialogGroup>
      <botSteps>
        <botVariableOperation>
          <askCollectIfSet>false</askCollectIfSet>
          <botMessages>
            <message>Would you like to chat with an agent?</message>
          </botMessages>
          <botQuickReplyOptions>
            <literalValue>Yes</literalValue>
          </botQuickReplyOptions>
          <botQuickReplyOptions>
            <literalValue>No</literalValue>
          </botQuickReplyOptions>
          <botVariableOperands>
            <disableAutoFill>true</disableAutoFill>
            <sourceName>YesOrNoChoices</sourceName>
            <sourceType>MlSlotClass</sourceType>
            <targetName>Transfer_To_Agent</targetName>
            <targetType>ConversationVariable</targetType>
          </botVariableOperands>
          <optionalCollect>false</optionalCollect>
          <quickReplyType>Static</quickReplyType>
          <quickReplyWidgetType>Buttons</quickReplyWidgetType>
          <retryMessages>
            <message>I&apos;m sorry, I didn&apos;t understand that. You have to select an option to proceed.</message>
          </retryMessages>
          <type>Collect</type>
        </botVariableOperation>
        <type>VariableOperation</type>
      </botSteps>
      <botSteps>
        <botStepConditions>
          <leftOperandName>Transfer_To_Agent</leftOperandName>
          <leftOperandType>ConversationVariable</leftOperandType>
          <operatorType>Equals</operatorType>
          <rightOperandValue>No</rightOperandValue>
        </botStepConditions>
        <botSteps>
          <botVariableOperation>
            <botVariableOperands>
              <targetName>Transfer_To_Agent</targetName>
              <targetType>ConversationVariable</targetType>
            </botVariableOperands>
            <type>Unset</type>
          </botVariableOperation>
          <type>VariableOperation</type>
        </botSteps>
        <botSteps>
          <botNavigation>
            <botNavigationLinks>
              <targetBotDialog>Main_Menu</targetBotDialog>
            </botNavigationLinks>
            <type>Redirect</type>
          </botNavigation>
          <type>Navigation</type>
        </botSteps>
        <type>Group</type>
      </botSteps>
      <botSteps>
        <botStepConditions>
          <leftOperandName>Transfer_To_Agent</leftOperandName>
          <leftOperandType>ConversationVariable</leftOperandType>
          <operatorType>Equals</operatorType>
          <rightOperandValue>Yes</rightOperandValue>
        </botStepConditions>
        <botStepConditions>
          <leftOperandName>Online_Product</leftOperandName>
          <leftOperandType>ConversationVariable</leftOperandType>
          <operatorType>NotEquals</operatorType>
          <rightOperandValue>OTP</rightOperandValue>
        </botStepConditions>
        <botStepConditions>
          <leftOperandName>Online_Product</leftOperandName>
          <leftOperandType>ConversationVariable</leftOperandType>
          <operatorType>NotEquals</operatorType>
          <rightOperandValue>TCF</rightOperandValue>
        </botStepConditions>
        <botSteps>
          <botVariableOperation>
            <botVariableOperands>
              <targetName>Transfer_To_Agent</targetName>
              <targetType>ConversationVariable</targetType>
            </botVariableOperands>
            <type>Unset</type>
          </botVariableOperation>
          <type>VariableOperation</type>
        </botSteps>
        <botSteps>
          <botNavigation>
            <botNavigationLinks>
              <targetBotDialog>Find_Business_Hours</targetBotDialog>
            </botNavigationLinks>
            <type>Call</type>
          </botNavigation>
          <type>Navigation</type>
        </botSteps>
        <type>Group</type>
      </botSteps>
      <botSteps>
        <botNavigation>
          <botNavigationLinks>
            <targetBotDialog>Direct_Transfer</targetBotDialog>
          </botNavigationLinks>
          <type>Redirect</type>
        </botNavigation>
        <type>Navigation</type>
      </botSteps>
      <developerName>Transfer_To_Agent</developerName>
      <label>Transfer with a question</label>
      <mlIntent>Transfer_To_Agent</mlIntent>
      <mlIntentTrainingEnabled>true</mlIntentTrainingEnabled>
      <showInFooterMenu>false</showInFooterMenu>
    </botDialogs>
  </botVersions>
</Bot>

Current script

The problem is that, because I'm using findall(), the script searches the entire tree (all 18 versions) for the botDialogGroup and label elements. I only want it to search the botVersions element whose fullName is the most recent version, in this case "v18".

Hard-coding "v18" isn't a problem, since I always know which version to look for, and it's actually useful because different bots are on different versions.

import xml.etree.ElementTree as ET
import pandas as pd

cols = ["BotVersion", "DialogGroup", "Dialog"]
rows = []

tree = ET.parse('ChattyBot.xml')
root = tree.getroot()

for fullName in root.findall(".//fullName[.='v18']"):
    for botDialogGroup in root.findall(".//botDialogGroup"):
        for label in root.findall(".//label"):
            print(fullName.text, botDialogGroup.text, label.text)
            rows.append({"BotVersion": fullName.text,
                         "DialogGroup": botDialogGroup.text,
                         "Dialog": label.text})

df = pd.DataFrame(rows, columns=cols)
df.to_csv("botcsvfile.csv")
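One way to scope the search, sketched under the assumption that each version is its own botVersions element with a fullName child (as in the snippet above): select that botVersions element with ElementTree's [tag='text'] predicate, then walk only its botDialogs children and read each dialog's botDialogGroup and label together. Reading both from the same botDialogs also avoids the cross product produced by the nested loops in the script, where every botDialogGroup is paired with every label. The helper name dialogs_for_version and the two-version sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def dialogs_for_version(root, version):
    """Return (version, botDialogGroup, label) tuples for one version only.

    Hypothetical helper; assumes botDialogGroup and label are direct
    children of botDialogs, as in the question's snippet.
    """
    rows = []
    # [fullName='...'] keeps only the botVersions whose fullName child has
    # exactly this text, so the other versions are never searched
    for bot_version in root.findall(f".//botVersions[fullName='{version}']"):
        for dialog in bot_version.findall("botDialogs"):
            rows.append((
                version,
                dialog.findtext("botDialogGroup"),  # None if the dialog has no group
                dialog.findtext("label"),
            ))
    return rows


# Hypothetical two-version sample
SAMPLE = """<Bot>
  <botVersions><fullName>v17</fullName>
    <botDialogs><botDialogGroup>Old</botDialogGroup><label>Old dialog</label></botDialogs>
  </botVersions>
  <botVersions><fullName>v18</fullName>
    <botDialogs><botDialogGroup>Transfer</botDialogGroup><label>Transfer with a question</label></botDialogs>
  </botVersions>
</Bot>"""

root = ET.fromstring(SAMPLE)
print(dialogs_for_version(root, "v18"))
# [('v18', 'Transfer', 'Transfer with a question')]
```

The returned tuples can go straight into pd.DataFrame(rows, columns=cols). If the full file declares a default namespace (an xmlns="..." attribute on the root), the tag names in the path have to be namespace-qualified, for example through the namespaces argument of findall().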

Desired end result, saved to a CSV file using pandas:

BotVersion  DialogGroup  Dialog
v18         Transfer     Transfer with a question
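A minimal sketch of producing exactly those three columns with pandas; the row is hard-coded here in place of the parsed data. Note that DataFrame.to_csv writes the index as an extra first column unless index=False is passed, which the script above does not do.

```python
import pandas as pd

cols = ["BotVersion", "DialogGroup", "Dialog"]
# Hard-coded stand-in for the rows collected from the XML
rows = [("v18", "Transfer", "Transfer with a question")]

df = pd.DataFrame(rows, columns=cols)

# With no path argument, to_csv returns the CSV text instead of writing a file;
# index=False leaves out pandas' default 0..n-1 index column
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name, e.g. df.to_csv("botcsvfile.csv", index=False), writes the same text to disk.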
Answer

OK, this code assumes that your XML follows the pattern version, dialog1, dialog2, dialog3, version2, dialog1, dialog2, and so on. If that is not the case, let me know and I will reevaluate the code. The basic idea is to loop over the elements, group the dialogs with the version they belong to, and sort the groups by version number. The latest group is then flattened into a nested list of rows to build the pandas DataFrame.
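Sorting by version number needs a numeric key. If the names follow a prefix-plus-number pattern like "v18" (an assumption), plain string sorting compares character by character, so "v9" sorts after "v18". A minimal sketch with a hypothetical helper, version_key:

```python
import re


def version_key(full_name):
    """Hypothetical sort key: the first run of digits in names like 'v18'.

    Names without digits sort first (key -1).
    """
    match = re.search(r"\d+", full_name)
    return int(match.group()) if match else -1


names = ["v1", "v9", "v18", "v10", "v2"]
print(sorted(names))                  # ['v1', 'v10', 'v18', 'v2', 'v9']
print(sorted(names, key=version_key)) # ['v1', 'v2', 'v9', 'v10', 'v18']
print(max(names, key=version_key))    # v18
```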

import xml.etree.ElementTree as ET
import pandas as pd

cols = ["BotVersion", "DialogGroup", "Dialog"]

tree = ET.parse('test.xml')
root = tree.getroot()

# children of every botVersions element, in document order
# (looping and reassigning would keep only the last botVersions)
versions = [child for bot_version in root.findall(".//botVersions") for child in bot_version]

# creating the many-to-one relation between the versions and bot dialogs:
# each group starts at a fullName tag and collects the tags that follow it
grouping = []
relations = []
for tag in versions:
    if tag.tag == 'fullName' and relations:
        grouping.append(relations)
        relations = []
    relations.append(tag)
# edge case for the end of the list: keep the last group
if relations:
    grouping.append(relations)

# sorting by the number in the fullName text (assumes names like "v18") so the
# latest version ends up last; a plain text sort would put "v9" after "v18"
grouping.sort(key=lambda group: int(''.join(ch for ch in group[0].text if ch.isdigit())))
latest = grouping[-1]

# flattening the text into rows for the pandas dataframe
version_number = latest[0].text
pandas_rows = []
for r in latest[1:]:
    if r.tag != 'botDialogs':
        continue  # skip anything in the version that is not a dialog
    # botDialogGroup and label are direct children of botDialogs;
    # findtext returns None if a dialog has no group, keeping the columns aligned
    pandas_rows.append([version_number, r.findtext('botDialogGroup'), r.findtext('label')])

df = pd.DataFrame(pandas_rows, columns=cols)
print(df)
https://en.xdnf.cn/q/119143.html
