How to extract multiple grandchildren/children from XML where one child is a specific value?

2024/9/22 10:37:25

I'm working with an XML file that stores all "versions" of the chatbot we create. Currently we have 18 versions and I only care about the most recent one. I'm trying to find a way to extract all botDialogGroup elements along with their associated label element for this "v18". There's a one-to-many relationship between 'botDialogGroup' and 'label'.

Here's a snippet of XML where I have the botDialogGroup called "Transfer" and the label called "Transfer with a question". Note that this is only one version of the bot; there are 18 in total.

Link to a sample XML file.

Also note that fullName is a child of botVersions, whereas botDialogGroup and label are grandchildren of botVersions and their parent is botDialogs.

<Bot><botVersions><fullName>v18</fullName><botDialogs><botDialogGroup>Transfer</botDialogGroup><botSteps><botVariableOperation><askCollectIfSet>false</askCollectIfSet><botMessages><message>Would you like to chat with an agent?</message></botMessages><botQuickReplyOptions><literalValue>Yes</literalValue></botQuickReplyOptions><botQuickReplyOptions><literalValue>No</literalValue></botQuickReplyOptions><botVariableOperands><disableAutoFill>true</disableAutoFill><sourceName>YesOrNoChoices</sourceName><sourceType>MlSlotClass</sourceType><targetName>Transfer_To_Agent</targetName><targetType>ConversationVariable</targetType></botVariableOperands><optionalCollect>false</optionalCollect><quickReplyType>Static</quickReplyType><quickReplyWidgetType>Buttons</quickReplyWidgetType><retryMessages><message>I&apos;m sorry, I didn&apos;t understand that. You have to select an option to proceed.</message></retryMessages><type>Collect</type></botVariableOperation><type>VariableOperation</type></botSteps><botSteps><botStepConditions><leftOperandName>Transfer_To_Agent</leftOperandName><leftOperandType>ConversationVariable</leftOperandType><operatorType>Equals</operatorType><rightOperandValue>No</rightOperandValue></botStepConditions><botSteps><botVariableOperation><botVariableOperands><targetName>Transfer_To_Agent</targetName><targetType>ConversationVariable</targetType></botVariableOperands><type>Unset</type></botVariableOperation><type>VariableOperation</type></botSteps><botSteps><botNavigation><botNavigationLinks><targetBotDialog>Main_Menu</targetBotDialog></botNavigationLinks><type>Redirect</type></botNavigation><type>Navigation</type></botSteps><type>Group</type></botSteps><botSteps><botStepConditions><leftOperandName>Transfer_To_Agent</leftOperandName><leftOperandType>ConversationVariable</leftOperandType><operatorType>Equals</operatorType><rightOperandValue>Yes</rightOperandValue></botStepConditions><botStepConditions><leftOperandName>Online_Product</leftOperandName><leftOperandTy
pe>ConversationVariable</leftOperandType><operatorType>NotEquals</operatorType><rightOperandValue>OTP</rightOperandValue></botStepConditions><botStepConditions><leftOperandName>Online_Product</leftOperandName><leftOperandType>ConversationVariable</leftOperandType><operatorType>NotEquals</operatorType><rightOperandValue>TCF</rightOperandValue></botStepConditions><botSteps><botVariableOperation><botVariableOperands><targetName>Transfer_To_Agent</targetName><targetType>ConversationVariable</targetType></botVariableOperands><type>Unset</type></botVariableOperation><type>VariableOperation</type></botSteps><botSteps><botNavigation><botNavigationLinks><targetBotDialog>Find_Business_Hours</targetBotDialog></botNavigationLinks><type>Call</type></botNavigation><type>Navigation</type></botSteps><type>Group</type></botSteps><botSteps><botNavigation><botNavigationLinks><targetBotDialog>Direct_Transfer</targetBotDialog></botNavigationLinks><type>Redirect</type></botNavigation><type>Navigation</type></botSteps><developerName>Transfer_To_Agent</developerName><label>Transfer with a question</label><mlIntent>Transfer_To_Agent</mlIntent><mlIntentTrainingEnabled>true</mlIntentTrainingEnabled><showInFooterMenu>false</showInFooterMenu></botDialogs>

Current script

The problem I have is that it searches the entire tree, all 18 versions, for the botDialogGroup and label elements because I'm calling findall() on the root. I only want it to search the botVersions element of the most recent fullName, which in this case is "v18".

Manually entering "v18" isn't a problem since I always know the version to look for. And it's useful since different bots have different versions.

import xml.etree.ElementTree as ET
import pandas as pd

cols = ["BotVersion", "DialogGroup", "Dialog"]
rows = []

tree = ET.parse('ChattyBot.xml')
root = tree.getroot()

for fullName in root.findall(".//fullName[.='v18']"):
    for botDialogGroup in root.findall(".//botDialogGroup"):
        for label in root.findall(".//label"):
            print(fullName.text, botDialogGroup.text, label.text)
            rows.append({"BotVersion": fullName.text,
                         "DialogGroup": botDialogGroup.text,
                         "Dialog": label.text})

df = pd.DataFrame(rows, columns=cols)
df.to_csv("botcsvfile.csv")

Desired end result saved to a csv file using pandas.

BotVersion  DialogGroup  Dialog
v18         Transfer     Transfer with a question

OK, this code assumes that your XML follows the pattern version, dialog1, dialog2, dialog3, version2, dialog1, dialog2, etc. If that is not the case, let me know and I will reevaluate the code. Basically, it loops over the elements, groups the dialogs under their versions, and then sorts the groups by version number. After that, it flattens the latest group into a nested list to create the pandas DataFrame.

import xml.etree.ElementTree as ET
import pandas as pd

cols = ["BotVersion", "DialogGroup", "Dialog"]
rows = []

tree = ET.parse('test.xml')
root = tree.getroot()

for fullName in root.findall(".//botVersions"):
    versions = list(fullName)

# creating the many-to-one relation between the bot dialogs and the versions
grouping = []
relations = []
for i, tag in enumerate(versions):
    if i == 0:
        relations.append(tag)
    elif tag.tag == 'fullName':
        # a new fullName starts a new group, so close the previous one
        grouping.append(relations)
        relations = []
        relations.append(tag)
    else:
        relations.append(tag)
    # edge case for end of list
    if i == len(versions) - 1:
        grouping.append(relations)

# sorting by the numeric part of the fullName text (assumes names like "v18")
# so the latest version ends up last; a plain text sort would put "v9" after "v18"
grouping.sort(key=lambda x: int(x[0].text.lstrip('v')))
rows = grouping[-1]

# flattening the text into rows for the pandas dataframe
version_number = rows[0].text
pandas_row = [version_number]
pandas_rows = []
for r in rows[1:]:
    pandas_row = [version_number]
    for child in r.iter():
        if child.tag in ['botDialogGroup', 'label']:
            pandas_row.append(child.text)
    pandas_rows.append(pandas_row)

df = pd.DataFrame(pandas_rows, columns=cols)
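
If the file is instead laid out the way the question describes it, with each version in its own botVersions element and fullName as a direct child, a shorter route may be to select that one botVersions element first and then search only inside it. The following is a minimal sketch under that assumption. It also assumes the XML has no namespace, as in the posted snippet. The SAMPLE string and the dialog_rows helper are made up for illustration and are not part of the original file or script.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real file: two versions, each in its own <botVersions>.
SAMPLE = """
<Bot>
  <botVersions>
    <fullName>v17</fullName>
    <botDialogs>
      <botDialogGroup>Old</botDialogGroup>
      <label>Old dialog</label>
    </botDialogs>
  </botVersions>
  <botVersions>
    <fullName>v18</fullName>
    <botDialogs>
      <botDialogGroup>Transfer</botDialogGroup>
      <label>Transfer with a question</label>
    </botDialogs>
    <botDialogs>
      <label>Dialog without a group</label>
    </botDialogs>
  </botVersions>
</Bot>
"""


def dialog_rows(root, version):
    """Return (version, group, label) tuples for one bot version only."""
    rows = []
    # The predicate keeps only <botVersions> elements whose <fullName> child equals `version`.
    for bot_version in root.findall(f".//botVersions[fullName='{version}']"):
        # Search relative to this version element, not the whole tree.
        for dialog in bot_version.findall("botDialogs"):
            # findtext returns None when the child element is missing.
            rows.append((version,
                         dialog.findtext("botDialogGroup"),
                         dialog.findtext("label")))
    return rows


root = ET.fromstring(SAMPLE)
rows = dialog_rows(root, "v18")
for row in rows:
    print(row)
```

With the real file you would use ET.parse('ChattyBot.xml').getroot() in place of ET.fromstring(SAMPLE), and the resulting list of tuples can be passed to pd.DataFrame(rows, columns=cols) and written out with to_csv as in the question. Because each botDialogs element produces one row, a dialog's label stays paired with its own botDialogGroup instead of being combined with every group in the tree.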

