I have an unclear xml and process it with python lxml module. I want replace all \n
in content with space
before any processing, how can I do this work for text of all elements.
edit
my xml example:
<root><a> dsdfs\n dsf\n sdf\n</a><bds> <d>sdf\n\n\n\n\n\n</d><d>sdf\n\n\nsdf\nsdf\n\n</d></bds>................
</root>
and i wan't to get this in output when i print ittertext:
root = #get root element
for i in root.ittertext():print idsdfs dsf sdf
dsdfs dsf sdf
sdf nsdf sdf
Below code will parse the xml into a string, then replace \n
with space
and then write to a new xml file. You can do other processing in between, depending what exactly you want to do.
from lxml import etree
tree = etree.parse('some.xml')
root = tree.getroot()
# Get the whole XML content as string
xml_in_str = etree.tostring(root)# Replace all \n with space
new_xml_data = xml_in_str.replace(r'\n', ' ')# Do the processing with the new_xml_data string which is formatted# Maybe also write to a new XML file, without the \n
with open('newxml.xml', 'w') as f:f.write(new_xml_data)
some.xml
looks like:
<root><a> dsdfs\n dsf\n sdf\n</a><bds> <d>sdf\n\n\n\n\n\n</d><d>sdf\n\n\nsdf\nsdf\n\n</d></bds><bds> <d>sdf\n\n\n\n\n\n</d><d>sdf\n\n\nsdf\nsdf\n\n</d></bds><bds> <d>sdf\n\n\n\n\n\n</d><d>sdf\n\n\nsdf\nsdf\n\n</d></bds>
</root>
newxml.xml
looks like:
<root><a> dsdfs dsf sdf </a><bds> <d>sdf </d><d>sdf sdf sdf </d></bds><bds> <d>sdf </d><d>sdf sdf sdf </d></bds><bds> <d>sdf </d><d>sdf sdf sdf </d></bds>
</root>