Replace `\n` in html page with space in python LXML

2024/10/11 8:30:52

I have a messy XML document that I process with Python's lxml module. I want to replace every \n in the content with a space before doing any other processing. How can I do this for the text of all elements?

Edit: my XML example:

<root><a> dsdfs\n dsf\n sdf\n</a><bds> <d>sdf\n\n\n\n\n\n</d><d>sdf\n\n\nsdf\nsdf\n\n</d></bds>................
</root>

and I want to get this output when I print the result of itertext():

root = ...  # get the root element
for i in root.itertext():
    print(i)

dsdfs  dsf  sdf
dsdfs  dsf  sdf
sdf  nsdf sdf

Answer

The code below serializes the parsed XML to a string, replaces every \n with a space, and then writes the result to a new XML file. You can do other processing in between, depending on what exactly you want to do.

from lxml import etree

tree = etree.parse('some.xml')
root = tree.getroot()

# Get the whole XML content as a string
# (encoding='unicode' makes tostring() return str rather than bytes)
xml_in_str = etree.tostring(root, encoding='unicode')

# Replace every literal \n sequence (backslash followed by n) with a space
new_xml_data = xml_in_str.replace(r'\n', ' ')

# Do the processing with the cleaned new_xml_data string.
# Maybe also write it to a new XML file, without the \n
with open('newxml.xml', 'w') as f:
    f.write(new_xml_data)

some.xml looks like:

<root><a> dsdfs\n dsf\n sdf\n</a><bds> <d>sdf\n\n\n\n\n\n</d><d>sdf\n\n\nsdf\nsdf\n\n</d></bds><bds> <d>sdf\n\n\n\n\n\n</d><d>sdf\n\n\nsdf\nsdf\n\n</d></bds><bds> <d>sdf\n\n\n\n\n\n</d><d>sdf\n\n\nsdf\nsdf\n\n</d></bds>
</root>

newxml.xml looks like:

<root><a> dsdfs  dsf  sdf </a><bds> <d>sdf      </d><d>sdf   sdf sdf  </d></bds><bds> <d>sdf      </d><d>sdf   sdf sdf  </d></bds><bds> <d>sdf      </d><d>sdf   sdf sdf  </d></bds>
</root>
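
If the goal is only to clean the text returned by itertext(), the replacement can also be done on the parsed tree instead of on the serialized string. The following is a minimal sketch of that alternative, not part of the original answer: the helper name replace_newlines is hypothetical, the sample document is a shortened version of the one above, and it assumes both the literal \n sequences and real newline characters should become spaces.

```python
from lxml import etree


def replace_newlines(root):
    """Replace literal backslash-n sequences and real newlines with spaces, in place."""
    # Iterate over elements only (comments and processing instructions are skipped).
    for el in root.iter(etree.Element):
        if el.text:
            el.text = el.text.replace(r'\n', ' ').replace('\n', ' ')
        if el.tail:
            el.tail = el.tail.replace(r'\n', ' ').replace('\n', ' ')
    return root


# Shortened sample; the raw string keeps \n as two characters, as in some.xml.
xml = r'<root><a> dsdfs\n dsf\n sdf\n</a><bds> <d>sdf\n\n</d><d>sdf\nsdf\n</d></bds></root>'
root = etree.fromstring(xml)
replace_newlines(root)

texts = list(root.itertext())
for t in texts:
    print(repr(t))
```

Because the tree itself is modified, any later call to itertext() or tostring() on root sees the cleaned text, and the markup is never touched by the string replacement.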