Python RegEx: remove newlines (that shouldn't be there)

2024/11/14 12:45:17

I extracted some text and want to clean it up with a RegEx.

I have learned basic RegEx, but I am not sure how to build this one:

str = '''
this is 
a line that has been cut.
This is a line that should start on a new line
'''

It should be converted to this:

str = '''
this is a line that has been cut.
This is a line that should start on a new line
'''

The pattern r'\w\n\w' seems to catch it, but I am not sure how to replace the newline with a space without touching the word characters at the end and beginning of the surrounding words.

Answer

You can use this negative lookbehind regex with re.sub:

>>> import re
>>> str = '''
... this is 
... a line that has been cut.
... This is a line that should start on a new line
... '''
>>> print(re.sub(r'(?<!\.)\n', '', str))
this is a line that has been cut.
This is a line that should start on a new line
>>>


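(?<!\.)\n matches every line break that is not preceded by a dot. Because each match is replaced with an empty string, the joined words stay separated only when the cut line already ends with a space, as "this is " does in the question.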
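As a minimal sketch of what this pattern does to the whole string, the snippet below prints the result with repr() so that every remaining line break is visible. The variable names text and cleaned are chosen here for illustration (text avoids shadowing the built-in str), and the sample assumes the cut line ends with a trailing space, as in the question.

```python
import re

# Sample input from the question; note the trailing space after "this is".
text = (
    "\n"
    "this is \n"
    "a line that has been cut.\n"
    "This is a line that should start on a new line\n"
)

# Remove every newline that is not directly preceded by a period.
cleaned = re.sub(r"(?<!\.)\n", "", text)

# repr() makes the surviving "\n" characters visible.
print(repr(cleaned))
```

Only the newline after "cut." survives. The leading newline and the final newline are removed as well, because neither of them follows a dot.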

If you would rather not rely on the presence of a dot, match only line breaks that are preceded by a word character followed by a whitespace character:

re.sub(r'(?<=\w\s)\n', '', str)
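If the cut line may or may not end with a space, one possible variant (not part of the original answer) keeps the question's \w\n\w idea but turns both word characters into lookarounds, so they are checked without being consumed. The function name join_wrapped_lines and the optional [ \t]* before the newline are assumptions made for this sketch.

```python
import re

def join_wrapped_lines(text):
    """Join a line break that sits between two words into a single space.

    (?<=\\w)  the break must follow a word character (not consumed)
    [ \\t]*   optional trailing spaces or tabs, which are replaced too
    (?=\\w)   the next line must start with a word character (not consumed)
    """
    return re.sub(r"(?<=\w)[ \t]*\n(?=\w)", " ", text)

without_space = "this is\na line that has been cut.\nThis is a line\n"
with_space = "this is \na line that has been cut.\nThis is a line\n"

print(join_wrapped_lines(without_space))
print(join_wrapped_lines(with_space))
```

Both inputs give the same result here: the break after "cut." is kept because it follows a dot, which is not a word character, and the final newline is kept because no word character follows it.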
