Multiline python regex

2024/10/1 3:20:40

I have a file structured like this :

A: some text
B: more text
even more text
on several lines
A: and we start again
B: more text
more
multiline text

I'm trying to find the regex that will split my file like this :

>>>re.findall(regex,f.read())
[('some text','more text','even more text\non several lines'),('and we start again','more text', 'more\nmultiline text')]

So far, I've ended up with the following :

>>>re.findall('A:(.*?)\nB:(.*?)\n(.*?)',f.read(),re.DOTALL)
[(' some text', ' more text', ''), (' and we start again', ' more text', '')]

The multiline text is not catched. I guess is because the lazy qualifier is really lazy and catch nothing, but I take it out, the regex gets really greedy :

>>>re.findall('A:(.*?)\nB:(.*?)\n(.*)',f.read(),re.DOTALL)
[(' some text',
' more text',
'even more text\non several lines\nA: and we start again\nB: more text\nmore\nmultiline text')]

Does any one has an idea ? Thanks !

Answer

You could tell the regex to stop matching at the next line that starts with A: (or at the end of the string):

re.findall(r'A:(.*?)\nB:(.*?)\n(.*?)(?=^A:|\Z)', f.read(), re.DOTALL|re.MULTILINE)
https://en.xdnf.cn/q/71006.html

Related Q&A

How can I process xml asynchronously in python?

I have a large XML data file (>160M) to process, and it seems like SAX/expat/pulldom parsing is the way to go. Id like to have a thread that sifts through the nodes and pushes nodes to be processed …

python postgresql: reliably check for updates in a specific table

Situation: I have a live trading script which computes all sorts of stuff every x minutes in my main thread (Python). the order sending is performed through such thread. the reception and execution of …

How to push to remote repo with GitPython

I have to clone a set of projects from one repository and push it then to a remote repository automatically. Therefore im using python and the specific module GitPython. Until now i can clone the proje…

How do I do use non-integer string labels with SVM from scikit-learn? Python

Scikit-learn has fairly user-friendly python modules for machine learning.I am trying to train an SVM tagger for Natural Language Processing (NLP) where my labels and input data are words and annotatio…

Python - walk through a huge set of files but in a more efficient manner

I have huge set of files that I want to traverse through using python. I am using os.walk(source) for the same and is working but since I have a huge set of files it is taking too much and memory resou…

Python: handling a large set of data. Scipy or Rpy? And how?

In my python environment, the Rpy and Scipy packages are already installed. The problem I want to tackle is such:1) A huge set of financial data are stored in a text file. Loading into Excel is not pos…

Jupyter notebook - cant import python functions from other folders

I have a Jupyter notebook, I want to use local python functions from other folders in my computer. When I do import to these functions I get this error: "ModuleNotFoundError: No module named xxxxx…

Can pandas plot a time-series without trying to convert the index to Periods?

When plotting a time-series, I observe an unusual behavior, which eventually results in not being able to format the xticks of the plot. It seems that pandas internally tries to convert the index into …

pip install syntax for allowing insecure

I tried to run$pip install --upgrade --allow-insecure setuptoolsbut it doesnt seem to work? is my syntax wrong?this is on ubuntu 13.10 I need --allow-insecure as I havent been able to the get the co…

how do I determine the locations of the points after perspective transform, in the new image plane?

Im using OpenCV+Python+Numpy and I have three points in the image, I know the exact locations of those points.(P1, P2);N1I am going to transform the image to another view, (for example I am transformin…