python multiline regex

2024/10/6 22:22:20

I'm having an issue writing the correct regular expression for a multiline match. Can someone point out what I'm doing wrong? I'm looping through a basic dhcpd.conf file with hundreds of entries such as:

host node20007
{
    hardware ethernet 00:22:38:8f:1f:43;
    fixed-address node20007.domain.com;
}

I've gotten various regexes to work for the MAC and the fixed-address separately, but I can't combine them to match properly.

import re

f = open('/etc/dhcp3/dhcpd.conf', 'r')
re_hostinfo = re.compile(r'(hardware ethernet (.*))\;(?:\n|\r|\r\n?)(.*)', re.MULTILINE)
for host in f:
    match = re_hostinfo.search(host)
    if match:
        print(match.groups())

Currently my match groups will look like:
('hardware ethernet 00:22:38:8f:1f:43', '00:22:38:8f:1f:43', '')

But looking for something like:
('hardware ethernet 00:22:38:8f:1f:43', '00:22:38:8f:1f:43', 'node20007.domain.com')

Answer

Update: I've just noticed the real reason you are getting the results you got. In your code:

for host in f:
    match = re_hostinfo.search(host)
    if match:
        print(match.groups())

host refers to a single line, but your pattern needs to work over two lines.

Try this:

data = f.read()
for x in regex.finditer(data):
    process(x.groups())

where regex is a compiled pattern that matches over two lines.
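A minimal, self-contained sketch of why the loop fails and the whole-text approach works. It assumes the config has already been read into a string (the hypothetical SAMPLE stands in for dhcpd.conf, and the helper names are illustrative) and uses the whitespace-based pattern from this answer:

```python
import re

# Hypothetical stand-in for the contents of /etc/dhcp3/dhcpd.conf.
SAMPLE = """host node20007
{
    hardware ethernet 00:22:38:8f:1f:43;
    fixed-address node20007.domain.com;
}
host node20008
{
    hardware ethernet 00:22:38:8f:1f:44;
    fixed-address node20008.domain.com;
}
"""

# \s+ also matches the newline between the two lines of interest.
regex = re.compile(r'(hardware ethernet\s+(\S+));\s+\S+\s+(\S+);')


def search_per_line(text):
    """Mimic `for host in f`: each search only ever sees one line."""
    results = []
    for line in text.splitlines(keepends=True):
        m = regex.search(line)
        if m:
            results.append(m.groups())
    return results


def search_whole_text(text):
    """Search the whole text at once, so a match can span lines."""
    return [m.groups() for m in regex.finditer(text)]


print(search_per_line(SAMPLE))    # [] : the second line is never seen
print(search_whole_text(SAMPLE))  # one tuple per host block
```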

If your file is large, and you are sure that the pieces of interest are always spread over exactly two lines, you could read the file a line at a time, checking each line for the first part of the pattern and setting a flag that tells you whether the next line should be checked for the second part. If you are not sure, it gets complicated, perhaps enough to start looking at the pyparsing module.
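The flag-based, line-at-a-time approach could look roughly like this. It is only a sketch: it assumes the address line always directly follows the `hardware ethernet` line, and the two half-patterns and the scan_lines name are illustrative, not part of the original answer.

```python
import io
import re

# First half of the two-line pattern: the MAC line.
MAC_RE = re.compile(r'\s*hardware ethernet\s+(\S+);')
# Second half: a line of exactly two tokens, e.g. "fixed-address node20007.domain.com;".
ADDR_RE = re.compile(r'\s*\S+\s+(\S+);')


def scan_lines(lines):
    """Yield (mac, address) pairs while reading one line at a time.

    Assumes the address line, if any, directly follows the MAC line.
    """
    pending_mac = None  # the "check the next line" flag (None means off)
    for line in lines:
        if pending_mac is not None:
            mac, pending_mac = pending_mac, None  # clear the flag either way
            m = ADDR_RE.match(line)
            if m:
                yield mac, m.group(1)
                continue
        m = MAC_RE.match(line)
        if m:
            pending_mac = m.group(1)


# io.StringIO stands in for open('/etc/dhcp3/dhcpd.conf'); any iterable of lines works.
conf = io.StringIO(
    "host node20007\n"
    "{\n"
    "    hardware ethernet 00:22:38:8f:1f:43;\n"
    "    fixed-address node20007.domain.com;\n"
    "}\n"
)
print(list(scan_lines(conf)))  # [('00:22:38:8f:1f:43', 'node20007.domain.com')]
```

Only one line is held in memory at a time, which is the point of this variant for large files.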

Now back to the original answer, discussing the pattern that you should use:

You don't need MULTILINE, which only changes what ^ and $ match (and your pattern uses neither); just match whitespace, which includes newlines. Build up your pattern using these building blocks:

(1) fixed text, (2) one or more whitespace characters (\s+), (3) one or more non-whitespace characters (\S+)

and then put in parentheses to get your groups.

Try this:

>>> m = re.search(r'(hardware ethernet\s+(\S+));\s+\S+\s+(\S+);', data)
>>> print(m.groups())
('hardware ethernet   00:22:38:8f:1f:43', '00:22:38:8f:1f:43', 'node20007.domain.com')
>>>
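To make the building-block idea concrete, here is a sketch that assembles the same pattern from named pieces (the constant names are purely illustrative). Note that no re.MULTILINE flag is passed; \s matches the newline on its own.

```python
import re

WS = r'\s+'     # (2) one or more whitespace characters, newlines included
TOKEN = r'\S+'  # (3) one or more non-whitespace characters
FIXED = re.escape('hardware ethernet')  # (1) fixed text, taken literally

pattern = (
    '(' + FIXED + WS + '(' + TOKEN + '))'  # groups 1 and 2: whole phrase, MAC
    + ';' + WS + TOKEN + WS                # keyword such as "fixed-address"
    + '(' + TOKEN + ');'                   # group 3: the address
)

text = "hardware ethernet 00:22:38:8f:1f:43;\n    fixed-address node20007.domain.com;"
m = re.search(pattern, text)  # plain search, no re.MULTILINE
print(m.groups())
# ('hardware ethernet 00:22:38:8f:1f:43', '00:22:38:8f:1f:43', 'node20007.domain.com')
```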

Please consider using verbose mode (re.VERBOSE): you can use it to document exactly which pieces of the pattern match which pieces of the data, and it often helps you get the pattern right in the first place. Example:

>>> regex = re.compile(r"""
... (hardware[ ]ethernet \s+
...     (\S+) # MAC
... ) ;
... \s+ # includes newline
... \S+ # variable(??) text e.g. "fixed-address"
... \s+
... (\S+) # e.g. "node20007.domain.com"
... ;
... """, re.VERBOSE)
>>> print(regex.search(data).groups())
('hardware ethernet   00:22:38:8f:1f:43', '00:22:38:8f:1f:43', 'node20007.domain.com')
>>>
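Putting the pieces together, a sketch that runs a verbose pattern with finditer over the whole text and collects MAC-to-address pairs in a dict. Named groups ((?P<name>...)) are a swapped-in variation on the numbered groups above, and the sample text and the hosts_by_mac name are hypothetical.

```python
import re

HOST_RE = re.compile(r"""
    hardware[ ]ethernet \s+
    (?P<mac>\S+)        # MAC address
    ;
    \s+                 # includes the newline
    \S+                 # keyword, e.g. "fixed-address"
    \s+
    (?P<addr>\S+)       # e.g. "node20007.domain.com"
    ;
""", re.VERBOSE)


def hosts_by_mac(text):
    """Map each MAC address to its fixed address, scanning the whole text."""
    return {m.group('mac'): m.group('addr') for m in HOST_RE.finditer(text)}


sample = """host node20007
{
    hardware ethernet 00:22:38:8f:1f:43;
    fixed-address node20007.domain.com;
}
host node20008
{
    hardware ethernet 00:22:38:8f:1f:44;
    fixed-address node20008.domain.com;
}
"""
print(hosts_by_mac(sample))
```

With the real file you would pass in the result of reading /etc/dhcp3/dhcpd.conf. Like the answer's pattern, this assumes every `hardware ethernet` line is directly followed by a two-token line such as `fixed-address ...;`.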
https://en.xdnf.cn/q/70311.html
