Match a pattern and save to variable using python

2024/11/18 23:27:59

I have an output file containing thousands of lines of information. Every so often, the output file contains a block of the following form:

Input orientation:
...
content
...
Distance matrix (angstroms):

I want to save the content to a variable for subsequent formatting. I am only interested in the last occurrence of this pattern in my file. I have a solution that uses sed and awk, but it leaves me with multiple files for carrying out one job. This should be doable with Python, but I have no idea where to start reading to learn how.


EDIT: I have been reading up on regular expressions, and believe it or not I have made some progress! I first read in the file line by line, then reverse the list, and then join all the strings in that list, so I end up with one big multiline string. Next I use the re module with the regex r'Distance matrix(.*?)Input orientation', which I think means the following: my first pattern is "Distance matrix", then a group that matches zero or more of any character, but lazily (as few characters as possible, so it stops at the first "Input orientation" that follows), and then my last pattern "Input orientation". Since the lines are reversed, the first match corresponds to the last block in the file.

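import re

with open(inputfile, "r") as input_file:
    input_file_lines = input_file.readlines()
reverse_lines = input_file_lines[::-1]  # last line of the file comes first
string = ''.join(reverse_lines)
# The captured lines are still in reverse order at this point
match = re.search('Distance matrix(.*?)Input orientation', string, re.DOTALL).group(1)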
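As a possible alternative that avoids reversing the lines, here is a minimal sketch: collect every block between the two markers with re.findall and keep the last one. The function name last_block_regex and the sample text are made up for illustration and are not part of the original question.

```python
import re

def last_block_regex(text):
    """Return the text between the last 'Input orientation:' and the
    'Distance matrix' that follows it, or None if there is no such block."""
    # re.DOTALL lets '.' match newlines; the lazy '.*?' stops at the first
    # 'Distance matrix' after each 'Input orientation:'.
    blocks = re.findall(r'Input orientation:(.*?)Distance matrix', text, re.DOTALL)
    return blocks[-1] if blocks else None

# Hypothetical, shortened stand-in for the real output file
sample = (
    "Input orientation:\nold coordinates\nDistance matrix (angstroms):\n"
    "Input orientation:\nnew coordinates\nDistance matrix (angstroms):\n"
)
print(repr(last_block_regex(sample)))
```

Because the text is not reversed, the captured lines stay in their original order.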

Sample data file for testing:

Item               Value     Threshold  Converged?
Maximum Force            0.005032     0.000450     NO
RMS     Force            0.001066     0.000300     NO
Maximum Displacement     0.027438     0.001800     NO
RMS     Displacement     0.007282     0.001200     NO
Predicted change in Energy=-8.909077D-05
GradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGrad
Input orientation:
---------------------------------------------------------------------
Center     Atomic      Atomic             Coordinates (Angstroms)
Number     Number       Type             X           Y           Z
---------------------------------------------------------------------
1          6           0        Incorrect    Incorrect    Incorrect
2          1           0        Incorrect    Incorrect    Incorrect
3          1           0        Incorrect    Incorrect    Incorrect
4          1           0        Incorrect    Incorrect    Incorrect
5         17           0        Incorrect    Incorrect    Incorrect
6          9           0        Incorrect    Incorrect    Incorrect
---------------------------------------------------------------------
Distance matrix (angstroms):
1          2          3          4          5
1  C    0.000000
2  H    1.080163   0.000000
3  H    1.080326   1.809416   0.000000
4  H    1.080621   1.810236   1.810685   0.000000
5  Cl   1.962171   2.470702   2.468769   2.465270   0.000000
6  F    2.390537   2.343910   2.357275   2.380515   4.352568
6
6  F    0.000000
Input orientation:
---------------------------------------------------------------------
Center     Atomic      Atomic             Coordinates (Angstroms)
Number     Number       Type             X           Y           Z
---------------------------------------------------------------------
1          6           0        Correct    Correct     Correct
2          1           0        Correct    Correct     Correct
3          1           0        Correct    Correct     Correct
4          1           0        Correct    Correct     Correct
5         17           0        Correct    Correct     Correct
6          9           0        Correct    Correct     Correct
---------------------------------------------------------------------
Distance matrix (angstroms):
1          2          3          4          5
1  C    0.000000
2  H    1.080516   0.000000
3  H    1.080587   1.801890   0.000000
4  H    1.080473   1.801427   1.801478   0.000000
5  Cl   1.936014   2.458132   2.459437   2.460630   0.000000
6  F    2.414588   2.368281   2.365651   2.355690   4.350586

Answer

Regex isn't necessary here. All you need is good ol' indexing. Python strings have index and rindex methods that take a substring, find it in the text, and return the index of the first character of the match: index returns the first occurrence and rindex returns the last. Reading the documentation on string slicing should get you familiar with the rest. The program could look something like this:

with open(input_file) as f:
    s = f.read()  # reads the file as one big string
last_block = s[s.rindex('Input'):s.rindex('Distance')]

The last line of that code finds the last occurrence of 'Input' in the file (rindex searches from the end towards the front) and records its position as an integer. It then does the same with 'Distance'. Finally, it uses those two integers to slice out only the portion of the string that lies between them. In the case of your example file it would return:

Input orientation:
---------------------------------------------------------------------
Center     Atomic      Atomic             Coordinates (Angstroms)
Number     Number       Type             X           Y           Z
---------------------------------------------------------------------
1          6           0        Correct    Correct     Correct
2          1           0        Correct    Correct     Correct
3          1           0        Correct    Correct     Correct
4          1           0        Correct    Correct     Correct
5         17           0        Correct    Correct     Correct
6          9           0        Correct    Correct     Correct
---------------------------------------------------------------------
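To make the index arithmetic concrete, here is a small sketch on a made-up two-block string (the text variable is a hypothetical stand-in for the contents of the real file):

```python
# Hypothetical, shortened stand-in for the file contents
text = (
    "Input orientation:\nfirst block\nDistance matrix (angstroms):\n"
    "Input orientation:\nsecond block\nDistance matrix (angstroms):\n"
)

start = text.rindex('Input')     # index where the LAST 'Input' begins
end = text.rindex('Distance')    # index where the LAST 'Distance' begins
last_block = text[start:end]     # from start up to, but not including, end

print(start, end)
print(repr(last_block))
```

The slice begins at the header of the second block and stops just before the final 'Distance', so the first block is ignored entirely.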

If you don't want the 'Input orientation' header, you can add an offset to the result of rindex('Input') to skip past it. That could look like s[s.rindex('Input') + 19:s.rindex('Distance')], for instance, where 19 is the 18 characters of 'Input orientation:' plus the newline that follows it.
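One way to avoid hard-coding the offset is to compute it from the header itself. The following is only a sketch: the names HEADER, FOOTER and last_block_body are invented for illustration, and it searches for the footer starting from the header position, which is a small change from the rindex call used above.

```python
HEADER = 'Input orientation:'
FOOTER = 'Distance matrix'

def last_block_body(text):
    """Return the last block without its 'Input orientation:' header."""
    # Skip past the header by its own length instead of a magic number
    start = text.rindex(HEADER) + len(HEADER)
    # Look for the first footer that comes after that header
    end = text.index(FOOTER, start)
    # Drop the newlines left over at both ends of the block
    return text[start:end].strip('\n')

# Hypothetical, shortened stand-in for the file contents
text = (
    "Input orientation:\nfirst block\nDistance matrix (angstroms):\n"
    "Input orientation:\nsecond block\nDistance matrix (angstroms):\n"
)
print(last_block_body(text))
```

If the header text ever changes, only the HEADER constant needs updating.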

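It is also important to note that index and rindex raise a ValueError if the substring is not found. If that is not desired, you can use find and rfind, which return -1 instead. Check for that -1 before slicing, because a negative index is still a valid slice position and would silently give you the wrong text.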
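A minimal sketch of the rfind variant, with an explicit check for the -1 "not found" value (the function name last_block_or_none is made up for this example):

```python
def last_block_or_none(text):
    """Return the last 'Input' ... 'Distance' block, or None if it is missing."""
    start = text.rfind('Input')
    end = text.rfind('Distance')
    # rfind returns -1 when the substring is absent; slicing with -1 would
    # not fail, it would just return the wrong part of the string.
    if start == -1 or end == -1 or end < start:
        return None
    return text[start:end]

print(last_block_or_none("no markers here"))
print(repr(last_block_or_none("Input orientation:\nxyz\nDistance matrix\n")))
```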

https://en.xdnf.cn/q/118615.html
