I have an output file containing thousands of lines of information. Every so often I find information of the following form in the output file:
Input orientation:
...
content
...
Distance matrix (angstroms):
I now want to save the content to a variable for subsequent formatting. I am also only interested in the last occurrence of this pattern in my file. I have a solution for doing this with sed and awk, but that leaves me having multiple files for carrying out one job. This job should be doable with Python, but I have no idea where to start reading to learn this.
EDIT
I have been reading up on regular expressions, and believe it or not I have made some progress! I first read in the file line by line, then reverse the list, and then join all the strings in that list, so I end up with one big multiline string. Next I use the re module with the regex r'Distance matrix(.*?)Input orientation', which I think means the following: my first pattern is "Distance matrix", then a group that matches zero or more of any character, but lazily (it stops at the first place where the rest of the pattern can match), and then my last pattern, "Input orientation".
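To check my reading of the lazy quantifier, here is a minimal sketch on a made-up string (the text below is invented for illustration and is not taken from my output file). It compares the lazy (.*?) against the greedy (.*):

```python
import re

# Made-up text with two blocks; not real output data.
text = "Distance matrix A Input orientation B Distance matrix C Input orientation"

# Lazy: stops at the first "Input orientation" after the first "Distance matrix".
lazy = re.search(r'Distance matrix(.*?)Input orientation', text, re.DOTALL).group(1)

# Greedy: runs on to the last "Input orientation" in the string.
greedy = re.search(r'Distance matrix(.*)Input orientation', text, re.DOTALL).group(1)

print(repr(lazy))
print(repr(greedy))
```

If my understanding is right, the lazy version should capture only the short piece between the first pair of markers, while the greedy version swallows everything up to the final marker.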
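    import re

    # inputfile holds the path to the output file
    with open(inputfile, "r") as input_file:
        input_file_lines = input_file.readlines()

    # Reverse the lines so the last block in the file is found first
    reverse_lines = input_file_lines[::-1]
    string = ''.join(reverse_lines)
    match = re.search(r'Distance matrix(.*?)Input orientation', string, re.DOTALL).group(1)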
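One side effect of reversing the lines is that the captured lines also come out in reverse order. A possible alternative that avoids the reversal is sketched below, assuming the whole file fits in memory as one string. The helper name last_block and the short sample string are hypothetical and only there for illustration; it uses str.rfind to locate the last "Input orientation:" and str.find to locate the "Distance matrix" that follows it.

```python
def last_block(text, start="Input orientation:", end="Distance matrix"):
    """Return the text between the last `start` marker and the next `end` marker.

    Returns None if either marker is missing (for example, a truncated file).
    """
    begin = text.rfind(start)          # position of the LAST start marker
    if begin == -1:
        return None
    begin += len(start)                # skip past the marker itself
    stop = text.find(end, begin)       # first end marker after that point
    if stop == -1:
        return None
    return text[begin:stop]

# Tiny made-up sample mimicking the layout of the real file.
sample = (
    "Input orientation:\n1 6 0 Incorrect\n"
    "Distance matrix (angstroms):\n1 C 0.000000\n"
    "Input orientation:\n1 6 0 Correct\n"
    "Distance matrix (angstroms):\n1 C 0.000000\n"
)

content = last_block(sample)
print(content)
```

With a real file, the text argument would be the result of input_file.read() instead of the sample string, and the returned lines stay in their original order.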
Sample data file for testing:
    Item Value Threshold Converged?
    Maximum Force 0.005032 0.000450 NO
    RMS Force 0.001066 0.000300 NO
    Maximum Displacement 0.027438 0.001800 NO
    RMS Displacement 0.007282 0.001200 NO
    Predicted change in Energy=-8.909077D-05
    GradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGrad
    Input orientation:
    ---------------------------------------------------------------------
    Center Atomic Atomic Coordinates (Angstroms)
    Number Number Type X Y Z
    ---------------------------------------------------------------------
    1 6 0 Incorrect Incorrect Incorrect
    2 1 0 Incorrect Incorrect Incorrect
    3 1 0 Incorrect Incorrect Incorrect
    4 1 0 Incorrect Incorrect Incorrect
    5 17 0 Incorrect Incorrect Incorrect
    6 9 0 Incorrect Incorrect Incorrect
    ---------------------------------------------------------------------
    Distance matrix (angstroms):
    1 2 3 4 5
    1 C 0.000000
    2 H 1.080163 0.000000
    3 H 1.080326 1.809416 0.000000
    4 H 1.080621 1.810236 1.810685 0.000000
    5 Cl 1.962171 2.470702 2.468769 2.465270 0.000000
    6 F 2.390537 2.343910 2.357275 2.380515 4.352568
    6
    6 F 0.000000
    Input orientation:
    ---------------------------------------------------------------------
    Center Atomic Atomic Coordinates (Angstroms)
    Number Number Type X Y Z
    ---------------------------------------------------------------------
    1 6 0 Correct Correct Correct
    2 1 0 Correct Correct Correct
    3 1 0 Correct Correct Correct
    4 1 0 Correct Correct Correct
    5 17 0 Correct Correct Correct
    6 9 0 Correct Correct Correct
    ---------------------------------------------------------------------
    Distance matrix (angstroms):
    1 2 3 4 5
    1 C 0.000000
    2 H 1.080516 0.000000
    3 H 1.080587 1.801890 0.000000
    4 H 1.080473 1.801427 1.801478 0.000000
    5 Cl 1.936014 2.458132 2.459437 2.460630 0.000000
    6 F 2.414588 2.368281 2.365651 2.355690 4.350586