Extracting specific values for a header in different lines using regex

2024/10/5 19:47:36

I have a text string that spans multiple lines, and each line contains a mix of characters, numbers, and spaces.

Here is what a couple of lines look like:

WEIGHT                         VOLUME                    CHARGEABLE                PACKAGES\n                                                             
398.000 KG                     4.999 M3                  833.500 KG                12 PLT\n                                                                                         
MAWB                                    HAWB\n    / MH616 /                                                                                         
8947806753                             ABC20018830\n  

The output I am looking for is a dict with the above headers as keys and the values beneath them as values:

{"WEIGHT": "398.000 KG", "VOLUME": "4.999 M3", "CHARGEABLE": "833.500 KG", "PACKAGES": "12 PLT", "MAWB": "8947806753", "HAWB": "ABC20018830"}

I am not sure how to fetch the value for a particular field when it sits on a different line. If the value is on the same line as the field, I can fetch it with a pattern, but here each value is directly underneath its header, on the next line.

Answer

You can use a regex to easily split the text into a list containing all the fields:

import re

a = "WEIGHT                         VOLUME                    CHARGEABLE                PACKAGES\n                                                                         398.000 KG                     4.999 M3                  833.500 KG                12 PLT\n                                                                                         MAWB                                    HAWB\n    / MH616 /                                                                                           8947806753                             ABC20018830\n"

# Split on 4 (or more) whitespace characters (leaves the units with the numbers)
data = re.split(r'\s{4,}', a)
print(data)

['WEIGHT', 'VOLUME', 'CHARGEABLE', 'PACKAGES', '398.000 KG', '4.999 M3', '833.500 KG', '12 PLT', 'MAWB', 'HAWB', '/ MH616 /', '8947806753', 'ABC20018830\n']
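
The last element keeps its trailing newline because a single newline is fewer than four whitespace characters, so the pattern does not consume it. A minimal sketch of one way to avoid that, assuming it is acceptable to strip the whole string before splitting (the short string here is only a stand-in for the full text above):

```python
import re

# Shortened stand-in for the full input string, same layout.
a = "WEIGHT    VOLUME\n    398.000 KG    4.999 M3\n"

# Strip leading/trailing whitespace first so no stray "\n" survives the split.
data = re.split(r'\s{4,}', a.strip())
print(data)
```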

Since the keys and values are mixed together in one list, there probably isn't an easy way to determine automatically which is which. However, if they are always in the same positions, you can pick them out manually, e.g.:

b = {
    # WEIGHT
    data[0]: data[4],
    # VOLUME
    data[1]: data[5],
}
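
The same idea can be extended to all six fields. The following is only a sketch, and it assumes the layout never changes: four headers, their four values, two more headers, one stray token ('/ MH616 /'), then the last two values. The list is the output of the split, written out here with the trailing newline already removed so the example runs on its own.

```python
# Result of the split, with the trailing newline already removed.
data = ['WEIGHT', 'VOLUME', 'CHARGEABLE', 'PACKAGES',
        '398.000 KG', '4.999 M3', '833.500 KG', '12 PLT',
        'MAWB', 'HAWB', '/ MH616 /', '8947806753', 'ABC20018830']

# Assumed fixed positions: headers 0-3 pair with values 4-7.
result = dict(zip(data[0:4], data[4:8]))

# Headers 8-9 pair with values 11-12; index 10 ('/ MH616 /') is skipped.
result.update(zip(data[8:10], data[11:13]))

print(result)
```

If the stray token can be absent or the number of columns can vary, these hard-coded indices will silently pair the wrong items, so the positions should be checked against real data first.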