I have to parse a text file that contains different kinds of data. The most challenging part is a line that contains three different JSON objects (as strings) with other data between them. I have to separate the JSON data from the rest. The good thing is that every JSON object starts with a name. The problem I'm having with regex is isolating the first JSON string from the others so that I can parse it with json. Here is my solution (it works), but I bet there is something better. I'm not good at regex yet.
import re

# Scan a string that starts with '{' and return the index of the '}'
# that closes the first JSON serialized object.
def get_json_first_end(text):
    ind_ret = 0
    ind1 = 0  # current brace nesting depth
    for i, v in enumerate(text):
        if v == '{':
            ind1 = ind1 + 1
        if v == '}':
            ind1 = ind1 - 1
            if ind1 == 0:
                ind_ret = i
                break
    return ind_ret

# Return a string that contains the JSON object following json_name.
def get_json_str(line, json_name):
    js_str = ''
    if re.match('(.*)' + json_name + '(.*)', line):
        # Remove all spurious data before the JSON object
        data = re.sub('(.*)' + json_name, '', line)
        ind1 = data.find('{')
        ind2 = data.rfind('}')
        ind3 = get_json_first_end(data[ind1:ind2+1])
        # ind3 is relative to ind1, so offset it (the original ind3+2
        # is only right when exactly one character precedes the '{')
        js_str = data[ind1:ind1+ind3+1]
    return js_str
If I don't call get_json_first_end, ind2 can be wrong when there are multiple JSON strings on the same line. get_json_str returns a string with the JSON object I want, and I can parse it with json without issues. My question is: is there a better way to do this? get_json_first_end seems quite ugly.
Thanks
Update: here is an example line:
ConfigJSON ["CFG","VAR","1","[unused bit 2]","[unused bit 3]","[unused bit 4]","[unused bit 5]"] 2062195231AppTitle "Fsdn" 3737063363Bits ["RESET","QUICK","KILL","[unused bit 2]","[unused bit 3]","[unused bit 4]","[unused bit 5]"] 0837383711CRC 33628 0665393097ForceBits {"Auxiliary":[{"index":18,"name":"AUX1.INPUT"},{"index":19,"name":"AUX2.INPUT"}],"Network":[{"index":72,"name":"INPUT.1"}],"Physical":[]}
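For comparison, here is a minimal sketch of one possible alternative to the manual brace counting, using the standard library's json.JSONDecoder.raw_decode, which parses a single JSON value starting at a given index and ignores whatever follows it. The function name extract_json_after is hypothetical, and the sketch assumes the name is immediately followed by optional whitespace and then the JSON text, as in the example line. The name lookup is a plain substring search, so a short name such as Bits would also match inside ForceBits if that came first.

```python
import json

def extract_json_after(line, name):
    """Return the decoded JSON value that follows `name` in `line`, or None."""
    pos = line.find(name)  # naive substring search (assumption: first hit is the right one)
    if pos == -1:
        return None
    # Skip the name itself and any whitespace before the JSON text,
    # because raw_decode does not skip leading whitespace.
    start = pos + len(name)
    while start < len(line) and line[start].isspace():
        start += 1
    try:
        # raw_decode returns (value, end_index) and stops at the end of
        # the first complete JSON value, so trailing data is not a problem.
        value, _end = json.JSONDecoder().raw_decode(line, start)
    except json.JSONDecodeError:
        return None
    return value
```

Called as extract_json_after(line, "ForceBits") on a line like the one above, this should hand back the already parsed object, so no separate json.loads step is needed. It also handles arrays such as the one after ConfigJSON, which the brace-counting version does not look for.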