I have some text that needs to be parsed. This is a condensed form of it:
apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}
The output should be as shown below:
apple.type=fruit
apple.varieties.color=red
apple.varieties.origin=usa
So far the only thing I have come up with is a sort of breadth-first approach in Python, but I can't figure out how to get all the children nested within.
progInput = """apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}
"""
progInputSplitToLines = progInput.split('\n')
childrenList = []
root = ""

def hasChildren():
    if "{" in progInputSplitToLines[0]:
        global root
        root = progInputSplitToLines[0].split(" ")[0]
        for e in progInputSplitToLines[1:]:
            if "=" in e:
                childrenList.append({e.split("=")[0].replace(" ", ""),e.split("=")[1].replace(" ", "")})

hasChildren()
PS: I looked into tree structures in Python and came across anytree (https://anytree.readthedocs.io/en/latest/). Do you think it would help in my case?
Would you be able to help me out? I'm not very good at parsing text. Thanks a bunch in advance. :)
Since your file is in HOCON format, you can try using the pyhocon HOCON parser module to solve your problem.
Install: Either run pip install pyhocon, or download the GitHub repo and perform a manual install with python setup.py install.
Basic usage:
from pyhocon import ConfigFactory

conf = ConfigFactory.parse_file('text.conf')
print(conf)
Which gives the following nested structure:
ConfigTree([('apple', ConfigTree([('type', 'fruit'), ('varieties', ConfigTree([('color', 'red'), ('origin', 'usa')]))]))])
ConfigTree is just a subclass of collections.OrderedDict, as seen in the source code.
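Because of that, the usual mapping operations apply to the parsed result. A minimal sketch, using a hand-built OrderedDict as a stand-in for the ConfigTree printed above (so it runs without pyhocon installed; the name tree is only illustrative):

```python
from collections import OrderedDict

# Stand-in for the ConfigTree shown above, built by hand (an assumption for
# illustration; a real ConfigTree comes from ConfigFactory.parse_file).
tree = OrderedDict([
    ('apple', OrderedDict([
        ('type', 'fruit'),
        ('varieties', OrderedDict([
            ('color', 'red'),
            ('origin', 'usa'),
        ])),
    ])),
])

# Plain dictionary indexing walks down the nesting.
apple_type = tree['apple']['type']
color = tree['apple']['varieties']['color']

# Keys come back in insertion order, as with any OrderedDict.
apple_keys = list(tree['apple'].keys())

print(apple_type, color, apple_keys)
```

The same indexing should work on the real parsed object, since it inherits these operations from OrderedDict.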
UPDATE:
To get your desired output, you can make your own recursive function to collect all paths:
from pyhocon import ConfigFactory
from pyhocon.config_tree import ConfigTree

def config_paths(config):
    for k, v in config.items():
        if isinstance(v, ConfigTree):
            for k1, v1 in config_paths(v):
                yield (k,) + k1, v1
        else:
            yield (k,), v

config = ConfigFactory.parse_file('text.conf')
for k, v in config_paths(config):
    print('%s=%s' % ('.'.join(k), v))
Which outputs:
apple.type=fruit
apple.varieties.color=red
apple.varieties.origin=usa
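If adding a dependency is not an option, the same output can be produced by hand. A minimal sketch, assuming each line holds exactly one of "name {", "key=value", or "}" as in the sample from the question; parse_nested and sample are illustrative names, and this handles only that layout, not HOCON in general:

```python
def parse_nested(text):
    """Yield (dotted_path, value) pairs from brace-nested key=value text."""
    path = []  # names of the blocks that are currently open
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith('{'):
            # "name {" opens a nested block
            path.append(line[:-1].strip())
        elif line == '}':
            # "}" closes the innermost open block
            path.pop()
        elif '=' in line:
            key, value = line.split('=', 1)
            yield '.'.join(path + [key.strip()]), value.strip()
        else:
            raise ValueError('unrecognised line: %r' % raw)


sample = """apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}
"""

flat = list(parse_nested(sample))
for dotted, value in flat:
    print('%s=%s' % (dotted, value))
```

Keeping a stack of open block names means no tree has to be built at all: each key=value line is emitted with whatever path is open at that moment.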