Parse a custom text file in Python

2024/10/12 20:20:49

I have a text file to parse; this is a condensed sample of it.

apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}

The output should be as shown below:

apple.type=fruit
apple.varieties.color=red
apple.varieties.origin=usa

So far the only thing I have come up with is a sort of breadth-first approach in Python, but I can't figure out how to get all the nested children.

progInput = """apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}
"""
progInputSplitToLines = progInput.split('\n')
childrenList = []
root = ""

def hasChildren():
    if "{" in progInputSplitToLines[0]:
        global root
        root = progInputSplitToLines[0].split(" ")[0]
    for e in progInputSplitToLines[1:]:
        if "=" in e:
            childrenList.append({e.split("=")[0].replace("    ", ""), e.split("=")[1].replace("    ", "")})

hasChildren()

PS: I looked into tree structures in Python and came across anytree (https://anytree.readthedocs.io/en/latest/). Do you think it would help in my case?

Would you be able to help me out? I'm not very good at parsing text. Thanks a bunch in advance. :)

Answer

Your file looks like HOCON (Human-Optimized Config Object Notation), so you can try the pyhocon parser module to solve your problem.

Install: either run pip install pyhocon, or download the GitHub repo and perform a manual install with python setup.py install.

Basic usage:

from pyhocon import ConfigFactory

conf = ConfigFactory.parse_file('text.conf')
print(conf)

Which gives the following nested structure:

ConfigTree([('apple', ConfigTree([('type', 'fruit'), ('varieties', ConfigTree([('color', 'red'), ('origin', 'usa')]))]))])

ConfigTree is a subclass of collections.OrderedDict, as seen in the source code, so the usual dictionary operations work on it.
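To illustrate what that dictionary-like behaviour means in practice, here is a minimal sketch that uses a hand-built OrderedDict as a stand-in for the parsed ConfigTree (so it runs without pyhocon installed). The get_path helper is hypothetical, written only for this example, and is not part of pyhocon.

```python
from collections import OrderedDict

# Hand-built stand-in mirroring the nested structure printed above;
# a real ConfigTree would come from ConfigFactory.parse_file().
tree = OrderedDict([
    ("apple", OrderedDict([
        ("type", "fruit"),
        ("varieties", OrderedDict([("color", "red"), ("origin", "usa")])),
    ])),
])


def get_path(mapping, dotted):
    """Hypothetical helper: follow a dotted path through nested mappings."""
    node = mapping
    for part in dotted.split("."):
        node = node[part]  # raises KeyError if a segment is missing
    return node


print(tree["apple"]["type"])                    # plain nested key access
print(get_path(tree, "apple.varieties.color"))  # dotted-path lookup
print(list(tree["apple"].keys()))               # keys keep insertion order
```

Because the tree is just nested mappings, any code that walks dictionaries can walk it as well.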

UPDATE:

To get your desired output, you can make your own recursive function to collect all paths:

from pyhocon import ConfigFactory
from pyhocon.config_tree import ConfigTree

def config_paths(config):
    for k, v in config.items():
        if isinstance(v, ConfigTree):
            for k1, v1 in config_paths(v):
                yield (k,) + k1, v1
        else:
            yield (k,), v

config = ConfigFactory.parse_file('text.conf')
for k, v in config_paths(config):
    print('%s=%s' % ('.'.join(k), v))

Which outputs:

apple.type=fruit
apple.varieties.color=red
apple.varieties.origin=usa
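If you would rather not add a dependency, the same flattening can be sketched in plain Python with a stack of block names. This is only a rough alternative, not a HOCON parser: it assumes one statement per line, that a block opens with "name {" at the end of a line, and that "}" sits on a line of its own. The function name flatten_blocks is made up for this example.

```python
def flatten_blocks(text):
    """Yield 'dotted.path=value' strings from brace-nested text.

    Assumes one statement per line, blocks opened by 'name {' and
    closed by a lone '}' (a simplification of the real format).
    """
    path = []  # names of the blocks we are currently inside
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("{"):
            # "name {" opens a nested block: push its name
            path.append(line[:-1].strip())
        elif line == "}":
            # "}" closes the innermost block: pop its name
            path.pop()
        elif "=" in line:
            key, value = line.split("=", 1)
            yield ".".join(path + [key.strip()]) + "=" + value.strip()


sample = """apple {
    type=fruit
    varieties {
        color=red
        origin=usa
    }
}
"""

for entry in flatten_blocks(sample):
    print(entry)
```

The stack replaces the recursion: the current list of open block names is exactly the prefix each key needs. For anything beyond this simple layout (comments, quoted values, braces on the same line as other content), a real parser such as pyhocon is the safer choice.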
