How to parse code (in Python)?

2024/11/10 15:03:47

I need to parse some special data structures. They are in a somewhat C-like format that looks roughly like this:

Group("GroupName") {
    /* C-Style comment */
    Group("AnotherGroupName") {
        Entry("some","variables",0,3.141);
        Entry("other","variables",1,2.718);
    }
    Entry("linebreaks",
          "allowed",
          3,
          1.414
         );
}

I can think of several ways to go about this. I could 'tokenize' the code using regular expressions. I could read the code one character at a time and use a state machine to build my data structure. I could remove the line breaks that follow commas and read the file line by line. Or I could write a conversion script that turns this code into executable Python code.
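To make the "tokenize with regular expressions" idea concrete, here is a minimal sketch of a hand-written lexer plus recursive-descent parser using only Python's standard `re` module. It assumes the format contains only `Group` and `Entry`, that strings are double-quoted with no escape sequences, and that numbers are plain integers or decimal reals; the names `lex`, `Parser` and `parse` are illustrative, not from any library.

```python
import re

# Token patterns, tried in order at each position. REAL must precede INT.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),      # C-style comment (skipped)
    ("SPACE", r"\s+"),              # whitespace and line breaks (skipped)
    ("REAL", r"[+-]?\d+\.\d*"),
    ("INT", r"[+-]?\d+"),
    ("STRING", r'"[^"]*"'),         # assumption: no escaped quotes inside strings
    ("NAME", r"[A-Za-z_]\w*"),
    ("PUNCT", r"[{}();,]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC), re.S)


def lex(text):
    """Yield (kind, value) pairs, converting numbers and unquoting strings."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind, value = m.lastgroup, m.group()
        if kind in ("COMMENT", "SPACE"):
            continue
        if kind == "REAL":
            value = float(value)
        elif kind == "INT":
            value = int(value)
        elif kind == "STRING":
            value = value[1:-1]
        yield kind, value


class Parser:
    """Recursive descent: one method per grammar rule."""

    def __init__(self, text):
        self.tokens = list(lex(text))
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self, kind, value=None):
        """Consume the next token if it has the expected kind (and value)."""
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind!r}, got {tok_value!r}")
        self.i += 1
        return tok_value

    # group := 'Group' '(' STRING ')' '{' (group | entry)* '}'
    def parse_group(self):
        self.take("NAME", "Group")
        self.take("PUNCT", "(")
        name = self.take("STRING")
        self.take("PUNCT", ")")
        self.take("PUNCT", "{")
        body = []
        while self.peek() != ("PUNCT", "}"):
            if self.peek() == ("NAME", "Group"):
                body.append(self.parse_group())
            else:
                body.append(self.parse_entry())
        self.take("PUNCT", "}")
        return ["Group", name, body]

    # entry := 'Entry' '(' [value (',' value)*] ')' ';'
    def parse_entry(self):
        self.take("NAME", "Entry")
        self.take("PUNCT", "(")
        values = []
        if self.peek() != ("PUNCT", ")"):
            values.append(self.parse_value())
            while self.peek() == ("PUNCT", ","):
                self.i += 1
                values.append(self.parse_value())
        self.take("PUNCT", ")")
        self.take("PUNCT", ";")
        return ["Entry", values]

    def parse_value(self):
        kind, value = self.peek()
        if kind not in ("STRING", "REAL", "INT"):
            raise SyntaxError(f"expected a value, got {value!r}")
        self.i += 1
        return value


def parse(text):
    parser = Parser(text)
    tree = parser.parse_group()
    if parser.peek() != (None, None):
        raise SyntaxError("unexpected input after the top-level Group")
    return tree
```

The lexer discards comments and line breaks, so the parser never has to care about layout, and each grammar rule maps to one method. For the sample above, `parse(...)` returns the same nesting as the pyparsing output in the answer, minus pyparsing's outermost list.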

Is there a nice pythonic way to parse files like this?
How would you go about parsing it?

This is more a general question about how to parse strings than about this particular file format.

Answer

Using pyparsing (Mark Tolonen, I was just about to click "Submit Post" when your post came through), this is pretty straightforward; see the comments embedded in the code below:

data = """Group("GroupName") {
    /* C-Style comment */
    Group("AnotherGroupName") {
        Entry("some","variables",0,3.141);
        Entry("other","variables",1,2.718);
    }
    Entry("linebreaks", "allowed", 3, 1.414 );
}"""
from pyparsing import *
# define basic punctuation and data types
LBRACE, RBRACE, LPAREN, RPAREN, SEMI = map(Suppress, "{}();")
GROUP = Keyword("Group")
ENTRY = Keyword("Entry")
# use parse actions to do parse-time conversion of values
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t: float(t[0]))
integer = Regex(r"[+-]?\d+").setParseAction(lambda t: int(t[0]))
# parses a string enclosed in quotes, but strips off the quotes at parse time
string = QuotedString('"')
# define structure expressions (real must be tried before integer)
value = string | real | integer
entry = Group(ENTRY + LPAREN + Group(Optional(delimitedList(value)))) + RPAREN + SEMI
# since Groups can contain Groups, use a Forward to define the recursive expression
group = Forward()
group <<= Group(GROUP + LPAREN + string("name") + RPAREN + LBRACE
                + Group(ZeroOrMore(group | entry))("body") + RBRACE)
# ignore C style comments wherever they occur
group.ignore(cStyleComment)
# parse the sample text
result = group.parseString(data)
# print out the tokens as a nice indented list using pprint
from pprint import pprint
pprint(result.asList())

Prints

[['Group',
  'GroupName',
  [['Group',
    'AnotherGroupName',
    [['Entry', ['some', 'variables', 0, 3.141]],
     ['Entry', ['other', 'variables', 1, 2.718]]]],
   ['Entry', ['linebreaks', 'allowed', 3, 1.414]]]]]

(Unfortunately, there may be some confusion because pyparsing defines its own Group class, which imparts structure to the parsed tokens. Note how the value list in each Entry gets grouped because the list expression is enclosed within a pyparsing Group.)
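Because `asList()` returns plain nested lists, turning them into something more convenient is a short recursive walk. This is a minimal sketch that assumes you want each Group as a dict and each Entry as a tuple of its values; the `convert` function and the `"name"`/`"children"` keys are hypothetical choices, and `PARSED` is simply the list printed above copied in as a literal.

```python
# The nested list printed above, copied in as a literal.
PARSED = [['Group', 'GroupName',
           [['Group', 'AnotherGroupName',
             [['Entry', ['some', 'variables', 0, 3.141]],
              ['Entry', ['other', 'variables', 1, 2.718]]]],
            ['Entry', ['linebreaks', 'allowed', 3, 1.414]]]]]


def convert(node):
    """Turn a ['Group', name, body] or ['Entry', values] list into Python objects."""
    kind = node[0]
    if kind == "Group":
        _, name, body = node
        return {"name": name, "children": [convert(child) for child in body]}
    if kind == "Entry":
        return tuple(node[1])
    raise ValueError(f"unknown node type: {kind!r}")


tree = convert(PARSED[0])
print(tree["children"][0]["name"])  # AnotherGroupName
```

Alternatively, since the grammar attaches the results names `name` and `body`, you can read `result[0].name` and `result[0].body` straight from the pyparsing results without converting to lists first.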

https://en.xdnf.cn/q/72343.html
