Question 1

I need to parse some special data structures. They are in a somewhat C-like format that looks roughly like this:

Group("GroupName") {
    /* C-Style comment */
    Group("AnotherGroupName") {
        Entry("some","variables",0,3.141);
        Entry("other","variables",1,2.718);
    }
    Entry("linebreaks",
          "allowed",
          3,
          1.414
    );
}

I can think of several ways to go about this. I could 'tokenize' the code using regular expressions. I could read the code one character at a time and use a state machine to construct my data structure. I could strip the line breaks that follow commas and read the thing line by line. I could write a conversion script that turns this code into executable Python code.

Is there a nice pythonic way to parse files like this?
How would you go about parsing it?

This is more a general question about how to parse strings and not so much about this particular file format.

Answer

Using pyparsing (Mark Tolonen, I was just about to click "Submit Post" when your post came thru), this is pretty straightforward - see comments embedded in the code below:

data = """Group("GroupName") {
    /* C-Style comment */
    Group("AnotherGroupName") {
        Entry("some","variables",0,3.141);
        Entry("other","variables",1,2.718);
    }
    Entry("linebreaks",
          "allowed",
          3,
          1.414
         );
} """

from pyparsing import *

# define basic punctuation and data types
LBRACE,RBRACE,LPAREN,RPAREN,SEMI = map(Suppress,"{}();")
GROUP = Keyword("Group")
ENTRY = Keyword("Entry")

# use parse actions to do parse-time conversion of values
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t:float(t[0]))
integer = Regex(r"[+-]?\d+").setParseAction(lambda t:int(t[0]))

# parses a string enclosed in quotes, but strips off the quotes at parse time
string = QuotedString('"')

# define structure expressions
value = string | real | integer
entry = Group(ENTRY + LPAREN + Group(Optional(delimitedList(value)))) + RPAREN + SEMI

# since Groups can contain Groups, need to use a Forward to define recursive expression
group = Forward()
group << Group(GROUP + LPAREN + string("name") + RPAREN +
               LBRACE + Group(ZeroOrMore(group | entry))("body") + RBRACE)

# ignore C style comments wherever they occur
group.ignore(cStyleComment)

# parse the sample text
result = group.parseString(data)

# print out the tokens as a nice indented list using pprint
from pprint import pprint
pprint(result.asList())

Prints

[['Group',
  'GroupName',
  [['Group',
    'AnotherGroupName',
    [['Entry', ['some', 'variables', 0, 3.141]],
     ['Entry', ['other', 'variables', 1, 2.718]]]],
   ['Entry', ['linebreaks', 'allowed', 3, 1.4139999999999999]]]]]

(Unfortunately, there may be some confusion since pyparsing defines a "Group" class, for imparting structure to the parsed tokens - note how the value lists in an Entry get grouped because the list expression is enclosed within a pyparsing Group.)
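For comparison, the "tokenize with regular expressions" idea from the question can be sketched with only the standard library. This is a rough illustration, not a replacement for the pyparsing version: the names TOKEN_RE, tokenize, Parser and SAMPLE are made up for this example, and it assumes strings never contain escaped quotes. It returns the same nested-list shape as above, minus the outermost wrapper list.

```python
import re

# Token patterns, tried in order; REAL must come before INT so that
# "3.141" is not split into an integer followed by junk.
TOKEN_RE = re.compile(r"""
    (?P<COMMENT>/\*.*?\*/)      # C-style comment, skipped
  | (?P<WS>\s+)                 # whitespace and line breaks, skipped
  | (?P<STRING>"[^"]*")         # double-quoted string, no escape handling
  | (?P<REAL>[+-]?\d+\.\d*)     # real number such as 3.141
  | (?P<INT>[+-]?\d+)           # integer
  | (?P<NAME>[A-Za-z_]\w*)      # keyword: Group or Entry
  | (?P<PUNCT>[{}();,])         # punctuation
""", re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Yield (kind, value) pairs, dropping comments and whitespace."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind, value = m.lastgroup, m.group()
        if kind in ("COMMENT", "WS"):
            continue
        if kind == "STRING":
            value = value[1:-1]          # strip the surrounding quotes
        elif kind == "REAL":
            value = float(value)
        elif kind == "INT":
            value = int(value)
        yield kind, value


class Parser:
    """Recursive-descent parser over the token list."""

    def __init__(self, text):
        self.tokens = list(tokenize(text))
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)              # end of input

    def expect(self, kind, value=None):
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        self.pos += 1
        return tok_value

    def parse_group(self):
        self.expect("NAME", "Group")
        self.expect("PUNCT", "(")
        name = self.expect("STRING")
        self.expect("PUNCT", ")")
        self.expect("PUNCT", "{")
        body = []
        while self.peek() != ("PUNCT", "}"):
            # A body item is either a nested Group or an Entry.
            if self.peek() == ("NAME", "Group"):
                body.append(self.parse_group())
            else:
                body.append(self.parse_entry())
        self.expect("PUNCT", "}")
        return ["Group", name, body]

    def parse_entry(self):
        self.expect("NAME", "Entry")
        self.expect("PUNCT", "(")
        values = []
        while self.peek() != ("PUNCT", ")"):
            if values:
                self.expect("PUNCT", ",")
            kind, value = self.peek()
            if kind not in ("STRING", "REAL", "INT"):
                raise SyntaxError(f"expected a value, got {value!r}")
            self.pos += 1
            values.append(value)
        self.expect("PUNCT", ")")
        self.expect("PUNCT", ";")
        return ["Entry", values]


SAMPLE = '''Group("GroupName") {
    /* C-Style comment */
    Group("AnotherGroupName") {
        Entry("some","variables",0,3.141);
        Entry("other","variables",1,2.718);
    }
    Entry("linebreaks",
          "allowed",
          3,
          1.414
    );
}'''

tree = Parser(SAMPLE).parse_group()
```

The hand-rolled version is longer and its error messages are cruder, which is roughly the trade-off against letting pyparsing handle comments, whitespace and recursion for you.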

How to parse code (in Python)?
