Parser configuration
This section describes the parser parameters that alter Arpeggio's default behaviour.
Some aspects of parsing can be configured using parser and/or ParsingExpression parameters. Arpeggio has sane defaults but lets the user alter them.
Case insensitive parsing
By default Arpeggio is case sensitive. To parse case-insensitively, set the parser parameter ignore_case to True.
parser = ParserPython(calc, ignore_case=True)
White-space handling
By default Arpeggio skips whitespace. You can change this behaviour with the skipws parameter given to the parser constructor.
parser = ParserPython(calc, skipws=False)
You can also change what Arpeggio considers whitespace using the ws parameter. It is a plain string consisting of whitespace characters. By default it is set to "\t\n\r ".
For example, to prevent newlines from being treated as whitespace you could write:
parser = ParserPython(calc, ws='\t\r ')
Note
These parameters can also be used at the Sequence level, so one could write a grammar like this:
def grammar(): return Sequence("one", "two", "three", skipws=False), "four"
parser = ParserPython(grammar)
pt = parser.parse("onetwothree four")
Keyword handling
Setting the autokwd parameter to True makes Arpeggio perform a word-boundary match for keyword-like matches. This parameter is disabled by default.
def grammar(): return "one", "two", "three"
parser = ParserPython(grammar, autokwd=True)
# With autokwd enabled this parses without error.
parser.parse("one two three")
# But this will not parse (NoMatch is raised): matching is done on
# word boundaries, so the input is considered a single word.
parser.parse("onetwothree")
Comment handling
Support for comments in your language can be specified as another set of grammar rules. See the simple.py example.
The parser is constructed using two parameters.
parser = ParserPython(simpleLanguage, comment)
The first parameter is the root rule of the main parse model, while the second is the rule for comments.
During parsing, comment parse trees are kept in a separate list, so comments will not show up in the main parse tree.
Parse tree reduction
By default a non-terminal is created for each rule. This can sometimes result in very deep trees. You can alter this behaviour by setting the reduce_tree parameter to True.
parser = ParserPython(calc, reduce_tree=True)
In this configuration, non-terminals with a single child are removed from the parse tree.
For example, the calc parse tree above will look like this:
Notice the removal of each non-terminal with a single child.
Warning
Be aware that semantic analysis operates on the nodes of the finished parse tree. Therefore, if you use tree reduction, visitor methods will not be called for the removed nodes.
Newline termination for Repetitions
By default, Repetition parsing expressions (i.e. ZeroOrMore and OneOrMore) obey the skipws and ws settings, but there are situations where a repetition should not pass the end of the current line. For this purpose the eolterm parameter is provided; it can be set on a repetition to ensure that it terminates before entering a new line.
from arpeggio import ParserPython, ZeroOrMore
def grammar(): return first, second
def first(): return ZeroOrMore(["a", "b"], eolterm=True)
def second(): return "a"
# The first rule should match only the first line,
# so that the second rule matches "a" on the new line.
input = """a a b a b b
a"""
parser = ParserPython(grammar)
result = parser.parse(input)
Separator for Repetitions
It is possible to specify a parsing expression that will be matched between every two consecutive matches in a repetition.
For example:
def grammar(): return ZeroOrMore(["a", "b"], sep=",")
# Commas will be treated as separators between elements
input = "a , b, b, a"
parser = ParserPython(grammar)
result = parser.parse(input)
sep can be any valid parsing expression.
Memoization (a.k.a. packrat parsing)
This technique is based on memoizing the result of each parsing expression rule. For grammars with a lot of backtracking this can yield a significant speed increase, at the expense of some memory used for the memoization cache.
Starting with Arpeggio 1.5 this feature is disabled by default. If you think that parsing is slow, try enabling memoization by setting the memoization parameter to True during parser instantiation.
parser = ParserPython(grammar, memoization=True)