Semantic analysis - Visitors¶
This section explains how to transform parse tree to a more usable structure.
You will surely always want to extract some information from the parse tree or to transform it in some more usable form. The process of parse tree transformation to other forms is referred to as semantic analysis. You could do that using parse tree navigation etc., but it is better to use some standard mechanism.
In Arpeggio a visitor pattern is used for semantic analysis. You write a python
class that inherits
PTNodeVisitor and has a methods of the form
visit_<rule name>(self, node, children) where rule name is a rule name from
class CalcVisitor(PTNodeVisitor): def visit_number(self, node, children): return float(node.value) def visit_factor(self, node, children): if len(children) == 1: return children sign = -1 if children == '-' else 1 return sign * children[-1] ...
During a semantic analysis a parse tree is walked in the depth-first manner and for each node a proper visitor method is called to transform it to some other form. The results are then fed to the parent node visitor method. This is repeated until the final, top level parse tree node is processed (its visitor is called). The result of the top level node is the final output of the semantic analysis.
To run semantic analysis apply your visitor class to the parse tree using
result = visit_parse_tree(parse_tree, CalcVisitor(debug=True))
The first parameter is a parse tree you get from the
parser.parse call while
the second parameter is an instance of your visitor class. Semantic analysis can
be run in debug mode if you set
debug parameter to
True during visitor
construction. You can use this flag to print your own debug information from
class MyLanguageVisitor(PTNodeVisitor): def visit_somerule(self, node, children): if self.debug: print("Visiting some rule!")
During semantic analysis, each
visitor_xxx method gets current parse tree node
node parameter and the evaluated children nodes as the
For example, if you have
expression rule in your grammar then the
transformation of the non-terminal matched by this rule can be done as:
def visitor_expression(self, node, children): ... # transform node using 'node' and 'children' parameter return transformed_node
node is the current
Terminal from the parse tree while the
children is an instance of
SemanticActionResults class. This class is a
list-like structure that holds the results of semantic evaluation from the
children parse tree nodes (analysis is done bottom-up).
To suppress node completely return
None from visitor method. In this case
the parent visitor method will not get this node in its
In the calc.py example a semantic analysis (CalcVisitor class) will evaluate the result of arithmetic expression. The parse tree is thus transformed to a single numeric value that represent the result of the expression.
SemanticActionResults is the class of object returned from the parse tree nodes evaluation.
This class is used for filtering and navigation over evaluation results on children nodes.
Instance of this class is given as
children parameter of
methods. This class inherits
list so index access as well as iteration is
Furthermore, child nodes can be filtered by rule name using attribute access.
def visit_bar(self, node, children): # Index access child = children # Iteration for child in children: ... # Returns a list of all rules created by PEG rule 'baz' baz_created = children.baz
Post-processing in second calls¶
Visitor may define method with the
second_<rule_name> name form. If this
method exists it will be called after all parse tree node are processed and it
will be given the results of the
This is usually used when some additional post-processing is needed (e.g. reference resolving).
For each parse tree node that does not have an appropriate
visit_xxx method a
default action is performed. If the node is created by a plain string match,
action will return
None and thus suppress this node. This is handy for all
those syntax noise tokens (brackets, braces, keywords etc.).
For example, if your grammar is:
number_in_brackets = "(" number ")" number = r'\d+'
then the default action for
number will return number node converted to
a string and the default action for
) will return
None and thus
suppress these nodes so the visitor method for
number_in_brackets rule will
only see one child (from the
number rule reference).
If the node is a non-terminal and there is only one child the default action will return that child effectively passing it to the parent node visitor.
Default actions can be disabled by setting parameter
result = visit_parse_tree(parse_tree, CalcVisitor(defaults=False))
If you want to call this default behaviour from your visitor method, call
visit__default__(node, children) on superclass (
def visitor_myrule(self, node, children): if some_condition: ... else: return super(MyVisitor, self).visit__default__(node, children)