Semantic analysis - Visitors

This section explains how to transform parse tree to a more usable structure.


You will surely always want to extract some information from the parse tree or to transform it in some more usable form. The process of parse tree transformation to other forms is referred to as semantic analysis. You could do that using parse tree navigation etc., but it is better to use some standard mechanism.

In Arpeggio a visitor pattern is used for semantic analysis. You write a python class that inherits PTNodeVisitor and has a methods of the form visit_<rule name>(self, node, children) where rule name is a rule name from the grammar.

class CalcVisitor(PTNodeVisitor):

    def visit_number(self, node, children):
        return float(node.value)

    def visit_factor(self, node, children):
        if len(children) == 1:
            return children[0]
        sign = -1 if children[0] == '-' else 1
        return sign * children[-1]

    ...

During a semantic analysis a parse tree is walked in the depth-first manner and for each node a proper visitor method is called to transform it to some other form. The results are then fed to the parent node visitor method. This is repeated until the final, top level parse tree node is processed (its visitor is called). The result of the top level node is the final output of the semantic analysis.

To run semantic analysis apply your visitor class to the parse tree using visit_parse_tree function.

result = visit_parse_tree(parse_tree, CalcVisitor(debug=True))

The first parameter is a parse tree you get from the parser.parse call while the second parameter is an instance of your visitor class. Semantic analysis can be run in debug mode if you set debug parameter to True during visitor construction. You can use this flag to print your own debug information from visitor methods.

class MyLanguageVisitor(PTNodeVisitor):

  def visit_somerule(self, node, children):
    if self.debug:
      print("Visiting some rule!")

During semantic analysis, each visitor_xxx method gets current parse tree node as the node parameter and the evaluated children nodes as the children parameter.

For example, if you have expression rule in your grammar then the transformation of the non-terminal matched by this rule can be done as:

def visitor_expression(self, node, children):
  ... # transform node using 'node' and 'children' parameter
  return transformed_node

node is the current NonTerminal or Terminal from the parse tree while the children is an instance of SemanticActionResults class. This class is a list-like structure that holds the results of semantic evaluation from the children parse tree nodes (analysis is done bottom-up).

To suppress node completely return None from visitor method. In this case the parent visitor method will not get this node in its children parameter.

In the calc.py example a semantic analysis (CalcVisitor class) will evaluate the result of arithmetic expression. The parse tree is thus transformed to a single numeric value that represent the result of the expression.

In the robot.py example a semantic analysis (RobotVisitor class) will evaluate robot program (transform its parse tree) to the final robot location.

Semantic analysis can do a complex stuff. For example, see peg_peg.py and PEGVisitor class where the PEG parser for the given language is built using semantic analysis.

SemanticActionResults

SemanticActionResults is the class of object returned from the parse tree nodes evaluation. This class is used for filtering and navigation over evaluation results on children nodes.

Instance of this class is given as children parameter of visitor_xxx methods. This class inherits list so index access as well as iteration is available.

Furthermore, child nodes can be filtered by rule name using attribute access.

def visit_bar(self, node, children):
  # Index access
  child = children[2]

  # Iteration
  for child in children:
    ...

  # Returns a list of all rules created by PEG rule 'baz'
  baz_created = children.baz

Post-processing in second calls

Visitor may define method with the second_<rule_name> name form. If this method exists it will be called after all parse tree node are processed and it will be given the results of the visit_<rule_name> call.

This is usually used when some additional post-processing is needed (e.g. reference resolving).

Default actions

For each parse tree node that does not have an appropriate visit_xxx method a default action is performed. If the node is created by a plain string match, action will return None and thus suppress this node. This is handy for all those syntax noise tokens (brackets, braces, keywords etc.).

For example, if your grammar is:

number_in_brackets = "(" number ")"
number = r'\d+'

then the default action for number will return number node converted to a string and the default action for ( and ) will return None and thus suppress these nodes so the visitor method for number_in_brackets rule will only see one child (from the number rule reference).

If the node is a non-terminal and there is only one child the default action will return that child effectively passing it to the parent node visitor.

Default actions can be disabled by setting parameter defaults to False on visitor construction.

result = visit_parse_tree(parse_tree, CalcVisitor(defaults=False))

If you want to call this default behaviour from your visitor method, call visit__default__(node, children) on superclass (PTNodeVisitor).

def visitor_myrule(self, node, children):
  if some_condition:
    ...
  else:
    return super(MyVisitor, self).visit__default__(node, children)