Troubleshooting Guide
Common problems and mistakes
Left recursion and RecursionError
If you get RecursionError: maximum recursion depth exceeded while calling a Python object, it is a good indication that you have left recursion in the grammar.
Note
Support for detecting and reporting left recursion in the grammar is planned for Arpeggio. See issue 23.
A left recursion is found if the parser calls the same rule again while no characters from the input have been consumed since the previous call (i.e. the parser is in the same state). This leads to the same sequence of events repeating, so we have an infinite loop.
For example, let's suppose that we want to match the following string:
b a a a a a a
We could write a grammar like this:
A = A 'a' / 'b'
But this grammar is left-recursive, and a recursive-descent top-down parser like Arpeggio will loop indefinitely, trying to match A over and over again at the same spot in the input string.
Although there are techniques to handle left recursion in top-down parsers automatically, Arpeggio does not implement them, and the classic approach of removing left recursion must be used.
To remove left recursion from the above grammar we do the following:
A = 'b' 'a'*
In other words, take all non-left-recursive choices and put them first (b in this case), and then add a zero-or-more repetition of the recursive part without the left-recursive non-terminal (a from A 'a' in this case).
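The effect can be reproduced without Arpeggio. The following is a minimal sketch of two hand-written recursive-descent functions, one for each form of the grammar. The function names are hypothetical, whitespace skipping is left out (so the input is written as baaaaaa), and this is not how Arpeggio is implemented internally.

```python
def parse_left_recursive(text, pos=0):
    """Naive recursive descent for  A = A 'a' / 'b'  (left-recursive)."""
    # The first choice starts with A itself at the same position.
    # Nothing has been consumed, so this call repeats with the same state.
    end = parse_left_recursive(text, pos)
    if end is not None and text.startswith("a", end):
        return end + 1
    # Second choice: never reached, because the call above never returns.
    if text.startswith("b", pos):
        return pos + 1
    return None


def parse_rewritten(text, pos=0):
    """Recursive descent for  A = 'b' 'a'*  (left recursion removed)."""
    if not text.startswith("b", pos):
        return None
    pos += 1
    # Zero-or-more repetition of the formerly recursive part.
    while text.startswith("a", pos):
        pos += 1
    return pos


text = "baaaaaa"

try:
    parse_left_recursive(text)
    outcome = "matched"
except RecursionError:
    outcome = "RecursionError"

print(outcome)                             # RecursionError
print(parse_rewritten(text) == len(text))  # True
```

The rewritten function consumes the b before it loops, so every iteration starts from a new input position and the parse terminates.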
Another example:
add = mult / add '+' mult / add '-' mult
becomes:
add = mult (('+' mult) / ('-' mult))*
or:
add = mult (('+' / '-') mult)*
In general:
A = A a1 / A a2 / ... / A an / b1 / b2 / ... / bm
where uppercase letters represent non-terminals and lowercase letters represent terminals.
Removing left recursion yields:
A = (b1 / b2 / ... / bm) (a1 / a2 / ... / an)*
Danger
Be aware that the parse tree will not be the same.
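With the rewritten add rule, the repetition produces a flat sequence of operator/operand pairs instead of a nested, left-leaning tree. If left-associative structure is needed, it can be rebuilt afterwards by folding the sequence from the left. The sketch below assumes a simplified setting: mult is just an integer, tokens are extracted with a regular expression, and all helper names are hypothetical. It does not use Arpeggio.

```python
import re


def parse_add(text):
    """Parse  add = mult (('+' / '-') mult)*  where mult is an integer here."""
    tokens = re.findall(r"\d+|[+-]", text)
    first, rest = int(tokens[0]), tokens[1:]
    # The repetition yields a flat list of (operator, operand) pairs.
    pairs = [(rest[i], int(rest[i + 1])) for i in range(0, len(rest), 2)]
    return first, pairs


def fold_left(first, pairs):
    """Rebuild the left-associative tree the left-recursive rule would give."""
    tree = first
    for op, operand in pairs:
        tree = (tree, op, operand)
    return tree


def evaluate(tree):
    """Evaluate a nested (left, op, right) tuple tree."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) - evaluate(right)


first, pairs = parse_add("10 - 3 - 2")
tree = fold_left(first, pairs)

print(pairs)           # [('-', 3), ('-', 2)]
print(tree)            # ((10, '-', 3), '-', 2)
print(evaluate(tree))  # 5
```

Folding from the left matters for non-commutative operators: grouping the same input from the right would compute 10 - (3 - 2) = 9 instead of 5.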
Unrecognized grammar element '...'
This might happen when non-unicode literals are used. Make sure that you use unicode literals when defining grammars using Python notation.
You might want to include:
from __future__ import unicode_literals
This enables unicode literals in Python versions before 3.
Visitor method is not called during semantic analysis
Semantic analysis operates on the parse tree nodes produced by grammar rules.
If you use the reduce_tree=True option when constructing the parser, all non-terminal nodes with only one child will be suppressed in the parse tree.
Thus, visitor methods for those nodes will not be called.
To resolve this issue, either disable tree reduction during parser construction (i.e. reduce_tree=False) or do the visitor's job in one of the calling rules that produces a parse tree node with more than one child.
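To see why a visitor method can silently go uncalled, here is a toy model of tree reduction. It is only an illustration under stated assumptions: the Node class, the reduce_tree function and visited_rules are hypothetical stand-ins written for this example and are not Arpeggio's classes or implementation.

```python
class Node:
    """Toy parse tree node: a rule name plus children (or a matched value)."""

    def __init__(self, rule_name, children=None, value=None):
        self.rule_name = rule_name
        self.children = children or []
        self.value = value


def reduce_tree(node):
    """Replace every non-terminal that has exactly one child with that child."""
    children = [reduce_tree(child) for child in node.children]
    if len(children) == 1:
        return children[0]
    return Node(node.rule_name, children, node.value)


def visited_rules(node):
    """Return rule names in the order a bottom-up visitor would visit them."""
    names = []
    for child in node.children:
        names.extend(visited_rules(child))
    names.append(node.rule_name)
    return names


# Toy tree for "1 + 2": each term wraps a single number.
tree = Node("sum", [
    Node("term", [Node("number", value="1")]),
    Node("plus", value="+"),
    Node("term", [Node("number", value="2")]),
])

full = visited_rules(tree)
reduced = visited_rules(reduce_tree(tree))

print(full)     # ['number', 'term', 'plus', 'number', 'term', 'sum']
print(reduced)  # ['number', 'plus', 'number', 'sum']
```

In the reduced tree no node named term remains, so a visit_term method would never run. Its work would have to move to visit_sum, which still has more than one child.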
As a side note, there is implicit reduction of nodes whose grammar rule is a sequence with only one child.
from arpeggio import RegExMatch as _
def mean(): return number
def number(): return _(r'\d*\.\d*|\d+')
Here the node number will be suppressed from the parser model and the visitor method visit_number will not be called. You have to define visit_mean or a visitor method for one of the rules calling mean.
This implicit reduction cannot be disabled at the moment. Please see issue 24.