Comparing textX to other tools

There are generally two classes of textual DSL tools textX could be compared to.

The first class comprises tools that use traditional parsing technologies, i.e. given a grammar (usually a context-free grammar) they produce program code capable of recognizing whether a given textual input conforms to that grammar. Furthermore, they enable either the transformation of textual input into a tree structure (i.e. a parse tree) that is processed afterwards, or the definition of actions that are executed during parsing when a particular pattern is recognized. The most popular representatives of this class are lex and yacc, ANTLR and GNU bison. Tools of this kind are generally known as parser generators.

textX differs from this first class in the following ways:

  • textX works as a grammar interpreter, i.e. parser code is not generated by the tool; instead, the tool is configured by the grammar to recognize textual input in the language specified by the grammar. You can even embed your grammar as a Python string. This enables a faster round-trip from the grammar to a working parser, as the parser doesn't need to be regenerated, only reconfigured.
  • Most classical parsing tools use context-free grammars, while textX uses PEG grammars. As a consequence, lookahead is unlimited and no ambiguities are possible, because the alternative operator is ordered. Additionally, there is no need for a separate lexer.
  • textX uses a single textual specification (the grammar) to define not only the syntax of the language but also its meta-model (a.k.a. abstract syntax). The textX meta-language is inspired by Xtext. This is a very important feature which enables automatic construction of the model (a.k.a. abstract semantic graph - ASG, or semantic model) without further work from the language designer. In traditional parsing tools, transformation to the model usually involves coding parse actions or a manually written parse tree transformation.

The second class of textual DSL tools comprises more powerful tools geared specifically towards DSL construction. Tools of this kind are generally known as Language Workbenches, a term coined by Martin Fowler. The most popular representatives of this class are Xtext, Spoofax and MPS. These tools are much more complex and tightly integrated with a particular development environment (IDE), but they provide a powerful tooling infrastructure for developing, debugging and evolving languages. They build not only a parser but also a language-specific editor, debugger, validator, visualiser etc.

textX is positioned between these two classes of DSL tools. The goal of the textX project is not a highly sophisticated DSL engineering platform but a simple Python library for DSLs that can be used in various Python applications and development environments. It can also be used for non-Python development by generating code from textX models (see Entity tutorial). Tooling infrastructure, editor support etc. will be developed as independent projects (see for example textx-tools).

Difference to Xtext grammar language

The textX grammar language is inspired by Xtext, so there are a lot of similarities between these tools when it comes to grammar specification. There are, however, differences in several places. In this section we outline those differences to give users already familiar with Xtext a brief overview.

Lexer and terminal rules

textX uses PEG parsing, which doesn't need a separate lexing phase. This eliminates the need to define lexemes in the grammar. Therefore, there is no terminal keyword in textX, nor any of the special terminal definition rules used by Xtext.

Types used for rules

Xtext integrates tightly with Java and the Ecore type system, providing the returns keyword in rule definitions, with which a language designer can specify the class used to instantiate objects recognized by the parser.

textX integrates with the Python type system. In textX there is no returns keyword. For all non-match rules, the class used for the rule is a dynamically created Python class. The language designer can provide a class by registering user classes on the meta-model. If the rule is a match rule, then it always returns a Python string, or one of the base Python types for rules inherited from BASETYPES.

Assignments

In textX there are two types of many assignments (*= - zero or more, += - one or more), whereas in Xtext there is only one (+=), which defines the type of the inferred attribute but doesn't give the parser any information about multiplicity. Thus, if zero or more elements should be matched, in Xtext you must additionally wrap the expression in a zero-or-more match:

In Xtext:

Domainmodel :
    (elements+=Type)*;

In textX:

Domainmodel :
    elements*=Type;

Similarly, optional assignment in Xtext is written as:

static?='static'?

In textX a '?' at the end of the expression is implied, i.e. the right-hand side of the assignment is optional:

static?='static'

Regular expression match

In Xtext terminal rules are described using EBNF.

In textX there is no difference between parser and terminal rules, so you can use the full textX language to define terminals. Furthermore, textX gives you the full power of Python regular expressions through the regular expression match. Regex matches are defined inside / /. Anything you can use in the Python re module you can use here. This gives you quite a powerful sublanguage for pattern definition.

In Xtext:

terminal ASCII:
    '0x' ('0'..'7') ('0'..'9'|'A'..'F');

In textX:

ASCII:
    /0x[0-7]([0-9]|[A-F])/;

A literal regex match can be used anywhere a regular match rule can be used.

For example:

Person:
    name=/[a-zA-Z]+/ age=INT;
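
Because regex matches use the syntax of the Python re module, a pattern can be tried out with the standard library alone before it goes into a grammar. A small sketch using the pattern from the ASCII rule above; the sample inputs are made up, and fullmatch is used here only to test whole strings (textX itself applies the pattern at the current input position).

```python
import re

# The pattern from the ASCII rule, without the surrounding slashes.
ascii_code = re.compile(r"0x[0-7]([0-9]|[A-F])")

# Made-up sample inputs: two valid codes, one out of range, one lowercase.
samples = ["0x4F", "0x7A", "0x80", "0x4f"]
accepted = [s for s in samples if ascii_code.fullmatch(s)]
print(accepted)
```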

Repetition modifiers

textX provides a syntactic construct called a repetition modifier, which enables the parser to be altered while parsing a specific repetition expression.

For example, there is often a need to define a separated list of elements.

To match a list of integers separated by comma in Xtext you would write:

list_of_ints+=INT (',' list_of_ints+=INT)*

In textX the same expression can be written as:

list_of_ints+=INT[',']

The parser is instructed to parse one or more INT with commas in between. A repetition modifier can be a regular expression match too.

For example, to match one or more integers separated by a comma or a semicolon:

list_of_ints+=INT[/,|;/]

More than one repetition modifier can be defined inside the square brackets. See the corresponding section in the docs for additional explanations.

We are not aware of a similar feature in Xtext.

Rule modifiers

Similarly to repetition modifiers, in textX the parser can be altered at the rule level too. Currently, only whitespace alteration can be defined at the rule level.

For example:

    Rule:
        'entity' name=ID /\s*/ call=Rule2;
    Rule2[noskipws]:
        'first' 'second';

The parser will be altered for Rule2 so that it does not skip whitespace. All rules down the call chain inherit the modifiers.

Xtext has hidden rules which can achieve a similar effect, and can even define different kinds of tokens that are hidden from the semantic model, but the rule modifier in textX serves a different purpose. It is a general mechanism for altering the parser per rule, which could be used in the future to define other alterations (e.g. case sensitivity).

Unordered groups

Xtext supports unordered groups using the & operator.

For example:

Modifier: 
    static?='static'? & final?='final'? & visibility=Visibility;

enum Visibility:
    PUBLIC='public' | PRIVATE='private' | PROTECTED='protected';

In textX unordered groups are specified as a special kind of repetition. Thus, repetition modifiers can be applied as well:

Modifier:
    (static?='static' final?='final' visibility=Visibility)#[','];

Visibility:
    'public' | 'private' | 'protected';

The previous example will match any of the following:

private, static, final
static, private, final
...

Notice the use of the , separator as a repetition modifier.

Syntactic predicates

textX is based on PEG grammars. Unlike CFGs, PEGs can't be ambiguous, i.e. if an input parses, it has exactly one parse tree. textX is a backtracking parser and will try each alternative in a predetermined order until one succeeds. Thus, a textX grammar can't be ambiguous. Nevertheless, sometimes it is not possible to specify the desired parse tree by reordering alternatives. In that case syntactic predicates are used. textX implements both and- and not-syntactic predicates.

On the other hand, predictive non-backtracking parsers (such as ANTLR, which is used by Xtext) must decide which alternative to choose. Thus, a grammar might be ambiguous, and the language designer needs to provide additional specification to resolve the ambiguity and choose the desired parse tree. Xtext uses positive lookahead syntactic predicates (=> and ->). See the Xtext documentation for details.

Hidden rules

Xtext uses hidden terminal symbols to suppress unimportant parts of the input from the semantic model. This is used for comments, whitespace etc. Terminal rules are referenced from the hidden list in the parser rules. All rules called from a rule that uses hidden terminals inherit them.

textX provides support for whitespace alteration at the parser level and at the rule level, and a special Comment match rule that can be used to describe comment patterns, which are suppressed from the model. The Comment rule is currently defined for the whole grammar, i.e. it can't be altered on a per-rule basis.

Parent-child relationships

textX provides an explicit parent reference on all objects that are contained inside some other object. This reference is a plain Python attribute named parent. The relationship is imposed by the grammar.

Xtext, being based on Ecore, provides a similar mechanism through the Ecore API.

Enums

Xtext supports Enum rules while textX does not. In textX you use a match rule with ordered choice to mimic enums.

Scoping

At the present stage textX doesn't provide a built-in mechanism for defining scoping. Scoping can be implemented in Python using object processors, but there is no specific scoping API that would help the language developer in resolving links.

Xtext does provide a Scoping API which can be used by the Xtend code to specify scoping rules.

Additional differences in the tool usage

Some of the differences in tools usage are outlined here.

REPL

textX is Python based, so it is easy to play with it interactively in the Python console.

Example ipython session:

In [1]: from textx import metamodel_from_str

In [2]: mm = metamodel_from_str("""
...: Model: points+=Point;
...: Point: x=INT ',' y=INT ';';
...: """)

In [3]: model = mm.model_from_str("""
...: 34, 45; 56, 78; 88, 12;""")

In [4]: model.points
Out[4]: 
[<textx:Point object at 0x7fdfb4cda828>,
<textx:Point object at 0x7fdfb4cdada0>,
<textx:Point object at 0x7fdfb4cdacf8>]

In [5]: model.points[1].x
Out[5]: 56

In [6]: model.points[1].y
Out[6]: 78

Xtext is Java based and works as a generator, thus it is not possible, as far as we know, to experiment in this way.

Post-processing

textX provides post-processing of model objects: you register a Python callable that receives each object as it is constructed. Post-processing is used for all sorts of things, from semantic validation of the model to model augmentation.

An approach to augment model after loading in Xtext is given here.

Parser control

In textX several aspects of parsing can be controlled:

  • Whitespaces
  • Case sensitivity
  • Keyword handling

These settings are altered during meta-model construction. Whitespace handling can be further controlled on a per-rule basis.

Xtext enables hidden terminal symbols which can be used for whitespace handling. Case sensitivity can be altered for parser rules but not for lexer rules.

Mapping to host language types

textX dynamically creates ordinary Python classes from the grammar rules. You can register your own classes during meta-model construction, and these will be used instead. Thus, it is easy to provide your domain model in the form of Python classes.

Xtext is based on the Ecore model, thus all concepts are instances of Ecore classes. Additionally, there is an API which can be used to dynamically build JVM types from the DSL concepts, providing tight integration with the JVM.

Built-in objects

In textX you can provide objects that will be available to every model. This is used to provide, e.g., the built-in types of the language. For more details see the built-in objects section in the docs.

An approach to augment model after loading in Xtext is given here.

Additional languages

Xtext uses two additional DSLs:

  • Xbase - a general expression language
  • Xtend - a modern Java dialect which can be used in various places in the Xtext framework

The only additional DSL used in textX is genconf, a DSL for generator configuration that has been developed as a part of the textx-tools project.

Template engines

textX doesn't impose a particular template engine to be used for code generation. Although we use Jinja2 in some of the examples, there is nothing in textX that is Jinja2 specific. You can use any template engine you like.

Xtext provides its own template language as a part of the Xtend DSL. This language integrates nicely into the overall platform.

IDE integration

Xtext is integrated into the Eclipse and IntelliJ IDEs and generates a full language-specific tool-chain from the grammar description and additional specifications.

textX does not provide IDE integrations. There is the textx-tools project, which provides a pluggable platform for developing textX languages and generators, with project scaffolding. Integration with popular code editors is planned. There is some basic support for vim and emacs at the moment. There is support for visualization of grammars (meta-models) and models, but the model visualization is generic, i.e. it shows the object graph of your model objects. We plan to develop language-specific model visualization support.