Domain-Specific Languages

Software systems often implement the same business-domain concept in several places. This makes code hard to maintain and frequently leads to inconsistent behavior. To avoid these problems, one can in many cases implement a concept once, as a function or a class, and use this central implementation wherever it is needed.

However, this is not always possible. The concept might not be easily expressible in the chosen programming language. Or the syntax of a programming language might be undesirable because it is hard for non-programmers to work with or too clumsy for expressing domain concepts.

In these situations it may make sense to introduce a domain-specific language (DSL). Such a language allows business people to express complex issues in familiar terms. A set of issues expressed in a DSL is called a "model".

DSLs can be graphical or textual languages. While models represented in graphical languages can sometimes be understood in less time, textual languages also have their advantages:

You can use any state-of-the-art text editor with all of the usual editing features such as searching and replacing, copying and pasting, undo/redo.
It is easy to compare different versions of models using existing tools. This also makes textual models easy to manage in mainstream version-control systems.
Incomplete and even structurally incorrect models can be written and stored.

In any case, a model represented in a DSL must be parsed and represented as a data structure. In some projects it makes sense to compile models into code in a programming language such as Java or C. In other projects, software that implements the business logic in a generic way works directly with this data structure. The latter approach can be seen as an "interpreting" approach.
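The two approaches can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the one-line rule syntax, the names Rule, parse, interpret and compile_to_python, and the choice of Python as the generated language. A real project would typically use a parser generator rather than splitting lines by hand.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One rule of the hypothetical DSL: 'discount <p>% when quantity >= <n>'."""
    percent: int
    min_quantity: int


def parse(text: str) -> list[Rule]:
    """Parse the model text into a list of Rule objects (the data structure)."""
    rules = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split()
        # Expected shape: discount 10% when quantity >= 100
        if (len(words) != 6 or words[0] != "discount"
                or not words[1].endswith("%")
                or words[2:5] != ["when", "quantity", ">="]):
            raise ValueError(f"line {line_no}: cannot parse {line!r}")
        rules.append(Rule(percent=int(words[1][:-1]),
                          min_quantity=int(words[5])))
    return rules


def interpret(rules: list[Rule], quantity: int) -> int:
    """Interpreting approach: generic logic walks the data structure at runtime."""
    applicable = [r.percent for r in rules if quantity >= r.min_quantity]
    return max(applicable, default=0)


def compile_to_python(rules: list[Rule]) -> str:
    """Compiling approach: emit source code that hard-wires the model."""
    lines = ["def discount(quantity):"]
    # Test the highest discount first, so the first match is the best one.
    for r in sorted(rules, key=lambda r: r.percent, reverse=True):
        lines.append(f"    if quantity >= {r.min_quantity}:")
        lines.append(f"        return {r.percent}")
    lines.append("    return 0")
    return "\n".join(lines)


MODEL = """
# volume discounts
discount 5% when quantity >= 10
discount 10% when quantity >= 100
"""

if __name__ == "__main__":
    rules = parse(MODEL)
    print(interpret(rules, 150))     # evaluated directly from the data structure
    print(compile_to_python(rules))  # generated source for the same model
```

Both functions start from the same parsed data structure. The interpreter keeps the model as data that can be replaced at runtime, while the compiler produces code that must be regenerated whenever the model changes.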

Frequently a DSL is designed as an external representation of internal data structures of an existing software system. In these cases there is no need to implement an interpreter for the DSL because the existing system already serves as one.

Tools

Fortunately, over the last few years tools for developing DSLs (in Eclipse projects and elsewhere) have reached a high degree of maturity. This reduces the cost of developing a DSL enormously in comparison to a few years ago. Many DSL-related tools are even available as open source. In the following we give an overview of the tools and technologies we are using or have been using at webXcerpt.

We use Xtext for defining textual DSLs. For a language defined by its grammar, Xtext can automatically generate a parser and an Eclipse-based editor. The default behavior of the generated parser and editor can be extended by injecting specialized code. Such code might, for example,

convert the textual representation of a primitive value into a data item,
find the definition of an identifier, or
check the model for validity, providing error messages and warnings when needed.
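To make these three kinds of customization concrete, here is a minimal language-neutral sketch in Python. It does not use or imitate the Xtext API; the Model class and the functions convert_value, resolve and validate are hypothetical names chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical parsed model: named definitions plus the names referenced."""
    definitions: dict[str, object] = field(default_factory=dict)
    references: list[str] = field(default_factory=list)


def convert_value(text: str) -> object:
    """Value conversion: turn the textual form of a primitive into a data item."""
    if text in ("true", "false"):
        return text == "true"
    if len(text) >= 2 and text[0] == text[-1] == '"':
        return text[1:-1]  # string literal without its quotes
    return int(text)  # raises ValueError for anything else


def resolve(model: Model, name: str) -> object | None:
    """Name resolution: find the definition an identifier refers to, if any."""
    return model.definitions.get(name)


def validate(model: Model) -> list[str]:
    """Validation: collect all messages instead of stopping at the first one."""
    messages = []
    for name in model.references:
        if name not in model.definitions:
            messages.append(f"error: '{name}' is not defined")
    for name in model.definitions:
        if name not in model.references:
            messages.append(f"warning: '{name}' is never used")
    return messages
```

Returning a list of messages rather than raising on the first problem matches how an editor usually reports errors and warnings: all of them at once, each attached to the offending element.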

Xtext uses ANTLR behind the scenes for generating parsers. In certain situations it may make sense to use ANTLR directly for parsing a DSL.

Graphical DSLs can often be represented as UML extensions using application-specific "stereotypes". Generic UML editors such as MagicDraw or Enterprise Architect can be used to edit these UML-based languages.

In a way, even an Excel sheet prepared for the input of domain data can be seen as a DSL. Data can be extracted by macros (written in, e.g., Visual Basic or C#) or using a library like Apache POI which supports parsing the Excel file format.

EMF, the "Eclipse Modeling Framework", is a powerful Java library and code-generation tool for complex data structures. Both Xtext and the Eclipse UML tools use EMF for representing models internally. EMF also supports conversions to and from XML.

Xcore is a language for declaring EMF data structures. This is useful if we have to transform a model provided in some DSL into a different model structure before we can really work on it.

The programming language Xtend is a variant of Java with a concise syntax. It comes with a built-in template syntax, which also makes it a practical tool for generating textual output from models.

Internal DSLs

DSLs as described above make sense in some cases to avoid the "technicalities" of general-purpose programming languages. In other cases models can and should be represented using (a subset of) a general-purpose language, typically making use of the API of a "model-builder" library. In the former case we speak of an "external DSL" while in the latter case we speak of an "internal DSL" embedded in a "host language". The following pros and cons for internal and external DSLs also provide some criteria for deciding between them.

External DSLs usually take more effort to implement. (In particular, a concrete syntax must be defined and parsed.) This difference grows (in absolute terms) with the complexity of the language. On the other hand, since the editor for an external DSL has a specialized understanding of the DSL, it can support the user better with regard to error messages or code-completion suggestions.

It depends on the host language whether domain concepts can be expressed in a straightforward way in an internal DSL or whether the host language imposes a significant syntactic or semantic overhead. If the DSL is used by programmers, this overhead is usually less of a problem, but it may be a problem for domain experts who are not programmers, especially if they are supposed not only to read DSL code but also to write it. Even if an internal DSL does not use the full syntax of its host language, the compiler of the host language might emit hard-to-understand error messages. On the other hand, for programmers the abstraction mechanisms provided by the host language may be quite helpful. Furthermore, the syntax and semantics of a popular host language may already be known.

Finally, if models become available only at runtime, they cannot be expressed by an internal DSL in a host language that is compiled ahead of time.

At webXcerpt we have implemented internal DSLs embedded in host languages such as Haskell and JavaScript.
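As an illustration of the idea, here is a minimal sketch of an internal DSL built on a small model-builder API. It uses Python as the host language purely for illustration (not one of the host languages named above), and the product-configuration vocabulary (product, option, excludes) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Option:
    name: str
    price: int


@dataclass
class Product:
    name: str
    options: list[Option] = field(default_factory=list)
    exclusions: list[tuple[str, str]] = field(default_factory=list)

    # Builder methods return self so that calls can be chained.
    def option(self, name: str, price: int) -> Product:
        self.options.append(Option(name, price))
        return self

    def excludes(self, first: str, second: str) -> Product:
        self.exclusions.append((first, second))
        return self


def product(name: str) -> Product:
    """Entry point of the builder API."""
    return Product(name)


# The "model" is ordinary host-language code that calls the builder API.
router = (
    product("router")
    .option("wifi", price=40)
    .option("fiber", price=120)
    .option("dsl", price=60)
    .excludes("fiber", "dsl")
)

# Abstraction mechanisms of the host language come for free:
# here a loop generates a family of similar options.
for ports in (4, 8, 16):
    router.option(f"ports{ports}", price=10 * ports)
```

The sketch also shows the trade-off discussed above: the loop is convenient for a programmer, but a mistake such as a missing parenthesis is reported by the host language in its own terms, not in the terms of the domain.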

Projects

At webXcerpt we have worked on several projects involving domain-specific languages, including the following ones:

Product Configuration I

We designed and developed

high-level languages based on customer-specific concepts (see ConfigModeler),
a low-level language representing the data model of the SAP Variant Configurator (see VClipse),
an IDE for these languages,
and transformations from the high-level language to the low-level language and from the latter to SAP's internal data structures.

Eclipse, Xtext, EMF, Xtend
SAP Variant Configurator (RFC, IDoc)
telecommunications equipment manufacturer

Product Configuration II

We designed and developed

a high-level language based on customer-specific concepts (see ConfigModeler),
a low-level language representing the data model of PROS Cameleon CPQ (see COL),
an IDE for these languages,
and transformations from the high-level language to the low-level language and from the latter to the XML-based import/export format of Cameleon CPQ.

Eclipse, Xtext, EMF, Xtend
PROS Cameleon CPQ (XML)
telecommunications equipment manufacturer

Object Model and Web Services

We implemented a translator producing Java code and XML data from a UML model heavily using customer-specific stereotypes.

The translator was a drop-in replacement for a predecessor system. A heavily test-driven development approach was used to ensure compatibility.

UML, MagicDraw, EMF, Xtend, Java
Java, XML
car manufacturer

Web Site: Structure and Behavior

We participated in the design and implementation of a family of interrelated DSLs used to describe all aspects of a large web application, from data model, user interface, and business logic to access control.

Most of the system structure and behavior is automatically generated from these DSLs.

Eclipse, Xtext, EMF
Java, XML, database schema, system configuration
public administration

Visualization

We implemented an IDE for an existing language used for describing visualization rules for product variants. The IDE integrates with VClipse.

Eclipse, Xtext, EMF, Xtend
software tool vendor

Product Configuration III

We participated in a development project for a language and IDE for describing product-configuration models. These tools have become products of a large software vendor.

Eclipse, Xtext, EMF, Xtend
configuration-system API
software company

Migrating Text Templates

We participated in the proof-of-concept phase of a text-processor migration project. We implemented

a parser for the source system's template definition language with a line-oriented syntax and a non-standard lexical structure,
a DSL for representing the internal data structures of the target system,
a partial translator from the former to the latter language,
and an import operation for the target system.

Eclipse, Xtext, EMF, Xtend
target-system API
insurance company