Graphs & Numbers: 2014

September 30, 2014

Building the Collatz Graph in Henshin

The other day I saw an online video about a number-theoretical problem called the Collatz conjecture. The video included some visualizations of the problem which were based on force-directed graph drawing. It was immediately clear to me that such visualizations could be also easily done in Henshin. So I built a minimalistic EMF model and sketched two simple rules:

The first one multiplies x by 2, the second one is the inverse of the 3x+1 rule. Technically, it is not exactly the inverse of the Collatz function, but the conjecture is still true if and only if these two rules can generate every natural number (if we ignore the upper bound in the first rule).

Finally I created an instance model for the initial state and set up a state space. In the following video you can see how to use Henshin's state space explorer to generate the Collatz graph. Have fun.

August 31, 2014

Why Model Transformations != Graph Transformations

Model transformations are probably the most important application of (the formal theory of) graph transformations. This includes classical source-to-target model transformations, bidirectional transformations and synchronizations, as well as in-place rewriting, e.g., for refactoring. All these applications have in common that the underlying graphs to be transformed are actually EMF- or MOF-based structural data models.

These models have of course a graph structure, but when taking a closer look, they are actually more than just that. The higher expressive power of metamodeling languages like EMF or MOF in comparison to simple graphs has some severe implications when trying to apply formal (mathematical) methods known from the graph transformation research community.

Let's take a look at a concrete example of a Henshin transformation. The following rule is a variation of the Bank Accounts example:

This rule collects all accounts which do not belong to any client and adds them to a designated bank for "orphaned" accounts. The graph transformation semantics of this rule does exactly what is shown: it creates zero or more edges between a bank and a number of account objects. In contrast, the EMF-semantics actually has a twist which is not directly visible here: any of the orphaned accounts which before belonged to a different bank object are removed from their original bank. Thus, this rule can actually also delete edges.

This is because in EMF, adding an object to a container also means that it is removed from its old container. Implicit deletion of edges can also occur for edges with multiplicity 1. Another semantical difference to graph transformations exists for the handling of bidirectional references (eOpposite-references). In this case, EMF automatically maintains the opposite references in a consistent state which can have the effect that EMF removes or creates edges. It can actually get more complicated when you have a combination of containment and multiplicity constraints, and opposite edges.

These phenomena can be considered as side effects of rules -- they are actions performed by EMF even if they are not explicitly specified in the transformation rules. Henshin does not prevent these side effects (that is actually not possible), but it will print warning messages when such side effects occur during the execution.

From the EMF / MOF perspective, the described effects are intended and desirable. What is problematic though is to assume that model transformations per se have graph transformation semantics and (what is more dangerous) to apply formal verification methods from the graph transformation domain without taking into account these possible side effects of rules.

There are also other related, partly surprising, phenomena when dealing with EMF models. For example, copying an object does not necessarily yield an identical object, i.e., the expression:

EcoreUtil.equals(eObject, EcoreUtil.copy(eObject))

does not always return true. This is because the copy mechanism in EMF won't make any changes which would alter the original model (e.g. bend an opposite reference).

The lesson learned is that models are more than graphs, and model transformations are more complicated than graph transformations.

July 28, 2014

Henshin 1.0 (Update)

8 years after the first PoC and 4 years after its official launch, we are proud to announce the release of Henshin 1.0 -- a model transformation language and framework for EMF. Henshin 1.0 is ready for Eclipse Luna and Java 8 (including its new JavaScript engine Nashorn). The primary domain of Henshin is model-driven development and formal verification methods. With its new multi-threaded pattern matching engine and its code generator for Apache Giraph, big data is a new third main domain of Henshin.

The new version of Henshin comes with an extended transformation language with support for specifying Java imports and annotations, and a couple of new utility APIs. Besides the changes under the hood, the 1.0 release also comes with a number of visible new features, including a reworked interpreter wizard and new preferences for the diagram editor. As an eye candy, we also added a new color mode for the diagram editor which goes well together with Luna's Dark theme.

You can obtain Henshin 1.0 from the release update site or clone our Git repo to try it out!

Update from 30-07-2014

And of course we've build a critical bug into the release! Anyway, the fixed version is available at the release update site.

June 8, 2014

ISA: An Integer Sequence Analyzer

ISA is a proof of concept of an integer sequence analyzer. It consists of a very simple web interface for entering the beginning of an integer sequence. When clicking on Analyze, ISA tries to compute a formula for an infinite sequence that starts with the entered numbers.

Under the hood, ISA uses a term generator to compute (recursive) formulas. These formulas are evaluated and the result is compared with the sequence entered by the user. If no matching formula was found after 30 seconds ISA gives up and times out. To get a feeling what is currently possible, try out the following sequences:

0 0 1 1 4 4 9 9
2 3 6 18 108
1 2 2 3 3 3

The interface and the idea of ISA resembles OEIS which was created by Neil Sloane (former researcher at AT&T Labs). The main difference is that OEIS is a database, whereas ISA is a formula generator.

Since I am using a Raspberry Pi to host and run ISA, some heuristics might be useful to find common sequences more directly. But of course, in general only a tiny class of sequences can be automatically found in this way.

March 16, 2014

Transformation Tool Contest 2014 -- Call for Solutions

The 2014 Transformation Tool Contest (TTC) seeks your solutions to two challenging transformation problems, involving transforming financial XML models into object-oriented code, and transforming the very large model that underpins the IMDb movie database.

Select your favorite transformation tool and join the contest! The deadline for solution descriptions is 23 April 2014. More information, including links to the problem descriptions, at: http://www.transformation-tool-contest.eu/cfs.html

March 8, 2014

Just-In-Time Java Compilation in Eclipse

Just-in-time (JIT) compilation is usually applied to get a better execution performance compared to interpreter-based approaches. There are also other scenarios where JIT compilation is useful, e.g., when programming-level code needs to be generated at run-time. Either way, this technique is usually considered to be the domain of experts since it requires a combination of code generation, compilation and dynamic loading and linking. In this post, I show you a simple pipeline based on EMF, JET, JDK and Java core that allows you to do this with minimal effort.

Our starting point is the EMF model (class diagram) shown below. It is a model for simple arithmetic expressions. It consists of an abstract class Term and three concrete implementations of it: Operation, Constant and Parameter.

Below you see an example instance model in the reflective EMF editor. It is basically just a model representation of the expression (3.7 + (p0 * (p1 / 2.0))) where p0 and p1 are parameters. We use here the actual arithmetic operators as enum literals.

The easiest way to evaluate an expression like this one is to write an interpreter which can be as simple as a tree visitor in this case. But today, we want to use a JIT compilation approach to evaluate these expressions. Specifically, given an expression like the one above, we want to generate a Java object that implements this interface:

It contains only one method which evaluates the expression for specific parameter values. The first thing we do is write a pretty-printer that generates a string representation of a term in Java syntax. We extend the EMF-generated Switch class for this:

We get "(3.7 + (params[0] * (params[1] / 2.0)))" if we feed the above Term model into this pretty-printer. Now that we are able to generate the Java expression from Term objects, we just need to generate the rest of the class that will implement the Calculator interface. We use a simple JET template for this, as shown below. There are much more sophisticated template languages, but JET will do just fine in our case.

What is left to do is the actual JIT compilation consisting of three steps:

Generate the Java code from a given Term model using JET,
Compile the generated Java code using the BatchCompiler of JDK,
Dynamically load the compiled calculator class and instantiate it.

The code below does all the three steps. So, again: it takes a Term model and generates an executable calculator instance. Voila, our JIT-compilation.

If you would like to try it out yourself, you can check out the complete source code from GitHub.

February 5, 2014

Parallel Graph Pattern Matching in Henshin (Update)

In my last post, I reported on an improvement of the graph pattern matching algorithm in Henshin that showed a major performance boost for rules where the nodes in the left-hand sides are connected only by <<require>> nodes and edges. As a concrete example, we used the transformation rule below to find couples of actors / actresses in the IMDB movie database which starred together in at least three movies.

Thanks to the tweak in the pattern matching engine we were able to apply this rule to much larger input graphs. Unfortunately, the synthetic test data that we used for our last benchmark does not have the same complexity as the real data from IMDB. In our test data, every person played in not more than 10 movies and every movie had a similarly low number of actors / actresses. In reality, the IMDB data contains movies with more than 1000 actors and obviously also persons that played in a large number of movies.

So I executed our find-couples example on the real IMDB data (thanks for the data import, Tassilo!). Not surprisingly, the results were much worse than for the synthetic data. The reason for this is that the real IMDB data is much more interconnected. So I started tweaking the pattern matcher of Henshin again. Eventually, I realized that I need something more to crack this nut.

SAP Coding Masters 2013

A possible way to improve the performance is to parallelize the graph pattern matching algorithm. In the current development version of Henshin, this is actually implemented! Here is how you can enable it:

int partitions = 10;
EGraph graph = new PartitionedEGraphImpl(model,partitions);
Engine engine = new EngineImpl();
engine.getOptines.put(Engine.OPTION_WORKER_THREADS,
   partitions);

// Execute transformation

engine.shutdown();

You need to use the new implementation of EGraph called PartitionedEGraphImpl. It basically splits up the graph to be transformed into N disjoint partitions. The second change you need to do is to set the number of worker threads in the transformation engine. In theory, this number can be different to the number of partitions, but the easiest setup is to just use the same amount of threads as there are partitions. The third point you need to ensure is that your rule uses *-stereotypes for nodes and edges (essentially it transforms all matches at once). Finally, you should shutdown the engine when your done to get rid of the worker threads.

For the example of finding couples in the IMDB database, this parallel graph pattern matching approach works actually quite well. Below you can see the performance gains achieved on an ordinary desktop computer.

With this new parallel graph pattern matcher in the interpreter, and the recently introduced Giraph code generator, Henshin really is becoming a big-data tool, which I find very exciting.

Update (07 February 2014)

I realized that the performance in absolute number is still quite bad. As I wrote in my earlier post, the optimized match finder uses path constraints to check the existence of the movies which connect the actors. The problem was though that the match finder still made an additional check whether the <<require>> elements exist, i.e., it performed a complete matching of the PAC (positive application condition). This is actually not necessary here because the path constraint is sufficient already.

To fix this problem, I added a static analysis which finds out once in the beginning whether path constraints are enough to check the PAC. This significantly reduces the run-times as you can see below. Using 12 worker threads, the processing of 2 million nodes takes 173 seconds. But to be honest, I have my doubts whether these engine optimizations will be helpful also in other cases.

Nodes	Couples	Time (ms)
499,995	79,470	6,582
1,004,463	244,627	27,239
1,505,143	585,689	87,721
2,000,900	1,017,752	173,142