October 31, 2013

Henshin + Giraph: Enforced Parallelism

I recently announced the availability of a Giraph code generator for Henshin, which allows you to execute graph transformations in a parallel and distributed manner. It is important to know that the execution semantics of graph transformations is fundamentally different in this approach. Concretely, rule application is always maximally parallel, i.e., every rule is applied to all found matches in parallel. The idea is to enforce the use of parallelism already at the modeling level and to make it hard for developers to use conventional sequential algorithms that cannot be parallelized automatically. For example, the rule AddTriangles below would be applied to all found triangles in parallel. Applying the iterated unit BuildSierpinski to a 3-node triangle therefore generates the full Sierpinski triangle of depth N.
Due to the change in the execution semantics of rules, other existing and also some new modeling concepts specifically tailored for parallel graph transformations become important. For example, vertices and edges with <<require>> and <<forbid>> stereotypes are particularly useful to avoid overlapping matches which could result in inconsistent parallel rule applications. We also introduced concepts that allow you to aggregate over attribute values of all found matches. I will discuss this in more detail in another post.
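To make the maximally parallel semantics concrete, here is a minimal plain-Java sketch. It is not Henshin or Giraph code, and it simplifies matching by tracking the matchable triangles explicitly instead of searching a graph; the class and method names (SierpinskiSketch, addTriangles, buildSierpinski) are made up for illustration. The point is only the order of operations: all matches are collected on the old graph first, and then every match is rewritten in the same step.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of maximally parallel rule application (hypothetical names, not Henshin code). */
final class SierpinskiSketch {

    /** Graph state: vertices created so far and the triangles a rule can still match. */
    static final class Graph {
        final int vertexCount;
        final List<int[]> triangles; // each entry holds the ids of the three corners

        Graph(int vertexCount, List<int[]> triangles) {
            this.vertexCount = vertexCount;
            this.triangles = triangles;
        }
    }

    /** One parallel step: take all matches of the old graph, then rewrite every one of them. */
    static Graph addTriangles(Graph g) {
        List<int[]> matches = g.triangles;        // all matches, fixed before any rewriting
        List<int[]> result = new ArrayList<>();
        int next = g.vertexCount;
        for (int[] t : matches) {
            // Add one new vertex per edge of the matched triangle.
            int ab = next++, bc = next++, ca = next++;
            // The three corner triangles stay matchable; the middle one is left out.
            result.add(new int[] {t[0], ab, ca});
            result.add(new int[] {ab, t[1], bc});
            result.add(new int[] {ca, bc, t[2]});
        }
        return new Graph(next, result);
    }

    /** Iterated unit: apply the parallel step 'depth' times to a single 3-node triangle. */
    static Graph buildSierpinski(int depth) {
        List<int[]> start = new ArrayList<>();
        start.add(new int[] {0, 1, 2});
        Graph g = new Graph(3, start);
        for (int i = 0; i < depth; i++) {
            g = addTriangles(g);
        }
        return g;
    }

    public static void main(String[] args) {
        Graph g = buildSierpinski(3);
        // prints "27 triangles, 42 vertices"
        System.out.println(g.triangles.size() + " triangles, " + g.vertexCount + " vertices");
    }
}
```

Because each step rewrites every match, the number of triangles triples per iteration, so N iterations are enough to reach depth N. With a sequential one-match-at-a-time semantics, the same unit would need one rule application per triangle.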

October 15, 2013

Henshin + Giraph = Big Data Graph Transformations

Henshin is a graph and model transformation tool for Eclipse that lets you define rules and workflows for transformation in a graphical editor. By default, these transformations are executed using an interpreter on an in-memory EMF model.

Apache Giraph is a distributed and parallel graph processing framework built for high scalability. Giraph is for instance used at Facebook to analyze the underlying graphs of social networks.

So how do Henshin and Giraph fit together? Easy: Henshin provides an expressive graph transformation language with an intuitive graphical syntax. Giraph provides an infrastructure for highly parallel and distributed graph processing. The obvious way of combining the two is to use Henshin as the modeling tool and Giraph as the execution engine. And that is what we are currently working on.



We implemented a code generator that produces Java code for Giraph from Henshin models. The generated code contains pattern matching code and graph manipulations for rules, as well as the coordination required to execute transformation units (workflows). The code generation is still an experimental feature, but we plan to stabilize it and ship it with the next release of Henshin. We already have a small test suite and have conducted some promising benchmarks. More details later.
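To give a rough idea of what vertex-centric pattern matching can look like, here is a plain-Java simulation of Pregel-style supersteps, the computation model Giraph is based on. It deliberately does not use the Giraph API and it is not the code our generator produces; the class name SuperstepMatchingSketch, the message format (lists of vertex ids as partial matches) and the example pattern (a directed triangle) are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java simulation of supersteps (hypothetical sketch, no Giraph API involved). */
final class SuperstepMatchingSketch {

    /**
     * Finds directed triangles a -> b -> c -> a. Partial matches travel as messages
     * along edges, and each superstep extends them by the receiving vertex.
     */
    static List<List<Integer>> findTriangles(Map<Integer, Set<Integer>> outEdges) {
        List<List<Integer>> matches = new ArrayList<>();

        // Superstep 0: every vertex a sends the partial match [a] to its out-neighbors.
        Map<Integer, List<List<Integer>>> inbox = new HashMap<>();
        for (Map.Entry<Integer, Set<Integer>> e : outEdges.entrySet()) {
            for (int target : e.getValue()) {
                inbox.computeIfAbsent(target, k -> new ArrayList<>()).add(List.of(e.getKey()));
            }
        }

        // Supersteps 1 and 2: each vertex processes only the messages it received.
        for (int superstep = 1; superstep <= 2; superstep++) {
            Map<Integer, List<List<Integer>>> outbox = new HashMap<>();
            for (Map.Entry<Integer, List<List<Integer>>> e : inbox.entrySet()) {
                int v = e.getKey();
                Set<Integer> neighbors = outEdges.getOrDefault(v, Set.of());
                for (List<Integer> partial : e.getValue()) {
                    int first = partial.get(0);
                    // Keep matches injective and report each triangle only once
                    // (from its smallest vertex id).
                    if (v <= first || partial.contains(v)) {
                        continue;
                    }
                    List<Integer> extended = new ArrayList<>(partial);
                    extended.add(v);
                    if (superstep == 1) {
                        // Forward [a, b] to all out-neighbors of b.
                        for (int target : neighbors) {
                            outbox.computeIfAbsent(target, k -> new ArrayList<>()).add(extended);
                        }
                    } else if (neighbors.contains(first)) {
                        // [a, b, c] is complete if the closing edge c -> a exists.
                        matches.add(extended);
                    }
                }
            }
            inbox = outbox;
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> graph = Map.of(
                1, Set.of(2), 2, Set.of(3), 3, Set.of(1, 4), 4, Set.of(1));
        System.out.println(findTriangles(graph)); // prints [[1, 2, 3]]
    }
}
```

In a real Giraph job the loop over the inbox would be replaced by the framework calling a compute method on each vertex, and the vertices would be spread over many workers. The sketch only illustrates why matching a pattern with k nodes naturally takes about k supersteps.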

The Giraph code generator is available in the development version of Henshin (for Eclipse Kepler). You can get it from our nightly build update site or by getting the source code directly.

October 2, 2013

Parallel State Space Exploration in Henshin

Back from the summer break, I read an interesting article on combining OCL expressions with CTL formulas. Apart from the modeling approach itself, it was interesting to me that the authors used Henshin to generate their state spaces and also compared their modeling and the performance of Henshin's state space generator with GROOVE.

While the GROOVE tool does not really provide an API for its simulator and state space generator, it is fairly easy to do this programmatically in Henshin:
// Set-up:
StateSpace stateSpace = new StateSpaceImpl();
stateSpace.getRules().addAll(rules);
StateSpaceManager manager = 
   new ParallelStateSpaceManager(stateSpace, 4);
manager.createInitialState(new ModelImpl(resource));

// Exploration:
StateSpaceExplorationHelper helper = 
   new StateSpaceExplorationHelper(manager);
helper.doExploration(-1, monitor);
That's it. After you have generated the state space, you can save it, use a validator to check invariants, or run a model checker on it.

The authors of the OCL-meets-CTL paper mention that there is room for improving the performance of Henshin's state space generator, which I fully agree with. Nevertheless, I want to show here that we have put a lot of effort into making Henshin's state space generator as fast as possible. A key difference from GROOVE is that state space generation in Henshin can be parallelized. Let me show you some numbers.


The chart shows the time needed to generate the state space of the dining philosophers example for 10 philosophers. With multi-threaded state space generation, we obtain a speed-up by a factor of 3. In absolute numbers, the generator explores around 2000 states per second. That is just half a millisecond per state to find all rule matches for a state graph, make a copy and transform it for every match, look up states, and update the state space. If you compare this with the performance of the Henshin interpreter itself (see e.g. here), these numbers are top notch. Note also that the speed can be greatly improved simply by increasing the cache sizes of the state space generator (which requires more main memory, though).

That being said, there is of course still room for improvement. For example, the scale-up for a high number of processors and threads is not as good as we had hoped. The required locking of the state space seems to be the main bottleneck. The performance can probably be tweaked, but it will of course never be as fast as model checkers that work on vector-based data (as opposed to graphs).
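The per-state work described above (compute successors, look up states, update the state space) and the contention on the shared state set can be illustrated with a small generic sketch. This is not Henshin's state space generator: the class ParallelExplorationSketch and its explore method are hypothetical, states are plain values instead of graphs, and a toy successor function on integers stands in for rule matching and transformation.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Generic level-by-level parallel exploration (hypothetical sketch, not Henshin code). */
final class ParallelExplorationSketch {

    /** Explores all states reachable from 'initial' using the given successor function. */
    static <S> Set<S> explore(S initial, Function<S, List<S>> successors) {
        // Shared set of known states: every worker thread has to go through it,
        // which is where contention shows up as the number of threads grows.
        Set<S> visited = ConcurrentHashMap.newKeySet();
        visited.add(initial);
        List<S> frontier = List.of(initial);
        while (!frontier.isEmpty()) {
            frontier = frontier.parallelStream()
                    // Stand-in for "find all matches, copy and transform the state".
                    .flatMap(s -> successors.apply(s).stream())
                    // Atomic look-up and insert: only the first thread to add a state keeps it.
                    .filter(visited::add)
                    .collect(Collectors.toList());
        }
        return visited;
    }

    public static void main(String[] args) {
        // Toy state space on the integers 0..9, used here instead of state graphs.
        Set<Integer> states = explore(0, n -> List.of((n + 1) % 10, (n * 2) % 10));
        System.out.println(states.size() + " states"); // prints "10 states"
    }
}
```

Computing successors parallelizes well because it only reads the current state, while every newly produced state has to pass through the single shared set. That is the same kind of synchronization point as the locking of the state space mentioned above.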

Henshin 0.9.10 on its way

We are working on the new version of Henshin which we hope to release in November 2013. It will target and require Eclipse Kepler. The most exciting new feature will be a new execution platform for Henshin transformations. More on this later.