May 1, 2015

Towards Modeling of Distributed Graph Algorithms

The most prominent approach for distributed processing of very large graphs today is probably the Bulk Synchronous Parallel (BSP) model. BSP is a bridging model for implementing graph algorithms in such a way that they can be massively parallelized and distributed. One application can be found in the Henshin graph transformation tool, where graph transformation systems can be modeled, and code generated for the BSP framework Apache Giraph.

Although the BSP model is widely accepted and used today, there is no standard implementation framework for BSP algorithms. Frameworks that do provide BSP include Google's Pregel, Apache Giraph and GraphX in Apache Spark. For application developers it is not easy to find out which platform is the most suited one for her problem. An implementation of a graph algorithm in, say, Apache Giraph cannot be reused in other frameworks, even though the underlying concepts of BSP are the same. This is unfortunate, particularly because the overhead of developing, deploying and testing these algorithms is rather high due to the complexity of the distributed frameworks.

This problem can be solved by introducing a modeling layer for BSP algorithms. Instead of directly implementing graph algorithms in Pregel, Giraph or GraphX, the idea is to use a modeling language that supports the concepts of BSP. One possible approach is to use UML 2 state machines for the modeling. The figure below shows a state machine for a BSP-model of the shortest path algorithm.

Using such a model, one could generate code for different platforms, e.g. Pregel, Giraph or GraphX. The code generators need to be implemented only once and are straight-forward to build since the BSP concepts are more or less directly used. Using different code generators, algorithm engineers can automatically derive implementations for different platforms and benchmark them against each other without implementing a single line of code.

In Henshin, the code generation for Apache Giraph is quite complex because the concepts of graph transformations are rather different than BSP. One opportunity to simplify it would be to use BSP models as intermediate target language. Specifically, model transformations can be employed to translate a Henshin model into a BSP model. From that point on, platform-specific code generators could be used to generate the final implementation.

Using a modeling approach would enable the reuse of the many many graph algorithms already implemented in the existing BSP-frameworks. Maybe UML state machines are not the best approach for the modeling -- for instance, one could also come up with a (textual) DSL for BSP. The important point is that an abstraction from the actual implementation platforms is made.  

No comments:

Post a Comment