Meandre 2.0 Alpha Preview = Scala + MongoDB

A lot of water under the bridge has gone by since the first release of Meandre 1.4.X series. In January I went back to the drawing board and start sketching what was going to be 1.5.X series. The slide deck



A lot of water under the bridge has gone by since the first release of Meandre 1.4.X series. In January I went back to the drawing board and start sketching what was going to be 1.5.X series. The slide deck embedded above is a extended list of the thoughts during the process. As usual, I started collecting feedback from people using 1.4.X in production, things that worked, things that needed improvement, things that were just plain over complicated. The hot recurrent topics that people using 1.4.X could be mainly summarized as:

  • Complex execution concurrency model based on traditional semaphores written in Java (mostly my maintenance nightmare when changes need to be introduced)
  • Server performance bounded by JENA‘s persistent model implementation
  • State caching on individual servers to boost performance increases complexity of single-image cluster deployments
  • Could-deployable infrastructure, but not cloud-friendly infrastructure

As I mentioned, these elements where the main ingredients to target for 1.5.X series. However as the redesign moved forward, the new version represented a radical disruption from 1.4.X series and eventually turned up to become the 2.0 Alpha version described here. The main changes that forced this transition are:

  • Cloud-friendly infrastructure required rethinking of the core functionalities
  • Drastic redesign of the back-end state storage
  • Revisited flow execution engine to support flow execution
  • Changes on the API that render returned JSON documents incompatible with 1.4.X

Meandre 2.0 (currently already available in the the SVN trunk) has been rewritten from scratch using Scala. That decision was motivated to benefit from the Actor model provided by Scala (modeled after Erlang‘s actors). Such model greatly simplify the mechanics of the infrastructure, but it also powered the basis of Snowfield (the effort to create a scalable distributed flow execution engine for Meandre flows). Also, the Scala language expressiveness has greatly reduced the code based size (2.0 code base is roughly 1/3 of the size of 1.4.X series) greatly simplifying the maintenance activities the infrastructure will require as we move forward.

The second big change that pushed the 2.0 Alpha trigger was the redesign of the back end state storage. 1.4.X series heavily relied on the relational storage for persistent RDF models provided by JENA. For performance reasons, JENA caches the model in memory and mostly assumes ownership of the model. Hence, if you want to provide a single-image Meandre cluster you need to inject into JENA cache coherence mechanics, greatly increasing the complexity. Also, the relational implementation relies on the mapping model into a table and triple into a row (this is a bit of a simplification). That implies that large number of SQL statements need to be generated to update models, heavily taxing the relational storage when changes on user repository data needs to be introduced.

An ideal cloud-friendly Meandre infrastructure should not maintain state (neither voluntarily, neither as result of JENA back end). Thus, a fast and scalable back end storage could allow infrastructure servers to maintain no state and be able to provide the appearance of a single image cluster. After testing different alternatives, their community support, and development roadmap, the only option left was MongoDB. Its setup simplicity for small installations and its ability to easily scale to large installations (including cloud-deployed ones) made MongoDB the candidate to maintain state for Meandre 2.0. This was quite a departure from 1.4.x series, where you had the choice to store state via JENA on an embedded Derby or an external MySQL server.

A final note on the building blocks that made possible 2.0 series. Two other side projects where started to support the development of what will become Meandre 2.0.X series:

  1. Crochet: Crochet targets to help quickly prototype REST APIs relying on the flexibility of the Scala language. The initial ideas for Crochet were inspired after reading Gabriele Renzi post on creating a picoframework with Scala (see http://www.riffraff.info/2009/4/11/step-a-scala-web-picoframework) and the need for quickly prototyping APIs for pilot projects. Crochet also provides mechanisms to hide repetitive tasks involved with default responses and authentication/authorization piggybacking on the mechanics provided by application servers.
  2. SnareSnare is a coordination layer for distributed applications written in Scala and relies and MongoDB to implement its communication layer. Snare implements a basic heartbeat system and a simple notification mechanism (peer-to-peer and broadcast communication). Snare relies on MongoDB to track heartbeat and notification mailboxes.

Fast REST API prototyping with Crochet and Scala

I just finished committing the last changes to Crochet and tagged version 0.1.4vcli now publicly available on GitHub (http://github.com/xllora/Crochet). Also feel free to visit the issues page in case you run into question/problems/bugs. Motivation Crochet is a light weight web framework oriented to rapid prototyping of REST APIs. If you are looking for a Rails […]

Related posts:

  1. Meandre 2.0 Alpha Preview = Scala + MongoDB
  2. Meandre is going Scala
  3. Fast mutation implementation for genetic algorithms in Python

I just finished committing the last changes to Crochet and tagged version 0.1.4vcli now publicly available on GitHub (http://github.com/xllora/Crochet). Also feel free to visit the issues page in case you run into question/problems/bugs.

Motivation

Crochet is a light weight web framework oriented to rapid prototyping of REST APIs. If you are looking for a Rails like framework written in Scala, please take a look at Lift at http://liftweb.net/ instead.

Crochet targets quick prototyping of REST APIs relying on the flexibility of the Scala language. The initial ideas for Crochet were inspired while reading Gabriele Renzi post on creating the STEP picoframework with Scala and the need for quickly prototyping APIs for pilot projects. Crochet also provides mechanisms to hide repetitive tasks involved with default responses and authentication/authorization piggybacking on the mechanics provided by application servers.

Who uses Crochet?

Crochet was born from the need for quickly prototyping REST APIs which required exposing legacy code written in Java. I have been actively using Crochet to provide REST APIs for a variety of projects developed at the National Center for Supercomputing Applications. One of the primary adopters and movers of Crochet is the Meandre Infrastructure for data-intensive computing developed under the SEASR project.

Crochet in 2 minuts

Before you start please check you have Scala installed on your system. You can find more information on how to get Scala up and running here.

  1. Get the latest Crochet jar from the Downloads section at GitHub and the third party dependencies.
  2. Copy the following code into a file named hello-world.scala.
    import crochet._
    new Crochet {
         get("/message") { 
             <html>
                   <head><title>Hello World</title></head>
                   <body><h1>Hello World!</h1></body>
             </html>
         }
    } on 8080
  3. Get your server up and running by running (please change the version number if needed)
    $ scala -cp crochet-0.1.4.jar:crochet-3dparty-libraries-0.1.X.jar hello-world.scala
  4. You just have your first _Crochet_ API up and running. You can check the API working by opening your browser and pointing it to http://localhost:8080/message and you should get the message Hello World! back.

    Where to go from here?

    You will find more information on the Crochet wiki at GitHub. The wiki contains basic information as a QuickStart guide (which also includes how to deal with static content), descriptions of the basic concepts used in Crochet, and several examples that can get up and running fast.

    Related posts:

    1. Meandre 2.0 Alpha Preview = Scala + MongoDB
    2. Meandre is going Scala
    3. Fast mutation implementation for genetic algorithms in Python