Meandre 2.0 Alpha Preview = Scala + MongoDB

A lot of water has gone under the bridge since the first release of the Meandre 1.4.X series. In January I went back to the drawing board and started sketching what was going to be the 1.5.X series. The slide deck accompanying this post is an extended list of the thoughts that came up during the process. As usual, I started by collecting feedback from people using 1.4.X in production: things that worked, things that needed improvement, and things that were just plain overcomplicated. The hot recurrent topics raised by people using 1.4.X can be summarized as:

  • Complex execution concurrency model based on traditional semaphores written in Java (mostly my maintenance nightmare when changes need to be introduced)
  • Server performance bounded by JENA's persistent model implementation
  • State caching on individual servers to boost performance increases complexity of single-image cluster deployments
  • Cloud-deployable infrastructure, but not cloud-friendly infrastructure

As I mentioned, these elements were the main ingredients to target for the 1.5.X series. However, as the redesign moved forward, the new version turned into a radical departure from the 1.4.X series and eventually became the 2.0 Alpha version described here. The main changes that forced this transition are:

  • Cloud-friendly infrastructure required rethinking of the core functionalities
  • Drastic redesign of the back-end state storage
  • Revisited flow execution engine to support distributed flow execution
  • Changes on the API that render returned JSON documents incompatible with 1.4.X

Meandre 2.0 (already available in the SVN trunk) has been rewritten from scratch using Scala. That decision was motivated by the Actor model provided by Scala (modeled after Erlang's actors). This model greatly simplifies the mechanics of the infrastructure, and it also provides the basis of Snowfield (the effort to create a scalable distributed flow execution engine for Meandre flows). Also, the expressiveness of the Scala language has greatly reduced the code base size (the 2.0 code base is roughly 1/3 of the size of the 1.4.X series), which simplifies the maintenance the infrastructure will require as we move forward.
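To give a flavor of why actors simplify concurrent code, here is a minimal sketch. It does not use the Scala actors library or any Meandre code; it is a hand-rolled actor built on `java.util.concurrent`, and the names `SumActor`, `Data`, and `Stop` are hypothetical. The point is only that state owned by a single mailbox-draining thread needs no semaphores.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue}

// Hypothetical messages for a toy component; not part of Meandre.
sealed trait Msg
final case class Data(value: Int) extends Msg
case object Stop extends Msg

// A hand-rolled actor: one mailbox, one thread draining it, so the
// handler never needs locks or semaphores around its own state.
final class SumActor {
  private val mailbox = new LinkedBlockingQueue[Msg]()
  private val done = new CountDownLatch(1)
  private var total = 0 // touched only by the actor's thread

  private val worker = new Thread(() => {
    var running = true
    while (running) {
      mailbox.take() match {
        case Data(v) => total += v
        case Stop    => running = false
      }
    }
    done.countDown() // publishes `total` to whoever awaits the latch
  })
  worker.setDaemon(true)
  worker.start()

  // Asynchronous send, in the spirit of the `!` operator used by actors.
  def !(msg: Msg): Unit = mailbox.put(msg)

  // Blocks until Stop has been processed, then returns the final sum.
  def result(): Int = { done.await(); total }
}
```

Senders only ever enqueue messages, so any number of threads can call `!` at once while the sum itself is updated sequentially.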

The second big change that pulled the 2.0 Alpha trigger was the redesign of the back-end state storage. The 1.4.X series relied heavily on the relational storage for persistent RDF models provided by JENA. For performance reasons, JENA caches the model in memory and mostly assumes ownership of the model. Hence, if you want to provide a single-image Meandre cluster you need to inject cache coherence mechanics into JENA, greatly increasing the complexity. Also, the relational implementation maps each model to a table and each triple to a row (this is a bit of a simplification). That implies that a large number of SQL statements need to be generated to update models, heavily taxing the relational storage when changes to user repository data need to be introduced.
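The cost of the one-row-per-triple mapping can be sketched as follows. This is an illustration under the simplification above, not JENA's actual schema: the table layout, column names, and the `TripleRows` helper are all made up, and the string-built SQL is for counting statements only, not for real use.

```scala
// One RDF statement: subject, predicate, object.
final case class Triple(s: String, p: String, o: String)

object TripleRows {
  // Hypothetical layout: one table per model, one row per triple.
  // Table and column names are assumptions, not JENA's schema.
  def insertStatements(model: String, triples: Seq[Triple]): Seq[String] =
    triples.map { t =>
      s"INSERT INTO $model (subj, pred, obj) VALUES ('${t.s}', '${t.p}', '${t.o}')"
    }
}
```

Under this layout, touching a component description made of a few hundred triples means issuing a few hundred statements, which is the pressure on the relational store described above.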

An ideal cloud-friendly Meandre infrastructure should not maintain state (neither voluntarily nor as a result of the JENA back end). Thus, a fast and scalable back-end storage would allow infrastructure servers to maintain no state and still provide the appearance of a single-image cluster. After evaluating different alternatives, along with their community support and development roadmaps, the only option left was MongoDB. Its setup simplicity for small installations and its ability to easily scale to large installations (including cloud-deployed ones) made MongoDB the candidate to maintain state for Meandre 2.0. This was quite a departure from the 1.4.X series, where you had the choice to store state via JENA on an embedded Derby or an external MySQL server.

A final note on the building blocks that made the 2.0 series possible. Two other side projects were started to support the development of what will become the Meandre 2.0.X series:

  1. Crochet: Crochet aims to help quickly prototype REST APIs by relying on the flexibility of the Scala language. The initial ideas for Crochet were inspired by Gabriele Renzi's post on creating a picoframework with Scala (see http://www.riffraff.info/2009/4/11/step-a-scala-web-picoframework) and by the need for quickly prototyping APIs for pilot projects. Crochet also provides mechanisms to hide repetitive tasks involved with default responses and authentication/authorization by piggybacking on the mechanics provided by application servers.
  2. Snare: Snare is a coordination layer for distributed applications written in Scala that relies on MongoDB to implement its communication layer. Snare implements a basic heartbeat system and a simple notification mechanism (peer-to-peer and broadcast communication). Snare relies on MongoDB to track heartbeats and notification mailboxes.
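The heartbeat idea behind Snare can be sketched in a few lines. This is not Snare's code or API: the `HeartbeatRegistry` class, its method names, and the timeout policy are assumptions made for illustration, and an in-memory map stands in for the MongoDB collection so the example stays self-contained.

```scala
import scala.collection.mutable

// In-memory stand-in for the shared store (MongoDB in Snare's case).
// Class name, methods, and timeout policy are illustrative assumptions.
final class HeartbeatRegistry(timeoutMillis: Long) {
  private val lastSeen = mutable.Map.empty[String, Long]

  // A peer records the time of its latest heartbeat.
  def beat(peer: String, nowMillis: Long): Unit =
    synchronized { lastSeen(peer) = nowMillis }

  // Peers whose latest heartbeat is no older than the timeout.
  def alive(nowMillis: Long): Set[String] = synchronized {
    lastSeen.collect { case (p, t) if nowMillis - t <= timeoutMillis => p }.toSet
  }
}
```

Because every peer only writes its own timestamp and reads everyone else's, the peers themselves hold no shared state; replacing the map with a shared store is what would let peers on different machines see each other.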

Fast REST API prototyping with Crochet and Scala


I just finished committing the last changes to Crochet and tagged version 0.1.4vcli, now publicly available on GitHub (http://github.com/xllora/Crochet). Also feel free to visit the issues page in case you run into questions/problems/bugs.

Motivation

Crochet is a lightweight web framework oriented to rapid prototyping of REST APIs. If you are looking for a Rails-like framework written in Scala, please take a look at Lift at http://liftweb.net/ instead.

Crochet targets quick prototyping of REST APIs by relying on the flexibility of the Scala language. The initial ideas for Crochet were inspired by Gabriele Renzi's post on creating the STEP picoframework with Scala and by the need for quickly prototyping APIs for pilot projects. Crochet also provides mechanisms to hide repetitive tasks involved with default responses and authentication/authorization by piggybacking on the mechanics provided by application servers.

Who uses Crochet?

Crochet was born from the need for quickly prototyping REST APIs which required exposing legacy code written in Java. I have been actively using Crochet to provide REST APIs for a variety of projects developed at the National Center for Supercomputing Applications. One of the primary adopters and movers of Crochet is the Meandre Infrastructure for data-intensive computing developed under the SEASR project.

Crochet in 2 minutes

Before you start please check you have Scala installed on your system. You can find more information on how to get Scala up and running here.

  1. Get the latest Crochet jar and the third-party dependencies from the Downloads section at GitHub.
  2. Copy the following code into a file named hello-world.scala.
    import crochet._
    new Crochet {
      get("/message") {
        <html>
          <head><title>Hello World</title></head>
          <body><h1>Hello World!</h1></body>
        </html>
      }
    } on 8080
  3. Start your server by running the following command (please change the version numbers if needed):
    $ scala -cp crochet-0.1.4.jar:crochet-3dparty-libraries-0.1.X.jar hello-world.scala
  4. You now have your first Crochet API up and running. You can check that the API works by pointing your browser to http://localhost:8080/message; you should get the message Hello World! back.

Where to go from here?

You will find more information on the Crochet wiki at GitHub. The wiki contains basic information such as a QuickStart guide (which also covers how to deal with static content), descriptions of the basic concepts used in Crochet, and several examples that can get you up and running fast.


Meandre is going Scala


After quite a bit of experimenting with different alternatives, Meandre is moving to Scala. Scala is a general purpose programming language designed to express common programming patterns in a concise, elegant, and type-safe way. This is not a radical process, but a gradual one that starts as I revisit the infrastructure for the next major release. Scala also generates code for the JVM, making mix and match with Java trivial. I started fooling around with Scala back when I began the development of Meandre during the summer of 2007; however, I fell back to Java since that was what most of the people in the group were comfortable with. I was fascinated by Scala's fusion of object-oriented programming and functional programming. Time went by and the codebase has grown to a point where I cannot stand cutting through the weeds of Java anymore when I have to extend the infrastructure or fix bugs, not to mention its verbosity even for writing trivial code.

This summer I decided to go on a quest to get myself out of the woods. I do not mind relying on the JVM and the large collection of libraries available, but I would also like to get my sanity back. Yes, I tested some of the usual suspects for the JVM (Jython, JRuby, Clojure, and Groovy), but none was quite what I wanted. For instance, I wrote most of the Meandre infrastructure services using Jython (much more concise than Java), but I was still not quite happy to jump on that boat. Clojure is also interesting (functional programming), but it would be hard to justify moving the group to it since not everybody may feel comfortable with a pure functional language. I also toyed with some not-so-usual ones like Erlang and Haskell, but again, I ended up with no real argument that could justify such a decision.

So, as I started doing back in 2007, I went back to my original idea of using Scala and its mix of the object-oriented and functional programming paradigms. To test it seriously, I started developing the distributed execution engine for Meandre in Scala using its Erlang-inspired actors. And, boom, suddenly I found myself spending more time thinking than writing/debugging threaded/networking code :D . Yes, I regret my 2007 decision instead of running with my original intuition, but better late than never. With a working seed of the distributed engine in place and tested (did I mention that scalacheck and specs are really powerful tools for behavior driven development?), I finally decided to start gravitating the Meandre infrastructure development effort from Java to Scala (did I mention that Scala is Martin Odersky’s child?). Yes, such a decision has some impact on my colleagues, but I envision that the benefits will eventually outweigh the initial resistance and the steep learning curve. At least, in the last two group meetings nobody jumped out of the window while I presented the key elements of Scala and demonstrated how concise and elegant it made the first working seed of the distributed execution engine :D . We even got into discussions about the benefits of using Scala if it delivered everything I showed. I am lucky to work with such smart guys. If you want to take a peek at the distributed execution engine (a.k.a. Snowfield), you can find it at SEASR’s Fisheye.

Oh, one last thing. Are you using Atlassian’s Fisheye? Do you want syntax highlighting for Scala? I tweaked the Java definitions to make it highlight Scala code. Remember to drop the scala.def file in the $FISHEYE_HOME/syntax directory and add an entry to filename.map to make it highlight anything with the extension .scala.
