Liquid: RDF meandering in FluidDB

Related posts:

  1. Liquid: RDF endpoint for FluidDB
  2. Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
  3. Meandre 1.4.0 final release candidate tagged

Meandre (the data-intensive computing infrastructure pushed by NCSA) relies on RDF to describe components, flows, locations and repositories. RDF has become the central piece that makes Meandre’s flexibility and reusability possible. However, one piece remains largely sketchy and still has no clear optimal solution: how can we make it easy for anybody to share, publish and annotate flows, components, locations and repositories? More importantly, how can that be done in the cloud, in an open-ended fashion that allows anybody to annotate and comment on each of the aforementioned pieces?

The FluidDB trip

During my last summer trip to Europe, Terry Jones (CEO) invited me to visit FluidInfo (based in Barcelona), where I also met Esteve Fernandez (CTO). I had a great opportunity to chat with the masterminds behind an intriguing concept I had run into after a short note I received from David E. Goldberg. FluidDB, the main product being pushed by FluidInfo, is an online collaborative “cloud” database. In FluidInfo’s words:

FluidDB lets data be social. It allows almost unlimited information personalization by individual users and applications, and also between them. This makes it simple to build a wide variety of applications that benefit from cooperation, and which are open to unanticipated future enhancements. Even more importantly, FluidDB facilitates and encourages the growth of applications that leave users in control of their own data.

FluidDB went live as a private alpha last week. The basic concept behind the scenes is simple. FluidDB stores objects. Objects do not belong to anybody. Objects may be “blank” or they may be about something (e.g. http://seasr.org/meandre). You can create as many blank objects as you want. Creating an object with the same about always returns the same object (thus, there will only ever be one object about http://seasr.org/meandre). Once objects exist, things start getting more interesting: you can go and tag any object with whatever tag you want. For instance, I could tag the http://seasr.org/meandre object with a hosted_by tag and assign that tag a value. FluidDB introduces one last trick: namespaces. For instance, I got the xllora namespace, which means that the tag I mentioned above would look like /tags/xllora/hosted_by. You can create as many nested namespaces under your main namespace as you want. FluidDB also provides mechanisms to control who can query and see the values of the tags you create.
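To make the object model a bit more concrete, here is a toy in-memory sketch of the rules just described. It is not the FluidDB API and makes no calls to the service: the class ToyFluidStore, its method names, the second user alice and the tag values are all made up for illustration.

```python
import uuid


class ToyFluidStore:
    """In-memory stand-in for the object model described above (not the real FluidDB API)."""

    def __init__(self):
        self._by_about = {}  # about string -> object id
        self._tags = {}      # object id -> {namespaced tag path: value}

    def create_object(self, about=None):
        # Objects with the same "about" are shared; blank objects are always new.
        if about is not None and about in self._by_about:
            return self._by_about[about]
        object_id = str(uuid.uuid4())
        self._tags[object_id] = {}
        if about is not None:
            self._by_about[about] = object_id
        return object_id

    def tag(self, object_id, namespace, tag, value=None):
        # Tags live under the tagger's namespace, e.g. xllora/hosted_by.
        self._tags[object_id]["%s/%s" % (namespace, tag)] = value

    def tags_of(self, object_id):
        return dict(self._tags[object_id])


store = ToyFluidStore()
first = store.create_object("http://seasr.org/meandre")
second = store.create_object("http://seasr.org/meandre")  # same about, same object
blank_a = store.create_object()
blank_b = store.create_object()

# Two users tag the same object; namespaces keep their tags apart.
store.tag(first, "xllora", "hosted_by", "some value")
store.tag(first, "alice", "hosted_by", "another value")
```

The point of the sketch is that nobody owns the object itself: ownership only shows up in the namespace prefix of each tag, which is what lets several people annotate the same thing without stepping on each other.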

As you can see, the basic object model and its mechanics are very simple. When the alpha went live, FluidDB only provided access via a simple REST-like HTTP API. Within a few days, a whole crop of client libraries wrapping that API had been developed by a dynamic community that gathers on the #fluiddb channel on irc.freenode.net.

You were saying something about RDF

Back to the point. One thing I chatted about with the FluidDB guys was what they thought of the similarities between FluidDB’s object model and RDF. After playing with RDF for a while, the FluidDB model looked awfully familiar, despite being a much simpler and more manageable model than RDF. They did not have much to say about it, and the question got stuck in the back of my mind. So when I got access to the private alpha, I could not help but go down the path of working out what it would mean to map RDF onto FluidDB. Yes, the simple, straight answer would be to stick serialized RDF into the value of a given tag (e.g. xllora/rdf). However, that option seemed poor, since I could not exploit the social aspect of collaborative annotations provided by FluidDB. So, back to the drawing board. What do both models have in common? They are both descriptions about something. In RDF you can see those somethings as the subjects of the triples, whereas in FluidDB they are simply objects. RDF uses properties to qualify objects. FluidDB uses tags. Both enable you to add values to qualified objects. Mmh, there you go.

With this idea in mind, I started Liquid, a simple proof-of-concept library that maps RDF onto FluidDB and then gets it back. There was only one thing that needed a bit of patching. RDF properties are arbitrary URIs. Those could not be easily mapped on top of FluidDB tags, so I took a simple compromise route.

  • RDF subject URIs are mapped onto FluidDB qualified objects via the about tag
  • One FluidDB tag contains all the properties for that object (basically a simple dictionary encoded in JSON)
  • References to other RDF URIs are mapped onto FluidDB object URIs, and vice versa

Let’s make it a bit more chewable with a simple example.

<?xml version="1.0"?>

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cd="http://www.recshop.fake/cd#">

  <rdf:Description
      rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
  </rdf:Description>

</rdf:RDF>

The above RDF represents a single triple:

http://www.recshop.fake/cd/Empire Burlesque	http://www.recshop.fake/cd#artist	   "Bob Dylan"
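If you want to see where that triple comes from, the sketch below pulls it out of the RDF/XML using only Python’s standard library and then groups it into the per-subject property dictionary used by the mapping. This is a rough illustration, not how Liquid does it: it assumes flat rdf:Description elements with literal-valued children only, and the helper names simple_triples and group_by_subject are made up for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description
      rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
  </rdf:Description>
</rdf:RDF>
"""


def simple_triples(rdf_xml):
    """Yield (subject, predicate, literal) for flat rdf:Description elements only."""
    root = ET.fromstring(rdf_xml)
    for description in root.findall("{%s}Description" % RDF_NS):
        subject = description.get("{%s}about" % RDF_NS)
        for child in description:
            # ElementTree spells qualified names as "{namespace}local";
            # the property URI is the namespace followed by the local name.
            namespace, local = child.tag[1:].split("}", 1)
            yield subject, namespace + local, child.text


def group_by_subject(triples):
    """Collect triples into {subject: {property URI: [values]}}."""
    grouped = {}
    for subject, predicate, value in triples:
        grouped.setdefault(subject, {}).setdefault(predicate, []).append(value)
    return grouped


triples = list(simple_triples(RDF_XML))
grouped = group_by_subject(triples)
```

Each key of grouped is a candidate about value for a FluidDB object, and each inner dictionary is what would be JSON-encoded into the single properties tag.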

This triple could be mapped onto FluidDB by creating one qualified FluidDB object and adding the proper tags. The example below shows how to do so using Python’s fdb.py client library by Nicholas J. Radcliffe.

import sys

import fdb

if sys.version_info < (2, 6):
    import simplejson as json
else:
    import json

__RDF_TAG__ = 'rdf'
__RDF_TAG_PROPERTIES__ = 'rdf_properties'
__RDF_TAG_MODEL_NAME__ = 'rdf_model_name'

#
# Initialize the FluidDB client library
#
f = fdb.FluidDB()
#
# Create the tags (if they exist, this won't hurt)
#
f.create_abstract_tag(__RDF_TAG__)
f.create_abstract_tag(__RDF_TAG_PROPERTIES__)
f.create_abstract_tag(__RDF_TAG_MODEL_NAME__)
#
# Create the subject object of the triple
#
o = f.create_object('http://www.recshop.fake/cd/Empire Burlesque')
#
# Map RDF properties
#
properties = {'http://www.recshop.fake/cd#artist': ['Bob Dylan']}
#
# Tag the object as RDF aware, properties available, and to which model/named graph
# it belongs
#
f.tag_object_by_id(o.id, __RDF_TAG__)
f.tag_object_by_id(o.id, __RDF_TAG_PROPERTIES__, value=json.dumps(properties))
f.tag_object_by_id(o.id, __RDF_TAG_MODEL_NAME__, value='test_dummy')

Running with this basic idea, I quickly stitched together a simple library (Liquid) that allows ingestion and retrieval of RDF from FluidDB. It is still very rudimentary and may not properly map all possible RDF, but it is a working proof-of-concept implementation showing that it is possible to do so.

The Python code above just saves a triple. You can easily retrieve the triple by performing the following operation:

import sys

import fdb

if sys.version_info < (2, 6):
    import simplejson as json
else:
    import json

__RDF_TAG__ = 'rdf'
__RDF_TAG_PROPERTIES__ = 'rdf_properties'
__RDF_TAG_MODEL_NAME__ = 'rdf_model_name'

#
# Initialize the FluidDB client library
#
f = fdb.FluidDB()
#
# Retrieve the annotated objects
#
objs = f.query('has xllora/%s' % __RDF_TAG__)
#
# Optionally you could retrieve only the ones belonging to a given model
# (modelname being the name of that model) by
#
# objs = f.query('has xllora/%s and xllora/%s matches "%s"' % (__RDF_TAG__, __RDF_TAG_MODEL_NAME__, modelname))
#
# Each lookup is treated as a (status, value) pair; keep the value when the status is 200
#
subs_tmp = [f.get_tag_value_by_id(s, '/tags/fluiddb/about') for s in objs]
subs = [s[1] if s[0] == 200 else None for s in subs_tmp]
props_tmp = [f.get_tag_value_by_id(s, '/tags/xllora/' + __RDF_TAG_PROPERTIES__) for s in objs]
props = [json.loads(s[1]) if s[0] == 200 else {} for s in props_tmp]

Now subs contains all the subject URIs of the triples, and props all the dictionaries containing their properties.
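Getting back to plain triples from there is just a matter of flattening. The sketch below assumes subs is a list of subject URI strings and props the matching list of property dictionaries, as described above; the function name rebuild_triples and the sample data are made up for illustration and nothing here touches FluidDB.

```python
def rebuild_triples(subs, props):
    """Pair each subject with its property dictionary and flatten into triples."""
    triples = []
    for subject, properties in zip(subs, props):
        # Sort the property URIs so the output order is stable.
        for predicate, values in sorted(properties.items()):
            for value in values:
                triples.append((subject, predicate, value))
    return triples


# Sample data shaped like the result of retrieving the example triple.
subs = ["http://www.recshop.fake/cd/Empire Burlesque"]
props = [{"http://www.recshop.fake/cd#artist": ["Bob Dylan"]}]
triples = rebuild_triples(subs, props)
```

An object whose properties tag could not be read ends up with an empty dictionary and therefore contributes no triples.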

The bottom line

OK. So, why is this mapping important? Basically, it allows collaborative tagging of the created objects (subjects), enabling a collaborative and social gathering of information alongside the mapped RDF. So, what does it all mean?

It basically means that, as long as you do not need to ingest RDF (where property URIs do not map directly onto tags and you need to Fluidify/reify), any data stored in FluidDB is already in some form of triplified RDF. Let me explain what I mean by that. Each FluidDB object has a unique URI (e.g. http://fluidDB.fluidinfo.com/objects/4fdf7ff4-f0da-4441-8e63-9b98ed26fc12). Each tag is also uniquely identified by a URI (e.g. http://fluidDB.fluidinfo.com/tags/xllora/rdf_model_name). And finally, each object/tag pair may have a value (e.g. a literal 'test_dummy' or maybe another URI such as http://fluidDB.fluidinfo.com/objects/a0dda173-9ee0-4799-a507-8710045d2b07). If an object/tag pair does not have a value, you can just point it to the no value URI (or some other convention you like).

Having said that, now you have all the pieces to express FluidDB data in plain shareable RDF. That would basically mean getting all the tags for an object, querying their values, and then generating an RDF model by adding the gathered triples. That’s easy. Also, if you align your properties with tags, ingestion becomes just as trivial. I will try to get that piece into Liquid as soon as other issues allow me to do so :D .
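As a rough idea of what that export could look like, the sketch below turns one object’s tags and values into N-Triples lines, using the URI patterns from the examples above. It works on a plain dictionary rather than calling FluidDB, the function names and the xllora/related tag are made up for illustration, and tags without a value are simply skipped here instead of pointing at any particular no-value convention.

```python
OBJECT_BASE = "http://fluidDB.fluidinfo.com/objects/"
TAG_BASE = "http://fluidDB.fluidinfo.com/tags/"


def escape_literal(text):
    """Escape the characters that N-Triples does not allow raw inside a literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(object_id, tag_values):
    """tag_values maps tag paths (e.g. 'xllora/rdf_model_name') to their values."""
    subject = "<%s%s>" % (OBJECT_BASE, object_id)
    lines = []
    for tag_path, value in sorted(tag_values.items()):
        if value is None:
            continue  # simplification: valueless tags are left out
        predicate = "<%s%s>" % (TAG_BASE, tag_path)
        if isinstance(value, str) and value.startswith(OBJECT_BASE):
            obj = "<%s>" % value  # reference to another FluidDB object
        else:
            obj = '"%s"' % escape_literal(str(value))
        lines.append("%s %s %s ." % (subject, predicate, obj))
    return lines


lines = to_ntriples(
    "4fdf7ff4-f0da-4441-8e63-9b98ed26fc12",
    {
        "xllora/rdf_model_name": "test_dummy",
        # Hypothetical tag pointing at another object.
        "xllora/related": OBJECT_BASE + "a0dda173-9ee0-4799-a507-8710045d2b07",
    },
)
```

Any RDF toolkit that reads N-Triples could then load the result, which is all that "plain shareable RDF" needs to mean here.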

Just to close, I would like to mention once again a key element of this picture. FluidDB opens the door to a truly cooperative, distributed, and online fluid semantic web. It is one of the first examples of how annotations (a.k.a. metadata) can be easily gathered and used on the “cloud” for the masses. Great job, guys!
