ICPR 2010 – Contest

Classifier domains of competence: The landscape contest is a research competition aimed at uncovering the relation between data complexity and the performance of learners. Comparing your techniques to those of other participants may help enrich our understanding of the behavior of machine learning and open new lines of research. Contest participants are allowed to use any type of technique; however, we highly encourage and appreciate the use of novel algorithms.

The contest will take place on August 22, during the 20th International Conference on Pattern Recognition (ICPR 2010) in Istanbul, Turkey.

We are planning a one-day workshop during ICPR 2010, so that participants will be able to present and discuss their results.

We encourage everyone to participate and share their work with us! For further details about dates and submission, please visit the landscape contest webpage.

Soaring the Clouds with Meandre

You may find the slide deck and the abstract for the presentation we delivered today at the “Data-Intensive Research: how should we improve our ability to use data” workshop in Edinburgh.

Abstract

This talk will focus on a highly scalable, data-intensive infrastructure being developed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois, and will introduce current research efforts to tackle the challenges presented by big data. These efforts include exploring potential ways of integrating cloud computing concepts, such as Hadoop or Meandre, with traditional HPC technologies and assets. These architecture models contrast significantly, but they can be leveraged by building cloud conduits that connect these resources to provide even greater flexibility and scalability on demand. Orchestrating the physical computational environment requires innovative and sophisticated software infrastructure that can transparently take advantage of the functional features of, and negotiate the constraints imposed by, this diversity of computational resources. Research conducted during the development of the Meandre infrastructure has led to the production of an agile conductor able to leverage the particular advantages of this physical diversity. It can also be implemented as a service and/or in the context of another application benefiting from its reusability, flexibility, and high scalability. Some example applications and an introduction to the data-intensive infrastructure architecture will be presented to provide an overview of the diverse scope of Meandre usage. Finally, a case will be presented showing how software developers and system designers can easily transition to these new paradigms to address the primary data-deluge challenges and soar to new heights with extreme application scalability using cloud computing concepts.

Related posts:

  1. Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
  2. Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre
  3. [BDCSG2008] Clouds and ManyCores: The Revolution (Dan Reed)

GECCO 2010 Submission Deadline (Extended)

If you are planning to submit a paper to the 2010 Genetic and Evolutionary Computation Conference, note that the original January 13, 2010 deadline has been extended to January 27. You can find more information at the GECCO 2010 calendar site.

Related posts:

  1. GECCO 2009 paper submission deadline extended till January 28
  2. GECCO 2007 deadline extended
  3. GECCO-2006 submissions deadline extended to February 1st

Scaling Genetic Algorithms using MapReduce

Below you may find the abstract of, and the link to, the technical report for the paper entitled “Scaling Genetic Algorithms using MapReduce”, which will be presented next month at the Ninth International Conference on Intelligent Systems Design and Applications (ISDA 2009) by Verma, A., Llorà, X., Campbell, R.H., and Goldberg, D.E.

Abstract: Genetic algorithms (GAs) are increasingly being applied to large-scale problems. Traditional MPI-based parallel GAs do not scale very well. MapReduce is a powerful abstraction developed by Google for building scalable and fault-tolerant applications. In this paper, we mould genetic algorithms into the MapReduce model. We describe the algorithm design and implementation of GAs on Hadoop, the open-source implementation of MapReduce. Our experiments demonstrate convergence and scalability up to problems with 10^5 variables. Adding more resources would enable us to solve even larger problems without any changes to the algorithms or implementation.

The draft of the paper can be downloaded as IlliGAL TR. No. 2009007. For more information see the IlliGAL technical reports web site.
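
To make the map/reduce framing concrete, here is a minimal, self-contained sketch of one GA generation phrased as a map step (independent fitness evaluation) and a reduce step (selection and recombination). This is an illustrative toy on the OneMax problem with made-up parameters, not the paper's Hadoop implementation.

import random

def one_max_fitness(individual):
    # Toy fitness function: number of ones in the bit string (OneMax).
    return sum(individual)

def map_phase(population):
    # Map: evaluate every individual independently; this is the
    # embarrassingly parallel step that MapReduce distributes.
    return [(ind, one_max_fitness(ind)) for ind in population]

def reduce_phase(scored, tournament_size=2, mutation_rate=0.01):
    # Reduce: gather the scored individuals and breed the next generation
    # via tournament selection, one-point crossover, and bit-flip mutation.
    def select():
        return max(random.sample(scored, tournament_size), key=lambda p: p[1])[0]
    next_generation = []
    while len(next_generation) < len(scored):
        a, b = select(), select()
        cut = random.randrange(1, len(a))
        child = a[:cut] + b[cut:]
        child = [bit ^ (random.random() < mutation_rate) for bit in child]
        next_generation.append(child)
    return next_generation

population = [[random.randint(0, 1) for _ in range(32)] for _ in range(50)]
for generation in range(20):
    population = reduce_phase(map_phase(population))
print(max(one_max_fitness(ind) for ind in population))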

Related posts:

  1. Scaling eCGA Model Building via Data-Intensive Computing
  2. Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre

Facetwise analysis of XCS for problems with class imbalances

by Albert Orriols-Puig, Ester Bernadó-Mansilla, David E. Goldberg, Kumara Sastry, and Pier Luca Lanzi. IEEE Transactions on Evolutionary Computation, doi: 10.1109/TEVC.2009.2019829 [Publisher site].

Michigan-style learning classifier systems (LCSs) are online machine learning techniques that incrementally evolve distributed subsolutions which individually solve a portion of the problem space. As in many machine learning systems, extracting accurate models from problems with class imbalances (that is, problems in which one of the classes is poorly represented with respect to the other classes) has been identified as a key challenge for LCSs. Empirical studies have shown that Michigan-style LCSs fail to provide accurate subsolutions that represent the minority class in domains with moderate and large disproportions of examples per class; however, the causes of this failure have not been analyzed in detail. Therefore, the aim of this paper is to carefully examine the effect of class imbalances on different LCS components. The analysis focuses on XCS, the most relevant Michigan-style LCS, although the models could easily be adapted to other LCSs. Design decomposition is used to identify five elements that are crucial to guaranteeing the success of LCSs in domains with class imbalances, and facetwise models that explain these different elements for XCS are developed. All theoretical models are validated with artificial problems. The integration of all these models enables us to identify the sweet spot where XCS is able to scalably and efficiently evolve accurate models of rare classes; furthermore, facetwise analysis is used as a tool for designing a set of configuration guidelines that have to be followed to ensure convergence. When properly configured, XCS is shown to be able to solve highly unbalanced problems that previously eluded solution.

Liquid: RDF meandering in FluidDB

Meandre (NCSA's data-intensive computing infrastructure) relies on RDF to describe components, flows, locations, and repositories. RDF has become the central piece that makes Meandre's flexibility and reusability possible. However, one piece remains largely sketchy and still has no clear optimal solution: how can we make it easy for anybody to share, publish, and annotate flows, components, locations, and repositories? More importantly, how can that be done in the cloud in an open-ended fashion that allows anybody to annotate and comment on each of the aforementioned pieces?

The FluidDB trip

During my last summer trip to Europe, Terry Jones (CEO) invited me to visit FluidInfo (based in Barcelona), where I also met Esteve Fernandez (CTO). I had a great opportunity to chat with the masterminds behind an intriguing concept I ran into after a short note I received from David E. Goldberg. FluidDB, the main product being pushed by FluidInfo, is an online collaborative “cloud” database. In FluidInfo's words:

FluidDB lets data be social. It allows almost unlimited information personalization by individual users and applications, and also between them. This makes it simple to build a wide variety of applications that benefit from cooperation, and which are open to unanticipated future enhancements. Even more importantly, FluidDB facilitates and encourages the growth of applications that leave users in control of their own data.

FluidDB went live as a private alpha last week. The basic concept behind the scenes is simple. FluidDB stores objects. Objects do not belong to anybody. Objects may be “blank” or they may be about something (e.g., http://seasr.org/meandre). You can create as many blank objects as you want, but creating an object with the same about always returns the same object (thus, there will only ever be one object about http://seasr.org/meandre). Once objects exist, things start getting more interesting: you can tag any object with whatever tag you want. For instance, I could tag the http://seasr.org/meandre object with a hosted_by tag and assign that tag a value. FluidDB introduces one last trick: namespaces. For instance, I got xllora, which means that the tag I mentioned above would look like xllora/hosted_by. You can create as many nested namespaces under your main namespace as you want. FluidDB also provides mechanisms to control who can query and see the values of the tags you create.
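
As a tiny illustration of the model just described, the sketch below uses the same fdb.py calls that appear in the listings further down to create the object about http://seasr.org/meandre and attach a hosted_by tag to it (the tag name and its value are made up for the example):

import fdb

f = fdb.FluidDB()
# Creating an object with an about always returns the same object.
o = f.create_object('http://seasr.org/meandre')
# The tag is created under your own namespace (here it would become
# xllora/hosted_by); the value 'FluidInfo' is made up for the example.
f.create_abstract_tag('hosted_by')
f.tag_object_by_id(o.id, 'hosted_by', value='FluidInfo')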

As you can see, the basic object model and mechanics are very simple. When the alpha went live, FluidDB only provided access via a simple REST-like HTTP API. Within a few days, a blossom of client libraries wrapping that API were developed by a dynamic community that gathers on the #fluiddb channel on irc.freenode.net.

You were saying something about RDF

Back to the point. One thing I chatted about with the FluidDB guys was what they thought about the similarities between FluidDB's object model and RDF. After playing with RDF for a while, the FluidDB model looked awfully familiar, albeit a much simpler and more manageable model than RDF. They did not have much to say about it, and the question got stuck in the back of my mind. So when I got access to the private alpha, I could not help but go down the path of figuring out what it would mean to map RDF onto FluidDB. Yes, the simple, straight answer would be to stick serialized RDF into the value of a given tag (e.g., xllora/rdf). However, that option seemed poor, since I could not exploit the social aspect of collaborative annotations provided by FluidDB. So, back to the drawing board. What both models have in common: they are both descriptions about something. In RDF those are the subjects of the triples, whereas in FluidDB those are simply objects. RDF uses properties to qualify subjects; FluidDB uses tags. Both let you attach values to the qualified objects. Mmh, there you go.

With this idea in mind, I started Liquid, a simple proof-of-concept library that maps RDF onto FluidDB and gets it back. There was only one thing that needed a bit of patching: RDF properties are arbitrary URIs, and those could not be easily mapped on top of FluidDB tags, so I took a simple compromise route.

  • RDF subject URIs are mapped onto FluidDB qualified objects via the about tag
  • One FluidDB tag contains all the properties for that object (basically a simple dictionary encoded in JSON)
  • References to other RDF URIs are mapped onto FluidDB object URIs, and vice versa

Let’s make it a bit more chewable with a simple example.

<?xml version="1.0"?>
 
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http://www.recshop.fake/cd#">
 
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
  <cd:artist>Bob Dylan</cd:artist>
 </rdf:Description>
 
</rdf:RDF>

The above RDF represents a single triple:

subject:   http://www.recshop.fake/cd/Empire Burlesque
predicate: http://www.recshop.fake/cd#artist
object:    "Bob Dylan"

This triple can be mapped onto FluidDB by creating one qualified FluidDB object and adding the proper tags. The example below shows how to do so using Python's fdb.py client library by Nicholas J. Radcliffe.

import fdb, sys
if sys.version_info < (2, 6):
    import simplejson as json
else:
    import json
 
__RDF_TAG__ = 'rdf'
__RDF_TAG_PROPERTIES__  = 'rdf_properties'
__RDF_TAG_MODEL_NAME__ = 'rdf_model_name'
 
#
# Initialize the FluidDB client library
#
f = fdb.FluidDB()
#
# Create the tags (if they exist, this won't hurt)
#
f.create_abstract_tag(__RDF_TAG__)
f.create_abstract_tag(__RDF_TAG_PROPERTIES__)
f.create_abstract_tag(__RDF_TAG_MODEL_NAME__)
#
# Create the subject object of the triple
#	
o = f.create_object('http://www.recshop.fake/cd/Empire Burlesque')
#
# Map RDF properties
#
properties = {'http://www.recshop.fake/cd#artist':['Bob Dylan']}
#
# Tag the object as RDF aware, properties available, and to which model/named graph 
# it belongs
#
f.tag_object_by_id(o.id, __RDF_TAG__)
f.tag_object_by_id(o.id, __RDF_TAG_PROPERTIES__, value=json.dumps(properties))
f.tag_object_by_id(o.id, __RDF_TAG_MODEL_NAME__, value='test_dummy')

Running with this basic idea, I quickly stitched together a simple library (Liquid) that allows ingestion and retrieval of RDF from FluidDB. It is still very rudimentary and may not yet map all possible RDF correctly, but it is a working proof of concept that it is possible to do so.

The Python code above just saves a triple. You can easily retrieve the triple by performing the following operation:

import fdb, sys
if sys.version_info < (2, 6):
    import simplejson as json
else:
    import json
 
__RDF_TAG__ = 'rdf'
__RDF_TAG_PROPERTIES__  = 'rdf_properties'
__RDF_TAG_MODEL_NAME__ = 'rdf_model_name'
 
#
# Initialize the FluidDB client library
#
f = fdb.FluidDB()
#
# Retrieve the annotated objects
#
objs = f.query('has xllora/%s'%(__RDF_TAG__))
#
# Optionally you could retrieve the ones only belonging to a given model by
#
# objs = f.query('has xllora/%s and xllora/%s matches "%s"'%(__RDF_TAG__,__RDF_TAG_MODEL_NAME__,modelname))
#
subs = [f.get_tag_value_by_id(s,'/tags/fluiddb/about')[1] for s in objs]  # keep the value from the (status, value) pair
props_tmp = [f.get_tag_value_by_id(s,'/tags/xllora/'+__RDF_TAG_PROPERTIES__) for s in objs]
props = [json.loads(s[1]) if s[0]==200 else {} for s in props_tmp]

Now subs contains all the subject URIs of the triples, and props all the dictionaries containing their properties.
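
As a hypothetical continuation of the snippet above, you could walk those two structures and print the recovered triples back out:

for subject, properties in zip(subs, props):
    # Each property maps to a list of values, mirroring the JSON encoding.
    for prop, values in properties.items():
        for value in values:
            print('%s %s %s' % (subject, prop, value))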

The bottom line

OK. So, why is this mapping important? Basically, it allows collaborative tagging of the created objects (subjects), enabling a collaborative and social gathering of information alongside the mapped RDF. So, what does it all mean?

It basically means that, even if you do not need to ingest RDF (where property URIs are not directly mapped and you need to Fluidify/reify them), any data stored in FluidDB is already in some form of triplified RDF. Let me explain what I mean by that. Each FluidDB object has a unique URI (e.g., http://fluidDB.fluidinfo.com/objects/4fdf7ff4-f0da-4441-8e63-9b98ed26fc12). Each tag is also uniquely identified by a URI (e.g., http://fluidDB.fluidinfo.com/tags/xllora/rdf_model_name). And finally, each object/tag pair may have a value (e.g., a literal 'test_dummy', or maybe another URI such as http://fluidDB.fluidinfo.com/objects/a0dda173-9ee0-4799-a507-8710045d2b07). If an object/tag pair does not have a value, you can just point it to a no-value URI (or follow some other convention you like).

Having said that, you now have all the pieces to express FluidDB data as plain, shareable RDF. That basically means getting all the tags for an object, querying their values, and then generating an RDF model by adding the gathered triples. That's easy. Also, if you align your properties to tags, ingestion becomes equally trivial. I will try to get that piece into Liquid as soon as other issues allow me to do so :D.
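
For the export direction, a minimal sketch could look like the following. It assumes rdflib (a library not used elsewhere in this post), and the fetched data is faked for illustration using the object and tag URI patterns described above:

from rdflib import Graph, URIRef, Literal

OBJECTS = 'http://fluidDB.fluidinfo.com/objects/'
TAGS = 'http://fluidDB.fluidinfo.com/tags/'

# Pretend we already queried FluidDB: object id -> {tag path: value}.
fetched = {
    '4fdf7ff4-f0da-4441-8e63-9b98ed26fc12': {
        'xllora/rdf_model_name': 'test_dummy',
    },
}

g = Graph()
for obj_id, tags in fetched.items():
    subject = URIRef(OBJECTS + obj_id)
    for tag_path, value in tags.items():
        # Object URI + tag URI + value: one RDF triple per pair.
        g.add((subject, URIRef(TAGS + tag_path), Literal(value)))

print(g.serialize(format='xml'))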

Just to close, let me mention once again a key element of this picture: FluidDB opens the door to a truly cooperative, distributed, and online fluid semantic web. It is one of the first examples of how annotations (a.k.a. metadata) can be easily gathered and used in the “cloud” by the masses. Great job, guys!

Related posts:

  1. Liquid: RDF endpoint for FluidDB
  2. Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
  3. Meandre 1.4.0 final release candidate tagged

AI: Reality or fiction?

It seems that the artificial intelligence portrayed in science fiction is not as far from reality as we used to think. The main character of the film AI, a little boy belonging to a series of robots capable of emulating human behavior, is now a model to aim for in current scientific projects, which seek to provide machines with consciousness, thoughts, and emotions so they can interact with human beings. Thus, the world described in Blade Runner, a world where humans and robots coexist and cannot be distinguished with the naked eye, may be just around the corner.

The advances in the AI field, however, are starting to raise some serious concerns about robot autonomy, its social status, and how to face this social disruption, and the three Laws elaborated by Asimov to protect humans from machines are starting to make sense to others besides computer geeks. Scientists are concerned about the “loss of human control of computer-based intelligences”, and last February the Association for the Advancement of Artificial Intelligence organized a conference in Asilomar (not an accidental choice of venue) to discuss the limits of research in this field. The development of machines that are close to being able to kill autonomously is worth a discussion by those involved in creating the brains of such devices. News of this event appeared in John Markoff's article in the New York Times.

On the other hand, who will be responsible for damages caused by these autonomous friends? Themselves, or their designers? In this sense, philosophy should play a leading role in the design and integration of these “future citizens”, since they should have a moral system that allows them to learn ethics from experience and from people, and to find their place in our society. The latter implies creating a legal framework that defines machines' civic rights and duties, a proposal currently under study (see the news published by “El Periódico”, in Spanish).

Finally, one may ask whether or not we are ready to live with human emulators. In my view, we are not. Although in past years we have been skillful at adapting to new and challenging situations, and our experience with immigrant integration and race conflicts should help us welcome these new electronic neighbors, I tend to think that coexistence with robots will be one of the greatest challenges mankind has ever faced. In any case, we will need to figure out a way to overcome it, because the individualism and loneliness ruling our current society are leading us unrelentingly toward a future with custom-made roommates.

GECCO 2009: A binary pre-teenager

GECCO, one of the most relevant conferences on evolutionary computation, starts its 10th edition today in Montréal (Canada). The organizing committee has prepared a lot of surprises within a tight agenda. From July 8 to July 12, full days of tutorials, workshops, poster sessions, talks, competitions, awards, the birthday celebration, and the star talk by John H. Holland promise an immersion into the emergent world of evolutionary computation. I hope all of them give rise to the “Chronicles of GECCO”.

For further information, please see the program.
