Learning Classifier Systems, Springer LNAI 4998

Learning Classifier Systems:
10th International Workshop, IWLCS 2006, Seattle, WA, USA, July 8, 2006, and 11th International Workshop, IWLCS 2007, London, UK, July 8, 2007, Revised Selected Papers
Series: Lecture Notes in Computer Science
Subseries: Lecture Notes in Artificial Intelligence, Vol. 4998
Bacardit, J.; Bernadó-Mansilla, E.; Butz, M.V.; Kovacs, T.; Llorà, X.; Takadama, K. (Eds.)
2008, X, 307 p., Softcover
ISBN: 978-3-540-88137-7


Abstract

This book constitutes the thoroughly refereed joint post-conference proceedings of two consecutive International Workshops on Learning Classifier Systems that took place in Seattle, WA, USA in July 2006, and in London, UK, in July 2007 – all hosted by the Genetic and Evolutionary Computation Conference, GECCO. The 14 revised full papers presented were carefully reviewed and selected from the workshop contributions. The papers are organized in topical sections on knowledge representation, analysis of the system, mechanisms, new directions, as well as applications.

Usages of R



R has gained a lot of traction in the scientific community for data analysis, modeling, and exploratory work. I just ran into a post by Michael E. Driscoll on his Data Evolution blog about how R is used at Google and Facebook. Nothing new, but what got my attention was ParallelR. If you have been using R for large problems, I am pretty sure you have wished for some parallelization capabilities. ParallelR targets that problem, and it is definitely an option to check out.

GroupTweet or getting groups in Twitter



One of the main handicaps I keep running into over and over with Twitter is that there is no concept of groups. The usual story goes along these lines: you are forced to create accounts that behave as groups, make their updates private, and then ask the members to request to follow, and once they follow, that’s it. Yes, a bit convoluted. Yesterday, I ran into GroupTweet, which basically automates this process. It still creates a new user account that acts as the group, but at least it makes the process easier.

Revamping my Twitter accounts



Since summer 2007 I have been twittering. It started as a way to have a conversation with a bunch of friends scattered all over. Since we were not discussing any world-changing topics, my updates have been kept private. Lately, I have been receiving requests to follow me. So, instead of polluting my original intent, @xllora is now my public Twitter account, and I moved my previous tweets to @panellet (still not public). Hope this simplifies things a bit. I still wish that Twitter would one day allow better access control but, oh well, it is what it is right now. Also, I added my @xllora tweets below :D

Efficient serialization for Java (and beyond)



I am currently working on the distributed execution of flows in the Meandre infrastructure, part of the SEASR project. One of the pieces to explore is how to push data between machines. No, I am not going to talk about network protocols and the like here, but about how you can pass the data around. If you have ever programmed MPI using C/C++, you remember the tedious effort required to pass complex data structures between processes. Serialization is a way to turn those complex structures into a form that can be easily stored or transmitted, and then retrieved or received to regenerate the original complex data structure. Some languages and platforms (e.g. Java, Python) support this functionality natively, allowing you to easily use the serialized representation for persistence or transmission purposes.
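To make the idea concrete, here is a minimal sketch of a Java serialization round trip. The `Point` class and the `serialize`/`deserialize` helpers below are purely illustrative, not part of Meandre; any class implementing `Serializable` can be flattened to bytes and regenerated this way.

```java
import java.io.*;

public class RoundTrip {

    // A plain data holder; implementing Serializable is all Java needs.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Serialize: turn the object graph into a flat byte array.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(baos)) {
            out.writeObject(o);
        }
        return baos.toByteArray();
    }

    // Deserialize: regenerate the original structure from the bytes.
    static Object deserialize(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(new Point(3, 4));
        Point p = (Point) deserialize(bytes);
        System.out.println(p.x + "," + p.y); // prints 3,4
    }
}
```

The same byte array could just as well be written to disk for persistence or pushed over a socket for transmission, which is exactly the appeal for distributed flow execution.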

Last Thursday I was talking to Abhishek Verma, and he pointed out Google’s Protocol Buffers project, Google’s take on data interchange formats. Not a new idea (CORBA’s IDL, for instance, has been around for a long time), but what caught my eye were their claims about (1) efficiency and (2) multiple language bindings. I was contemplating using XStream for Meandre’s distributed flow execution needs, but the heavyweight XML made me quite reluctant to walk down that path. The native Java serialization is not a bad choice in terms of efficiency, but it provides neither friendly mechanics for modifying data formats without rendering already-serialized objects useless, nor a transparent mechanism for bindings to other languages and platforms. So Google’s Protocol Buffers seemed an option worth trying. There I went, and I prepared a simple comparison among the three: (1) Java serialization, (2) Google’s Protocol Buffers, and (3) XStream. Yes, you may guess the outcome, but I was more interested in getting my hands dirty, seeing how Google’s Protocol Buffers perform, and finding out how much developer overhead they require.

The experiment

Before getting into the description: this experiment does not try to be an exhaustive performance evaluation, just an afternoon diversion. That said, the experiment measured the serialization/deserialization time and the space used for a simple data structure containing just one array of integers and one array of strings. All the integers were initialized to zero, and the strings to “Dummy text”. To measure how the serialization time of this simple object scales, the number of integers and strings was increased incrementally. The code below shows the target object used across all the measurements.

package org.meandre.tools.serialization.xstream;
 
public class TargetObject {
 
       public String [] sa;
       public int [] ia;
 
       public TargetObject ( int iStringElements, int iIntegerElements ) {
             sa = new String[iStringElements];
             for ( int i=0 ; i<iStringElements ; i++ )
                  sa[i] = "Dummy text";
              // Java int arrays default to zero, so no explicit initialization needed
              ia = new int[iIntegerElements];
       }
}

The experiment consisted of generating objects like the one above, containing from 100 to 10,000 elements in increments of 100. Each object was serialized 50 times, measuring the average serialization time and the space required (in bytes) per generated object. Below is the sample code I used to measure native Java serialization/deserialization times.

package org.meandre.tools.serialization.java;
 
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
 
import org.junit.Test;
 
public class JavaSerializationTest {
 
       @Test
       public void testJavaSerialization ()
       throws IOException {
             final int MAX_SIZE = 10000;
             final int REP = 50;
             final int INC = 100;
 
             System.out.println("Java serialization times");
             for ( int i=INC ; i<=MAX_SIZE ; i+=INC ) {
                  TargetObjectSerializable tos = new TargetObjectSerializable(i,i);
                  long lAccTime = 0;
                  long lSize = 0;
                  long lTmp;
                  ByteArrayOutputStream baos;
                  ObjectOutputStream out;
                  for ( int j=0 ; j<REP ; j++ ) {
                      baos = new ByteArrayOutputStream();
                      out = new ObjectOutputStream(baos);
                       // lTmp ends up holding minus the elapsed time;
                       // subtracting it from lAccTime accumulates a positive total
                       lTmp = System.currentTimeMillis();
                       out.writeObject(tos);
                       lTmp -= System.currentTimeMillis();
                       out.close();
                       lAccTime -= lTmp;
                      lSize = baos.size();
                  }
                  System.out.println(""+i+"\t"+(((double)lAccTime)/REP)+"\t"+lSize);
             }
       }
 
 
       @Test
       public void testJavaDeserialization ()
       throws IOException, ClassNotFoundException {
             final int MAX_SIZE = 10000;
             final int REP = 50;
             final int INC = 100;
 
             System.out.println("Java deserialization times");
             for ( int i=INC ; i<=MAX_SIZE ; i+=INC ) {
                  TargetObjectSerializable tos = new TargetObjectSerializable(i,i);
                  ByteArrayOutputStream baos = new ByteArrayOutputStream();
                  ObjectOutputStream out = new ObjectOutputStream(baos);
                  out.writeObject(tos);
                  out.close();
                  ByteArrayInputStream bais;
                  ObjectInputStream ois;
                  long lAccTime = 0;
                  long lTmp;
                  for ( int j=0 ; j<REP ; j++ ) {
                      bais = new ByteArrayInputStream(baos.toByteArray());
                      ois = new ObjectInputStream(bais);
                      lTmp = System.currentTimeMillis();
                      ois.readObject();
                      lTmp -= System.currentTimeMillis();
                      lAccTime -= lTmp;
                  }
                  System.out.println(""+i+"\t"+(((double)lAccTime)/REP));
             }
       }
}

Equivalent versions of the code shown above were used to measure Google’s Protocol Buffers and XStream. If you are interested in seeing the full code, you can download it as is, no guarantees provided. Also, for completeness of the experiment code, you can find below the proto file used for testing the Java implementation of Google’s Protocol Buffers.

package test;
 
option java_package = "org.meandre.tools.serialization.proto";
option java_outer_classname = "TargetObjectProtoOuter";
 
message TargetObjectProto {
  repeated int32 ia = 1;
  repeated string sa = 2;
}
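For reference, this is roughly how the generated class gets used in the benchmark. The sketch below follows the standard protobuf-java generated API for repeated fields (`addIa`, `addSa`, `toByteArray`, `parseFrom`) and assumes the proto file above has been compiled with `protoc`; it will not run without the protobuf runtime and the generated `TargetObjectProtoOuter` class on the classpath.

```java
// Sketch only: depends on protoc-generated code, not runnable standalone.
import org.meandre.tools.serialization.proto.TargetObjectProtoOuter.TargetObjectProto;

public class ProtoSketch {
    public static void main(String[] args) throws Exception {
        // Build a message equivalent to TargetObject(100, 100)
        TargetObjectProto.Builder builder = TargetObjectProto.newBuilder();
        for (int i = 0; i < 100; i++) {
            builder.addIa(0);            // repeated int32 ia
            builder.addSa("Dummy text"); // repeated string sa
        }
        byte[] bytes = builder.build().toByteArray();              // serialize
        TargetObjectProto copy = TargetObjectProto.parseFrom(bytes); // deserialize
        System.out.println(bytes.length + " bytes, "
                + copy.getSaCount() + " strings");
    }
}
```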

In order to run the experiment, besides the Google Protocol Buffers and XStream libraries, you will also need JUnit.

The results

The experiments were run on a first-generation MacBook Pro with 2 GB of RAM, using Apple’s Java 1.5 virtual machine. The figures below illustrate the different memory requirements for each of the three serialization methods compared. Figures and data processing were done using R.

[Figures: data size of the serialized object; serialized/original data size ratio]

The figures show the anticipated bloat of the XML-based XStream serialization, up to six times larger than the original data being serialized. On the other hand, the native Java serialization adds only minimal overhead to the original data. Google’s Protocol Buffers present slightly larger requirements than the native Java serialization, but never double the original size. Moreover, they do not exhibit the constant initial payload overhead displayed by both XStream and the native Java serialization. The next question was how costly the serialization process was. The figures below show the amount of time required to serialize an object.

[Figures: serialization time; serialization time ratio]

The native Java serialization was, as expected, the fastest; however, Google’s Protocol Buffers took, on average, only four times longer than the native Java version. That is peanuts compared to the fifty-times-slower XStream version. Deserialization times of the encoded objects present the same trends as serialization, as the figures below show.

[Figures: deserialization time; deserialization time ratio]

It is also interesting to note, as the figure below shows, that serialization is faster than deserialization (as common sense would suggest). Google’s Protocol Buffers is the method where this difference is most pronounced.

[Figure: serialization/deserialization ratio]

The lessons learned

As I said, this is far from being an exhaustive or even representative benchmark, just one afternoon’s exploration. However, the results show interesting trends. Yes, XStream could be tweaked to make the serialized XML leaner, and with the proper tinkering it would even make it possible to deserialize the object on a different platform or language, but at an enormous cost in both size and time. The native Java serialization is by far the fastest and the most size efficient, but it is built by and for Java. Also, changes to the serialized classes (imagine wanting to add or remove a field) may render the already serialized objects unreadable. Google Protocol Buffers, on the other hand, deliver the best of both scenarios: (1) the ability to serialize/deserialize objects in a compact and relatively fast manner, and (2) serialization/deserialization across different languages and platforms. For these reasons, they seem a very interesting option to keep exploring if you need both.

Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre


by Llorà, X.

IlliGAL technical report 2009001. You can download the pdf here. More information is also available at the Meandre website as part of the SEASR project.

Abstract: Data-intensive computing has positioned itself as a valuable programming paradigm to efficiently approach problems requiring processing very large volumes of data. This paper presents a pilot study about how to apply the data-intensive computing paradigm to evolutionary computation algorithms. Two representative cases—selectorecombinative genetic algorithms and estimation of distribution algorithms—are presented, analyzed, and discussed. This study shows that equivalent data-intensive computing evolutionary computation algorithms can be easily developed, providing robust and scalable algorithms for the multicore-computing era. Experimental results show how such algorithms scale with the number of available cores without further modification.

Need a quick form on your WordPress site?



cForms is a WordPress plugin that allows you to quickly add forms to your site. The plugin lets you collect information from visitors easily. Besides mail notifications, it also provides database-backed tracking. Pretty convenient if you want to boost the usefulness of your site. Two examples I have more or less been involved in are the SEASR 2009 workshop registration form and the iFoundry application form. In both cases, cForms worked like a champ.

FireStats: Statistics on fire for WordPress sites



I have been using FireStats to gather statistics on WordPress sites for more than a couple of years now. I mainly use FireStats combined with WordPress Stats and Google Analytics. Each of them gives you a different view into traffic, but FireStats is by far the quickest and gives you a good overall picture that you can then drill into using WordPress Stats and Google Analytics. I just installed the new version 1.6.0 on this blog and found an interesting new goodie: FireStats now also tracks the number of RSS readers coming to your site :D

Protect yourself from genetic algorithms’ surprises



I just ran into the comic strip below at xkcd. I am still laughing now. Fitness functions are tricky. Somebody once told me that genetic algorithms always get their target; the main problem is explaining what the target is. If you want to evolve a “fountain pen”, you had better be accurate defining it, or you may end up getting an unexpected all-terrain “pencil”. Yes, I know the example is quite loose, but it still has some truth to it. How many times have you ended up getting something you did not expect, only because evolution found a better-fitted crack in your description? Anyway, it is fun what you end up running into on the Internet ;)
