NIGEL 2006 revisited (Part I): Wilson and Goldberg

I finally finished transcoding the videos from NIGEL 2006 and started uploading them to Vimeo. Every week I will upload two of them, following the NIGEL 2006 agenda, and embed the corresponding slides when they are available on SlideShare. Enjoy this first release, Wilson vs. Goldberg. I have also included the meeting introduction, just for nostalgia purposes.

Introduction

Video
[vimeo clip_id=4479633 width="432" height="320"]

Slides
[slideshare id=1384574&doc=nigel-2006-llora-welcome-090504152827-phpapp02]

Stewart Wilson

Video
[vimeo clip_id=4478921 width="432" height="320"]

Slides
Unfortunately, Stewart’s slides are not available.

David E. Goldberg

Video
[vimeo clip_id=4477260 width="432" height="320"]

Slides
[slideshare id=1384594&doc=nigel-2006-goldberg-090504153108-phpapp02]

Transcoding NIGEL 2006 videos

Last week Pier Luca Lanzi was visiting IlliGAL. Yesterday, before he left for Chicago, we went for one last brunch. He mentioned that he really liked the videos we shot during NIGEL 2006, and we agreed it would be useful to recover them and upload them to some of the usual video-sharing suspects. Currently they are hosted, for long-term storage purposes, at NCSA's web archive. I spent some time retrieving them from the archive (they are pretty fat and encoded in WMV) and started transcoding them to MPEG-4. My plan? Make them available via Vimeo and LCS & GBML Central. I will also upload the presentation slides to SlideShare and make them available via LCS & GBML Central as well.

Update: The first two videos (Wilson and Goldberg) are already available at LCS & GBML Central.

Related posts:

  1. NIGEL 2006 Part VI: Bacardit
  2. NIGEL 2006 Part V: Bernardó vs. Lanzi
  3. NIGEL 2006 Part IV: Llorà vs. Casillas

IWLCS 2009 warming up

The Twelfth International Workshop on Learning Classifier Systems (IWLCS 2009) will be held in Montreal, Canada, on Thursday, July 9, 2009, during the Genetic and Evolutionary Computation Conference (GECCO-2009), July 8-12, 2009.

Originally, Learning Classifier Systems (LCSs) were introduced by John H. Holland as a way of applying evolutionary computation to machine learning and adaptive behavior problems. Since then, the LCS paradigm has broadened greatly into a framework that encompasses many representations, rule discovery mechanisms, and credit assignment schemes. Current LCS applications range from data mining to automated innovation and the online control of cognitive systems. LCS research covers a variety of system approaches: while Wilson's accuracy-based XCS system (1995) has received the most attention and gained the highest reputation, studies and developments of other LCSs are also discussed and contrasted. Advances in machine learning, and reinforcement learning in particular, as well as in evolutionary computation, have brought LCS systems the necessary competence and guaranteed learning properties, and novel insights from both fields continue to be integrated into the LCS framework.

Thus, we invite submissions that discuss recent developments in all areas of research on, and applications of, Learning Classifier Systems. IWLCS is the event that brings together most of the core researchers in classifier systems. Moreover, a free introductory tutorial on LCSs will be presented the day before the workshop at GECCO 2009. The tutorial and the IWLCS workshop thus also provide an opportunity for researchers interested in LCSs to get an impression of the current research directions in the field, as well as a guideline for the application of LCSs to their problem domains.

More information can be found in the original call for papers.

Deadline extended for special issue on Metaheuristics for Large Scale Data Mining

The deadline for the special issue on Metaheuristics for Large Scale Data Mining, to be published in Springer's Memetic Computing journal, has been extended until May 31, 2009. More information can be found in this post at LCS & GBML Central.

Related posts:

  1. [BDCSG2008] Algorithmic Perspectives on Large-Scale Social Network Data (Jon Kleinberg)
  2. Special issue on chance discovery (I)
  3. GECCO 2009 paper submission deadline extended till January 28


Memetic Computing Journal special issue on Metaheuristics for Large Scale Data Mining – Extended Deadline

Aim and Scope

Data mining and knowledge discovery are crucial techniques across many scientific disciplines. Recent developments such as the Genome Project (and its successors) or the construction of the Large Hadron Collider have provided the scientific community with vast amounts of data. Metaheuristics and other evolutionary algorithms have been successfully applied to a large variety of data mining tasks. Competitive metaheuristic approaches are able to deal with rule, tree, and prototype induction, neural network synthesis, fuzzy logic learning, and kernel machines, to mention but a few. Moreover, the inherent parallel nature of some metaheuristics (e.g., evolutionary approaches, particle swarms, ant colonies) makes them perfect candidates for approaching very large-scale data mining problems.

Although a number of recent techniques have applied these methods to complex data mining domains, we are still far from having a deep and principled understanding of how to scale them to datasets of terascale, petascale, or even larger size. In order to achieve and maintain a relevant role in large-scale data mining, metaheuristics need, among other features, the capacity to process vast amounts of data in a reasonable time frame, to make efficient use of the unprecedented computing power available nowadays thanks to advances in high-performance computing, and to produce, when possible, human-understandable outputs.

Several research topics impinge on the applicability of metaheuristics for data mining techniques: (1) proper scalable learning paradigms and knowledge representations, (2) better understanding of the relationship between the learning paradigms/representations and the nature of the problems to be solved, (3) efficiency enhancement techniques, and (4) visualization tools that expose as much insight as possible to the domain experts based on the learned knowledge.

We would like to invite researchers to submit contributions on the area of large-scale data mining using metaheuristics. Potentially viable research themes are:

  • Learning paradigms based on metaheuristics, evolutionary algorithms, learning classifier systems, particle swarm, ant colonies, tabu search, simulated annealing, etc
  • Hybridization with other kinds of machine learning techniques including exact and approximation algorithms
  • Knowledge representations for large-scale data mining
  • Advanced techniques for enhanced prediction (classification, regression/function approximation, clustering, etc.) when dealing with large data sets
  • Efficiency enhancement techniques
  • Parallelization techniques
  • Hardware acceleration techniques (vector instructions, GPUs, etc.)
  • Theoretical models of the scalability limits of the learning paradigms/representations
  • Principled methodologies for experiment design (choosing methods, adjusting parameters, etc.)
  • Explanatory power and visualization of generated solutions
  • Data complexity analysis and measures
  • Ensemble methods
  • Online data mining and data streams
  • Examples of real-world successful applications

Instructions for authors

Papers should have approximately 20 pages (but certainly not more than 24 pages). The papers must follow the format of the Memetic Computing journal:
http://www.springer.com/engineering/journal/12293?detailsPage=contentItemPage&CIPageCounter=151543

Papers should be submitted following the Memetic Computing journal guidelines. When submitting the paper please select this special issue as the article type.

Important dates

  • Manuscript submission: May 31st, 2009
  • Notification of acceptance: July 31st, 2009
  • Submission of camera-ready version: Sep 30th, 2009

Guest editors:

Jaume Bacardit
School of Computer Science and School of Biosciences
University of Nottingham
jaume.bacardit@nottingham.ac.uk

Xavier Llorà
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
xllora@illinois.edu

Meandre overview slides

On May 26th I gave a seminar about Meandre's basics at the Computer Science department of the University of Illinois. The talk was part of the Cloud Computing Seminars. I merged together slides I have been using to talk about Meandre and tried to give the talk an easy-to-grasp overview flavor. You can view the slides below, or you can also download them here.

Related posts:

  1. An Overview of the DISCUS project
  2. Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
  3. Meandre 1.4.0 released, 1.4.1 coming short after

Squeezing for cycles

Sometimes a bit of thinking helps avoid rushed decisions that may lead to weird places. Today I was going over a simple genetic algorithm for numeric optimization written in C. The code is nothing special: tournament selection without replacement, the SBX crossover operator, and polynomial mutation. To the point: I was running a simple OneMax-like problem (in this case, minimizing the sum of all the genes), and I was quite surprised it was taking so long.
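For reference, the objective being minimized is just the sum of the genes. A minimal sketch, following the naming conventions of the snippets later in this post (the function name is hypothetical, and Fitness is the fitness typedef used below):

/** Minimal sketch of the OneMax-like objective: minimize the sum of
 *  all real-valued genes of an individual.
 */
Fitness fitness_sum_genes ( double * pdaGenes, int iGenes )
{
	int i;
	Fitness fitSum = 0.0;

	for ( i=0 ; i<iGenes ; i++ )
		fitSum += pdaGenes[i];
	return fitSum;
}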

$ time ./GATest
----------------------- GA parameters ------------------------------------
Seed: 69
Lower/upper bounds: [1.000000,0.000000]
Genes: 18
Population size: 2000
Iterations: 30
Tournament size: 6
Crossover: pc=0.900000, gwp=0.500000, etaC=10.000000
Mutation: pm=0.100000, etaM=20.000000
----------------------- Evolution Statistics -----------------------------
4.663210	8.974190	13.158102
3.351912	7.405489	11.619360
2.285005	5.375426	9.531691
1.326318	3.711156	7.178203
0.767981	2.432192	4.790854
0.392001	1.543097	3.223604
0.279961	0.977706	2.308249
0.173406	0.600002	1.335702
0.092746	0.359343	0.877080
0.044705	0.216218	0.533978
0.029053	0.130256	0.315306
0.022827	0.078331	0.172908
0.013641	0.047317	0.105886
0.007370	0.028098	0.066994
0.004320	0.016787	0.038499
0.002807	0.010254	0.025155
0.001604	0.006238	0.014528
0.001007	0.003799	0.008883
0.000708	0.002212	0.005627
0.000343	0.001305	0.003263
0.000211	0.000781	0.002025
0.000131	0.000468	0.001155
0.000085	0.000280	0.000774
0.000054	0.000168	0.000392
0.000031	0.000100	0.000243
0.000017	0.000061	0.000144
0.000010	0.000037	0.000083
0.000006	0.000022	0.000054
0.000003	0.000013	0.000035
0.000002	0.000008	0.000020
0.000002	0.000005	0.000011
----------------------- Final outcome ------------------------------------
Min:	(0.000002)	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
Max:	(0.000011)	0.000000	0.000001	0.000000	0.000000	0.000001	0.000000	0.000000	0.000001	0.000000	0.000000	0.000000	0.000003	0.000000	0.000000	0.000000	0.000000	0.000002	0.000001

real	0m6.748s
user	0m6.228s
sys	0m0.088s

Yup, after turning on all the compiler optimizations I could think of, 6.7 seconds was the best I could do on a first-generation MacBook Pro. I was wondering if I should spend the time writing a simple multithreaded evaluation. But, before making something simple complicated, I decided to grab Shark (Mac's free profiler) and get a better picture of what was going on. Oh boy! Intuition was looking at the wrong place! The outcome: 45% of the time was spent on tournament selection and 31% on generating random numbers. Digging a bit further, almost all of the tournament selection time went into shuffling an array to guarantee tournaments without replacement.

/** Runs tournament selection without replacement to create a new population
 *
 * popDest: The destination population
 * popInit: The original population
 * fita: The fitness of the initial population
 * iSize: Tournament size
 * iIndividuals: The population size
 * iGenes: The number of genes of an individual
 */
void ga_tournament_selection ( Population popDest, Population popInit, Fitness * fita, int iSize, int iIndividuals , int iGenes )
{
	int piaShuffle[iIndividuals]; /* room for the whole population, since genrand_shuffle shuffles all of it */
 
	int i = -1;
	int j = -1;
 
	int iIndTmp = -1;
	int iIndWin = -1;
 
	Fitness fitWin = DBL_MAX;
	Fitness fitTmp = DBL_MAX;
 
	for ( i=0 ; i<iIndividuals ; i++ ) {
		/* Initialization for the current tournament */
		fitWin  = DBL_MAX;
		iIndWin = -1;
		genrand_shuffle (piaShuffle,iIndividuals);
		for ( j=0 ; j<iSize ; j++ ) {
			// A new randomly drawn individual
			iIndTmp = piaShuffle[j];
			fitTmp  = fita[piaShuffle[j]];
			// The first individual drawn becomes the current winner
			if ( iIndWin==-1 ) {
				iIndWin  = iIndTmp;
				fitWin = fitTmp;
			}
			// If not, check its fitness (minimization)
			else if ( fitWin>fitTmp ) {
				iIndWin  = iIndTmp;
				fitWin = fitTmp;
			}
		}
		population_copy_individual(popDest[i],popInit[iIndWin],iGenes);
	}
}

It was genrand_shuffle (see below) that took most of the time. On closer inspection, you will also see that it is the one to blame for calling genrand_int31 far too many times.

void genrand_shuffle ( int * pia, int iSize )
{
	int i, iOther;
	register int iTmp;
	int iRndMax = iSize-1;
 
	// Initialize
	for( i=0; i<iSize; i++ ) pia[i] = i;
	// shuffle
	for( i=0; i<iRndMax; i++ ) {
	    iOther = genrand_int31()%iSize;
	    iTmp = pia[iOther];
	    pia[iOther] = pia[i];
	    pia[i] = iTmp;
	}
}

This inherited implementation of tournament selection works well for small populations, but as you increase the population size, each tournament requires shuffling a number of elements proportional to the population size. Run the numbers and that leads to a quadratic implementation of tournament selection without replacement: for the run above, 2,000 tournaments each shuffling a 2,000-element array means roughly four million swaps (and random numbers) per generation. Really needed? Definitely not. The only thing tournament selection without replacement needs to guarantee is that the individuals entering a tournament are different (avoiding repetition). If that selection can be done quickly, you can take the complexity of the implementation down to linear. So there I went and modified the shuffling function as follows.

void genrand_shuffle_fast ( int * pia, int iSize, int iSlots )
{
	int i = 0;
	int j = 0;
 
	pia[i++] = genrand_int31()%iSize;
	while ( i<iSlots ) {
		pia[i] = genrand_int31()%iSize;
		/* Linear scan for a duplicate among the indexes already drawn */
		for ( j=0 ; j<i ; j++ )
			if ( pia[j]==pia[i] )
				break;
 
		if ( j==i ) i++ ;
	}
}

Tournament sizes are usually much, much smaller than population sizes (e.g., s=6 for the 2,000-individual population used above). Thus, when you draw a random index, the chances of it being a repeat are quite small. If you also make sure it is not already in the tournament (and generate a new one when it is), you are basically set. (This implementation is only efficient when s << pop_size; otherwise the cost of checking and regenerating numbers will be even worse than the original version.)
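To put a number on that intuition, here is a quick back-of-the-envelope sketch (a hypothetical helper, not part of the GA code): drawing the (i+1)-th distinct index out of iSize succeeds with probability (iSize-i)/iSize, so it takes iSize/(iSize-i) draws on average.

/** Back-of-the-envelope check (hypothetical helper, not in the GA):
 *  expected number of rejected draws when picking iSlots distinct
 *  indexes out of iSize. Each fresh index is a geometric trial, so
 *  everything beyond the first draw counts as a rejection.
 */
double expected_rejections ( int iSize, int iSlots )
{
	int i;
	double dSum = 0.0;

	for ( i=0 ; i<iSlots ; i++ )
		dSum += (double)iSize/(iSize-i) - 1.0;
	return dSum;
}

For iSize=2000 and iSlots=6 this comes out to roughly 0.0075 wasted draws per tournament, which is why the rejection loop is essentially free here.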

So there I went. I modified the original inherited version of tournament selection without replacement and reran the simple time measurements.
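For completeness, here is a sketch of the resulting selection routine (the _fast suffix is a hypothetical name; everything else follows the snippets above):

/** Sketch of tournament selection using the fast shuffle: each
 *  tournament draws only iSize distinct competitors instead of
 *  shuffling the whole population.
 */
void ga_tournament_selection_fast ( Population popDest, Population popInit, Fitness * fita, int iSize, int iIndividuals, int iGenes )
{
	int piaShuffle[iSize];
	int i, j;
	int iIndWin;
	Fitness fitWin;

	for ( i=0 ; i<iIndividuals ; i++ ) {
		/* Draw iSize distinct competitors */
		genrand_shuffle_fast (piaShuffle,iIndividuals,iSize);
		/* Keep the fittest (minimization) among them */
		iIndWin = piaShuffle[0];
		fitWin  = fita[iIndWin];
		for ( j=1 ; j<iSize ; j++ )
			if ( fita[piaShuffle[j]]<fitWin ) {
				iIndWin = piaShuffle[j];
				fitWin  = fita[iIndWin];
			}
		population_copy_individual(popDest[i],popInit[iIndWin],iGenes);
	}
}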

$ time ./GATest
----------------------- GA parameters ------------------------------------
Seed: 69
Lower/upper bounds: [1.000000,0.000000]
Genes: 18
Population size: 2000
Iterations: 30
Tournament size: 6
Crossover: pc=0.900000, gwp=0.500000, etaC=10.000000
Mutation: pm=0.100000, etaM=20.000000
----------------------- Evolution Statistics -----------------------------
4.663210	8.974190	13.158102
3.350933	7.401935	11.503243
1.964580	5.461794	9.246779
1.297656	3.819533	7.364562
0.810695	2.512797	5.142622
0.478789	1.603199	3.652348
0.305106	0.999304	2.138109
0.191904	0.602315	1.336870
0.108593	0.361237	0.869652
0.060862	0.219145	0.502403
0.040076	0.136125	0.307478
0.028629	0.084893	0.191327
0.016301	0.052274	0.115169
0.009433	0.032699	0.071849
0.003934	0.020275	0.047970
0.002762	0.012328	0.031204
0.001405	0.007259	0.019575
0.001043	0.004280	0.010909
0.000790	0.002550	0.005799
0.000404	0.001530	0.003566
0.000287	0.000950	0.002406
0.000198	0.000600	0.001390
0.000127	0.000386	0.000818
0.000068	0.000245	0.000599
0.000045	0.000153	0.000377
0.000026	0.000093	0.000206
0.000020	0.000058	0.000125
0.000011	0.000035	0.000095
0.000007	0.000022	0.000049
0.000004	0.000014	0.000029
0.000002	0.000009	0.000018
----------------------- Final outcome ------------------------------------
Min:	(0.000002)	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
Max:	(0.000018)	0.000001	0.000000	0.000000	0.000000	0.000000	0.000000	0.000001	0.000001	0.000000	0.000000	0.000004	0.000000	0.000001	0.000001	0.000002	0.000000	0.000001	0.000001

real	0m0.258s
user	0m0.246s
sys	0m0.006s

Yup. That simple change yielded a speedup of 26.15 (from 6.748s down to 0.258s). As I said, a little bit of thinking helped avoid going down a crazy path in a rush of coding fever for the wrong reasons. Threading would still be very useful when evaluation is expensive (as it is in most real-world optimization), but for this silly OneMax there is no need to make things more complicated than they need to be ;)

Related posts:

  1. Evaluation consistency in iGAs: User contradictions as cycles in partial-ordering graphs
  2. A simple and flexible GA loop in Python
  3. Fast mutation implementation for genetic algorithms in Python

LCSweb + GBML blog = LCS & GBML Central


LCSweb was designed to give researchers, and those seeking to use Learning Classifier Systems in applications, access to material on LCSs and discussion between members of the LCS community. The site had served this community since it was started by Alwyn Barry in 1997. Enhanced and maintained later by Jan Drugowitsch, LCSweb became a valuable community resource. The site was completely community-driven, allowing members to contribute to its content and keep it up to date. Later, in 2005, I started the "LCS and other GBML" blog to cover a gap, providing information regarding the International Workshop on Learning Classifier Systems (IWLCS), the collection of available LCS books, and GBML-related news.

Some of you may have noticed that, after Jan's move to Rochester and Alwyn's retirement from research activities, LCSweb has vanished. Will Browne took it upon himself to move LCSweb to Reading, but technical circumstances have made that move rocky despite his best efforts. Jan and Will, however, still have a local copy of the LCSweb contents. After talking to Jan and Will, I proposed to merge LCSweb with the LCS and other GBML blog and host the new site at NCSA, where dedicated resources have been made available. Jan and Will agreed with the idea.

We are happy to announce that the merged site (still under its update cycle) can be reached at http://lcs-gbml.ncsa.uiuc.edu. More information about the process can be found here or at the LCS & GBML Central site.

Related posts:

  1. New blog for LCS and other GBML
  2. LCS and other GBML warming up for GECCO 2006
  3. LCSWeb creates a LCS and GBML paper database

IWLCS 2009 call for papers

Please note that a few extra days for submission have been granted, as requested (see important dates).

The Twelfth International Workshop on Learning Classifier Systems (IWLCS 2009) will be held in Montreal, Canada, on Thursday, July 9, 2009, during the Genetic and Evolutionary Computation Conference (GECCO-2009), July 8-12, 2009.

Originally, Learning Classifier Systems (LCSs) were introduced by John H. Holland as a way of applying evolutionary computation to machine learning and adaptive behavior problems. Since then, the LCS paradigm has broadened greatly into a framework that encompasses many representations, rule discovery mechanisms, and credit assignment schemes. Current LCS applications range from data mining to automated innovation and the online control of cognitive systems. LCS research covers a variety of system approaches: while Wilson's accuracy-based XCS system (1995) has received the most attention and gained the highest reputation, studies and developments of other LCSs are also discussed and contrasted. Advances in machine learning, and reinforcement learning in particular, as well as in evolutionary computation, have brought LCS systems the necessary competence and guaranteed learning properties, and novel insights from both fields continue to be integrated into the LCS framework.

Thus, we invite submissions that discuss recent developments in all areas of research on, and applications of, Learning Classifier Systems. IWLCS is the event that brings together most of the core researchers in classifier systems. Moreover, a free introductory tutorial on LCSs will be presented the day before the workshop at GECCO 2009. The tutorial and the IWLCS workshop thus also provide an opportunity for researchers interested in LCSs to get an impression of the current research directions in the field, as well as a guideline for the application of LCSs to their problem domains.

Submissions and Publication

We welcome manuscripts of up to 8 pages in ACM format. Please see the GECCO 2009 information for authors for further format details. However, unlike GECCO papers, submissions do not have to be anonymized. All accepted papers will be presented at IWLCS 2009 and will appear in the GECCO workshop volume. Proceedings of the workshop will be published on CD-ROM and distributed at the conference. After the workshop, authors will be invited to submit revised (full) papers for publication in the next post-workshop proceedings volume (scheduled for 2010) in the Springer LNCS/LNAI book series.

All papers should be submitted in PDF format and e-mailed to: jqb@cs.nott.ac.uk

Important dates (Updated 26.03.09)

Paper submission deadline: Wednesday, April 1, 2009
Notification to authors: Friday, April 8, 2009
Submission of camera-ready material: by Friday, April 17, 2009
Conference registration: by Monday, April 27, 2009
Workshop date: Thursday, July 9, 2009

Committees

Organizing Committee

Jaume Bacardit, University of Nottingham (UK). E-mail: jaume.bacardit@nottingham.ac.uk
Will Browne, University of Reading (UK). E-mail: w.n.browne@reading.ac.uk
Jan Drugowitsch, University of Rochester (USA). E-mail: jdrugowitsch@bcs.rochester.edu

Advisory Committee

Ester Bernadó-Mansilla, Universitat Ramon Llull (Spain)
Martin V. Butz, Universität Würzburg (Germany)
Tim Kovacs, University of Bristol (UK)
Xavier Llorà, University of Illinois at Urbana-Champaign (USA)
Pier Luca Lanzi, Politecnico di Milano (Italy)
Wolfgang Stolzmann, Daimler Chrysler AG (Germany)
Keiki Takadama, Tokyo Institute of Technology (Japan)
Stewart Wilson, Prediction Dynamics (USA)

Presentation Zen


Lately I have been thinking about how to effectively communicate ideas in presentations. This has turned out to be a key element when trying to convey the benefits of jumping on the Meandre wagon. Presentation Zen is a very interesting resource for methods, techniques, and examples of how to convey and communicate ideas. I am working on revamping some of my Meandre presentations, trying to get the points across more easily.