LCS & GBML Central under inspection

It is very common that NCSA machine get continual attacks over the net. Today I got a note from the security team that LCS & GBML Central may have been compromised. We are looking into it. As a precaution the automatic measures has been fired, that means, that the box is currently unroutable. Hope it […]

Related posts:

  1. LCSweb + GBML blog = LCS & GBML Central
  2. LCS & GBML Central back to production
  3. Large Scale Data Mining using Genetics-Based Machine Learning

It is very common that NCSA machine get continual attacks over the net. Today I got a note from the security team that LCS & GBML Central may have been compromised. We are looking into it. As a precaution the automatic measures has been fired, that means, that the box is currently unroutable. Hope it can be fixed soon. I’ll keep you posted about the progress.

Related posts:

  1. LCSweb + GBML blog = LCS & GBML Central
  2. LCS & GBML Central back to production
  3. Large Scale Data Mining using Genetics-Based Machine Learning

Squeezing for cycles

Sometimes thinking a bit helps to rush decisions that may lead to weird places. Today I was going over a simple genetic algorithm for numeric optimization written in C. The code is nothing special, tournament selection without replacement, SBX crossover operator, and polynomial mutation. To the point, I was running a simple OneMax-like problem (in […]

Related posts:

  1. Evaluation consistency in iGAs: User contradictions as cycles in partial-ordering graphs
  2. A simple and flexible GA loop in Python
  3. Fast mutation implementation for genetic algorithms in Python

Sometimes thinking a bit helps to rush decisions that may lead to weird places. Today I was going over a simple genetic algorithm for numeric optimization written in C. The code is nothing special, tournament selection without replacement, SBX crossover operator, and polynomial mutation. To the point, I was running a simple OneMax-like problem (in this case, minimize the value of the sum of all the genes), and I was quite surprised the guy was taking so long for.

$time ./GATest
----------------------- GA parameters ------------------------------------
Seed: 69
Lower/upper bounds: [1.000000,0.000000]
Genes: 18
Population size: 2000
Iterations: 30
Tournament size: 6
Crossover: pc=0.900000, gwp=0.500000, etaC=10.000000
Mutation: pm=0.100000, etaM=20.000000
----------------------- Evolution Statistics -----------------------------
4.663210	8.974190	13.158102
3.351912	7.405489	11.619360
2.285005	5.375426	9.531691
1.326318	3.711156	7.178203
0.767981	2.432192	4.790854
0.392001	1.543097	3.223604
0.279961	0.977706	2.308249
0.173406	0.600002	1.335702
0.092746	0.359343	0.877080
0.044705	0.216218	0.533978
0.029053	0.130256	0.315306
0.022827	0.078331	0.172908
0.013641	0.047317	0.105886
0.007370	0.028098	0.066994
0.004320	0.016787	0.038499
0.002807	0.010254	0.025155
0.001604	0.006238	0.014528
0.001007	0.003799	0.008883
0.000708	0.002212	0.005627
0.000343	0.001305	0.003263
0.000211	0.000781	0.002025
0.000131	0.000468	0.001155
0.000085	0.000280	0.000774
0.000054	0.000168	0.000392
0.000031	0.000100	0.000243
0.000017	0.000061	0.000144
0.000010	0.000037	0.000083
0.000006	0.000022	0.000054
0.000003	0.000013	0.000035
0.000002	0.000008	0.000020
0.000002	0.000005	0.000011
----------------------- Final outcome ------------------------------------
Min:	(0.000002)	0.000000	0.000000	0.000000	0.000000.000000	0.000000	0.000000	0.000000	0.000000	0.000000.000000	0.000000	0.000000	0.000000	0.000000	0.000000.000000	0.000000
Max:	(0.000011)	0.000000	0.000001	0.000000	0.000000.000001	0.000000	0.000000	0.000001	0.000000	0.000000.000000	0.000003	0.000000	0.000000	0.000000	0.000000.000002	0.000001

real	0m6.748s
user	0m6.228s
sys	0m0.088s

Yup, after turning on all the possible compiler optimizations I could think of, 6.7 seconds was the best I could do on a first generation MacBook Pro. I was wondering if I should spend the time writing a simple multithreaded evaluation. As I said, before making something simple complicated, I decided to put grab Shark (Mac’s free profiler) and get a better picture of what was going. Oh boy! Intuition looking at the wrong place! The outcome: 45% of time spend on tournament selection and 31% generating random number. Mmh, digging a bit further almost all time of tournament selection was spent shuffling an array to guarantee tournaments without replacements.

/** Runs tournament selection without replacement to create a new population
 *
 * popDest: The destination population
 * popInit: The original population
 * fita: The fitness of the initial populaiton
 * iSize: Tournament size
 * iIndividuals: The population size
 * iGenes: The number of genes of an individual
 */
void ga_tournament_selection ( Population popDest, Population popInit, Fitness * fita, int iSize, int iIndividuals , int iGenes )
{
	int piaShuffle[iSize];
 
	int i = -1;
	int j = -1;
 
	int iIndTmp = -1;
	int iIndWin = -1;
 
	Fitness fitWin = DBL_MAX;
	Fitness fitTmp = DBL_MAX;
 
	for ( i=0 ; i<iIndividuals ; i++ ) {
		/* Initialization for the current tournament */
		fitWin  = DBL_MAX;
		iIndWin = -1;
		genrand_shuffle (piaShuffle,iIndividuals);
		for ( j=0 ; j<iSize ; j++ ) {
			// A new randomly drawn individual
			iIndTmp = piaShuffle[j];
			fitTmp  = fita[piaShuffle[j]];
			// If it is the first is the winner
			if ( iIndWin==-1 ) {
				iIndWin  = iIndTmp;
				fitWin = fitTmp;
			}
			// If not, chack the fitness (Minimize)
			else if ( fitWin>fitTmp ) {
				iIndWin  = iIndTmp;
				fitWin = fitTmp;
			}
		}
		population_copy_individual(popDest[i],popInit[iIndWin],iGenes);
	}
}

It was genrand_shuffle (see below) the one that took most of the time. Also if you take a close inspection you will see that it is also the one to blame for calling calling too many times genrand_int31.

void genrand_shuffle ( int * pia, int iSize )
{
	int i, iOther;
	register int iTmp;
	int iRndMax = iSize-1;
 
	// Initialize
	for( i=0; i<iSize; i++ ) pia[i] = i;
	// shuffle
	for( i=0; i<iRndMax; i++ ) {
	    iOther = genrand_int31()%iSize;
	    iTmp = pia[iOther];
	    pia[iOther] = pia[i];
	    pia[i] = iTmp;
	}
}

This inherited implementation of tournament selection works well for small populations, but as you increase the population size, each tournament requires shuffling a number proportional to the population size. If you make the numbers, that leads to a quadratic implementation of tournament selection without replacement. Mmh, really needed? Definitely not. The only thing you need to guarantee to provide a tournament selection without replacement is that you provide different individuals for the tournaments (avoiding repetition). If that selection can be done quickly, you can take the complexity of the implementation down to linear. So there I went, and modified the shuffling function as follows.

void genrand_shuffle_fast ( int * pia, int iSize, int iSlots )
{
	int i = 0;
	int j = 0;
 
	pia[i++] = genrand_int31()%iSize;
	while ( i<iSlots ) {
		pia[i] = genrand_int31()%iSize;
		for ( j=i-1 ; j>=0 && j<i ; j++ )
			if ( pia[j]==pia[i] )
				break;
 
		if ( j==i ) i++ ;
	}
}

Tournaments sizes are usually much much smaller than population sizes (e.g. s=6 for the pop_size=2,000 individuals population used above). Thus, if random numbers are generated, the chances of repeating it are quite small. Also if you also make sure it is not there (and if it is, you generate a new out), basically your are set. (This implementation will only work efficiently if s<<pop_size, otherwise the cost of checking and generated new numbers will be even worst than the original version).

So there I went. I modified the original inherited version of the tournament selection without replacement, and rerun the simple time measures.

$ time ./GATest
----------------------- GA parameters ------------------------------------
Seed: 69
Lower/upper bounds: [1.000000,0.000000]
Genes: 18
Population size: 2000
Iterations: 30
Tournament size: 6
Crossover: pc=0.900000, gwp=0.500000, etaC=10.000000
Mutation: pm=0.100000, etaM=20.000000
----------------------- Evolution Statistics -----------------------------
4.663210	8.974190	13.158102
3.350933	7.401935	11.503243
1.964580	5.461794	9.246779
1.297656	3.819533	7.364562
0.810695	2.512797	5.142622
0.478789	1.603199	3.652348
0.305106	0.999304	2.138109
0.191904	0.602315	1.336870
0.108593	0.361237	0.869652
0.060862	0.219145	0.502403
0.040076	0.136125	0.307478
0.028629	0.084893	0.191327
0.016301	0.052274	0.115169
0.009433	0.032699	0.071849
0.003934	0.020275	0.047970
0.002762	0.012328	0.031204
0.001405	0.007259	0.019575
0.001043	0.004280	0.010909
0.000790	0.002550	0.005799
0.000404	0.001530	0.003566
0.000287	0.000950	0.002406
0.000198	0.000600	0.001390
0.000127	0.000386	0.000818
0.000068	0.000245	0.000599
0.000045	0.000153	0.000377
0.000026	0.000093	0.000206
0.000020	0.000058	0.000125
0.000011	0.000035	0.000095
0.000007	0.000022	0.000049
0.000004	0.000014	0.000029
0.000002	0.000009	0.000018
----------------------- Final outcome ------------------------------------
Min:	(0.000002)	0.000000	0.000000	0.000000	0.000000.000000	0.000000	0.000000	0.000000	0.000000	0.000000.000000	0.000000	0.000000	0.000000	0.000000	0.000000.000000	0.000000
Max:	(0.000018)	0.000001	0.000000	0.000000	0.000000.000000	0.000000	0.000001	0.000001	0.000000	0.000000.000004	0.000000	0.000001	0.000001	0.000002	0.000000.000001	0.000001

real	0m0.258s
user	0m0.246s
sys	0m0.006s

Yup. That simple change yielded a speedup of 26.15. Mmh, as I said, a little bit of thinking helped to avoid going down some crazy path in a rushing code fever for the wrong reasons. Yup, the threading will be very useful if the cost of the evaluation is expensive (as most of the real world optimization are), but for this silly OneMax, no need to make it more complicated than it need ;)

Related posts:

  1. Evaluation consistency in iGAs: User contradictions as cycles in partial-ordering graphs
  2. A simple and flexible GA loop in Python
  3. Fast mutation implementation for genetic algorithms in Python

LCSweb + GBML blog = LCS & GBML Central


LCSweb was designed to allow researchers and those seeking to use Learning Classifier Systems within applications access to material on LCS and discussion between members of the LCS community. The site served this community since its was started by Alwyn Barry in 1997. Enhanced and maintained later by Jan Drugowitsch, LCSweb became a valuable community […]

Related posts:

  1. New blog for LCS and other GBML
  2. LCS and other GBML warming up for GECCO 2006
  3. LCSWeb creates a LCS and GBML paper database

LCSweb was designed to allow researchers and those seeking to use Learning Classifier Systems within applications access to material on LCS and discussion between members of the LCS community. The site served this community since its was started by Alwyn Barry in 1997. Enhanced and maintained later by Jan Drugowitsch, LCSweb became a valuable community resource. The site was completely community-driven and allowed members to contribute to the content of the site and keeping it up to date. Later on in 2005, I started “LCS and other GBML” Blog to cover a gap providing information information regarding the International Workshop on Learning Classifier Systems (IWLCS), the collection of LCS Books available, and GBML related news.

Some of you may have realized that after Jan’s move to Rochester and Alwyn’s retirement from research activities, LCSweb has vanished. Will Browne took on himself to take LCSweb to Reading, but technical circumstances have made that move rocky despite his best efforts. Jan and Will however still have a local copy of LCSweb contents. After talking to Jan and Will, I proposed to merge LCSweb with the LCS and other GBML blog, and host the new site at NCSA where dedicated resources has been made available. Jan and Will agreed with the idea.

We are happy to announce that the merged site (still under the update cycle) can be reached at http://lcs-gbml.ncsa.uiuc.edu. More information about the process can be found here or at there LCS & GBML Central site.

Related posts:

  1. New blog for LCS and other GBML
  2. LCS and other GBML warming up for GECCO 2006
  3. LCSWeb creates a LCS and GBML paper database

Revamping my Twitter accounts


Since summer 2007 I have been twittering. It started as way to have a conversation with a bunch of friends scattered all over. Since we were not discussing any world-changing topic, my updates have been kept private. Lately, I have been receiving requests to follow me. So, instead of polluting my original intent, @xllora is […]

Since summer 2007 I have been twittering. It started as way to have a conversation with a bunch of friends scattered all over. Since we were not discussing any world-changing topic, my updates have been kept private. Lately, I have been receiving requests to follow me. So, instead of polluting my original intent, @xllora is now my public Twitter account and I moved my previous tweets to @panellet (still not public). Hope this gets things a bit simplified. I still wish that Twitter would one day allow you to have a better access control, but, oh well, it is what it is right now. Also, I added my @xllora tweets below :D

Protect yourself from genetic algorithms’ surprises


I just ran into the comic strip below at xkcd. I am still laughing now. Fitness functions are tricky. Once somebody told me that genetic algorithms always get their target; the main problem is to explain what the target is. If you want to learn a “fountain pen”, you better be accurate defining it or […]

I just ran into the comic strip below at xkcd. I am still laughing now. Fitness functions are tricky. Once somebody told me that genetic algorithms always get their target; the main problem is to explain what the target is. If you want to learn a “fountain pen”, you better be accurate defining it or you may end up getting an unexpected all-terrain “pencil”. Yes, I know the example is quite solution free, but still has some truth to it. How many times have you end up getting something you did not expect, only because evolution find a better fitted crack in your description? Anyway, it is fun what you end running into on the Internet ;)

 

Protect yourself from genetic algorithms’ surprises


I just ran into the comic strip below at xkcd. I am still laughing now. Fitness functions are tricky. Once somebody told me that genetic algorithms always get their target; the main problem is to explain what the target is. If you want to learn a “fountain pen”, you better be accurate defining it or […]

I just ran into the comic strip below at xkcd. I am still laughing now. Fitness functions are tricky. Once somebody told me that genetic algorithms always get their target; the main problem is to explain what the target is. If you want to learn a “fountain pen”, you better be accurate defining it or you may end up getting an unexpected all-terrain “pencil”. Yes, I know the example is quite solution free, but still has some truth to it. How many times have you end up getting something you did not expect, only because evolution find a better fitted crack in your description? Anyway, it is fun what you end running into on the Internet ;)