Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre

by Llorà, X.
IlliGAL technical report 2009001. You can download the pdf here. More information is also available at the Meandre website as part of the SEASR project.

Abstract: Data-intensive computing has positioned itself as a valuable programming paradigm for efficiently approaching problems that require processing very large volumes of data. This paper presents a pilot study on applying the data-intensive computing paradigm to evolutionary computation algorithms. Two representative cases—selectorecombinative genetic algorithms and estimation of distribution algorithms—are presented, analyzed, and discussed. This study shows that equivalent data-intensive evolutionary computation algorithms can be easily developed, providing robust and scalable algorithms for the multicore-computing era. Experimental results show how such algorithms scale with the number of available cores without further modification.
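To make the idea concrete, here is a minimal sketch of the data-intensive reformulation the abstract describes: the fitness-evaluation stage of a selectorecombinative GA is expressed as a data-parallel map over the population, so it scales with the available cores. This is an illustrative stand-in, not the paper's Meandre dataflow implementation — the OneMax fitness, operator choices, and use of Python's `multiprocessing` are all assumptions for the sketch.

```python
# Sketch: a selectorecombinative GA (selection + crossover, no mutation)
# whose fitness evaluations are a data-parallel map over the population.
# OneMax and multiprocessing.Pool are illustrative stand-ins for the
# paper's Meandre dataflow components.
import random
from multiprocessing import Pool

def onemax(ind):
    # fitness = number of ones in the bit string
    return sum(ind)

def tournament(pop, fits, k=2):
    # pick k random individuals, return the fittest
    picks = random.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: fits[i])]

def uniform_crossover(a, b):
    # each bit comes from either parent with equal probability
    return [random.choice(pair) for pair in zip(a, b)]

def run(n=64, pop_size=200, gens=30, workers=4):
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    with Pool(workers) as pool:
        for _ in range(gens):
            fits = pool.map(onemax, pop)   # the data-parallel evaluation step
            pop = [uniform_crossover(tournament(pop, fits),
                                     tournament(pop, fits))
                   for _ in range(pop_size)]
        return max(pool.map(onemax, pop))

if __name__ == "__main__":
    print(run())
```

Only the evaluation map is parallelized here; the selectorecombinative loop itself stays sequential, which mirrors why these algorithms scale with core count without further modification.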

Protect yourself from genetic algorithms’ surprises


I just ran into the comic strip below at xkcd. I am still laughing. Fitness functions are tricky. Somebody once told me that genetic algorithms always hit their target; the main problem is explaining what the target is. If you want to evolve a “fountain pen”, you had better be accurate when defining it, or you may end up with an unexpected all-terrain “pencil”. Yes, I know the example is quite a toy, but it still has some truth to it. How many times have you ended up with something you did not expect, only because evolution found a better-fitting crack in your description? Anyway, it is funny what you end up running into on the Internet ;)

 


Evolutionary Computation in Conceptual Clustering and Tagging

Abstract: Web 2.0 technologies provide users with collaborative workspaces over the Internet. For example, Wikipedia is an open encyclopedia whose articles anyone can edit. YouTube provides spaces where users can share videos and annotations about them. Users can post images on Flickr and collaborate with each other by categorizing them with tags. These contents are created by users’ voluntary activities, which is one of the defining features of Web 2.0. Some Web 2.0 services have well-organized text-based content; Wikipedia, for instance, has well-structured articles thanks to the voluntary efforts of users who polish each one. Other services, such as YouTube and Flickr, have only short sentences or a small number of words as annotations. Moreover, these annotations usually differ from user to user because participants are not collaborating; as a result, the annotations convey little meaning about the videos and pictures. A system that converts annotations into meaningful text is useful, because writing text requires effort while large numbers of annotations are readily available. The purpose of this thesis is the development of a text builder based on Web 2.0 technology, genetic algorithms, and natural language processing. A human-interaction system is developed for automatically building meaningful tags from annotations. The system consists of two main parts: a conceptual clustering component based on natural language processing and a sentence-creating component based on genetic algorithms. The conceptual clustering component decomposes annotations into phrases carrying the main ideas. The sentence-creating component then builds several tags from these phrases. Finally, the created tags are evaluated by users to find better explanations for the videos and pictures. Participants are expected to collaborate through these evaluations to create more organized and meaningful tags. The developed system succeeds in creating tags from unstructured annotations through user-machine interaction, and it is applicable to other services that have only annotations as participants’ comments. Because the tags created by this system are meaningful but short, a system that builds longer text as full sentences is left as future work.

An Analysis of Matching in Learning Classifier Systems

Abstract: We investigate rule matching in learning classifier systems for problems involving binary and real inputs. We consider three rule encodings: the widely used character-based encoding, a specificity-based encoding, and a binary encoding used in Alecsys. We compare the performance of the three algorithms both on matching alone and on typical test problems. The results on matching alone show that population generality influences the performance of the matching algorithms based on string representations in different ways: character-based encoding becomes slower and slower as generality increases, while specificity-based encoding becomes faster and faster as generality increases. The results on typical test problems show that the specificity-based representation can halve the time required for matching, but also that binary encoding is about ten times faster on the most difficult problems. Moreover, we extend specificity-based encoding to real inputs and propose an algorithm that can halve the time required for matching real inputs using an interval-based representation.
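A rough sketch of two of the encodings the abstract compares may help: a character-based ternary condition (`0`/`1`/`#`) matched position by position, versus a specificity-based encoding that stores only the specified (non-`#`) loci, so general rules need fewer comparisons. The function names and details are illustrative, not taken from the paper's code, and the Alecsys binary encoding is omitted.

```python
# Character-based vs. specificity-based rule matching (illustrative).

def match_char(condition, inputs):
    # character-based: scan every position; '#' is a don't-care
    return all(c == '#' or c == x for c, x in zip(condition, inputs))

def to_specificity(condition):
    # keep only (position, required bit) pairs for the specified loci
    return [(i, c) for i, c in enumerate(condition) if c != '#']

def match_spec(spec, inputs):
    # specificity-based: cost proportional to the number of specified bits,
    # so more general rules (fewer pairs) match faster
    return all(inputs[i] == c for i, c in spec)

cond = '1#0#'
spec = to_specificity(cond)        # [(0, '1'), (2, '0')]
print(match_char(cond, '1000'))    # True
print(match_char(cond, '1110'))    # False: position 2 must be '0'
print(match_spec(spec, '1000'))    # True
```

The asymmetry reported in the abstract falls out directly: as rules become more general, `match_char` still scans the full condition string, while `match_spec` iterates over an ever-shorter list of specified positions.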

Investigating Restricted Tournament Replacement in ECGA for Non-Stationary Environments

Abstract: This paper investigates the incorporation of restricted tournament replacement (RTR) in the extended compact genetic algorithm (ECGA) for solving problems with non-stationary optima. RTR is a simple yet efficient niching method used to maintain diversity in a population of individuals. While the original version of RTR uses Hamming distance to quantify similarity between individuals, we propose an alternative substructural distance to enforce the niches. The ECGA that restarts the search after a change of environment is compared with the approach of maintaining diversity, using both versions of RTR. Results on several dynamic decomposable test problems demonstrate the usefulness of maintaining diversity throughout the run over the approach of restarting the search from scratch at each change. Furthermore, by maintaining diversity no additional mechanisms are required to detect the change of environment, which is typically a problem-dependent and non-trivial task.
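For readers unfamiliar with RTR, here is a minimal sketch of the baseline scheme the abstract builds on, using the original Hamming-distance similarity; the paper's substructural distance would replace `hamming()` below. The window size `w` and the plain bit-list population are illustrative choices, not the paper's setup.

```python
# Restricted tournament replacement (RTR), Hamming-distance version:
# each new child competes only against the most similar member of a
# random window, so distinct niches in the population are preserved.
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def rtr_insert(population, fitnesses, child, child_fit, w):
    # sample a window of w members, find the one closest to the child,
    # and replace it only if the child is fitter
    window = random.sample(range(len(population)), w)
    closest = min(window, key=lambda i: hamming(population[i], child))
    if child_fit > fitnesses[closest]:
        population[closest] = child
        fitnesses[closest] = child_fit

pop = [[0, 0, 0, 0], [1, 1, 1, 1]]
fits = [0, 4]
rtr_insert(pop, fits, [0, 0, 0, 1], child_fit=1, w=2)
print(pop)   # the all-zeros member, closest to the child, was replaced
```

Because replacement is restricted to the most similar window member, a fit child displaces a neighbor in its own niche rather than the globally worst individual — which is what lets diversity survive an environment change without any explicit change-detection mechanism.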