Nonlinear Network Optimization—An Embedding Vector Space Approach

This paper proposes a normed-space vector representation of networks which allows defining evolutionary operators for network optimization that resemble continuous-space operators. These operators are employed here to build a genetic algorithm which be…

This paper proposes a normed-space vector representation of networks which allows defining evolutionary operators for network optimization that resemble continuous-space operators. These operators are employed here to build a genetic algorithm which becomes generic for the optimization of tree networks, without the requirement of any special encoding scheme. Such a genetic algorithm has been compared with several encoding-based genetic algorithms, on 25 and 50-node instances of the optimal communication spanning tree and of the quadratic minimum spanning tree, and has been shown to outperform all other algorithms in a stochastic dominance analysis. The proposed approach has also been applied to an electric power distribution network design (a multibranch problem), outperforming the results presented in a former reference (which have been obtained with an Ant Colony algorithm). The results of some landscape dispersion analysis suggest that the proposed normed-space network vector representation is analogous to some continuous-variable space dilation operations, which define favorable space coordinates for optimization.

On the Importance of Data Balancing for Symbolic Regression

Symbolic regression of input-output data conventionally treats data records equally. We suggest a framework for automatic assignment of weights to data samples, which takes into account the sample’s relative importance. In this paper, we study the poss…

Symbolic regression of input-output data conventionally treats data records equally. We suggest a framework for automatic assignment of weights to data samples, which takes into account the sample’s relative importance. In this paper, we study the possibilities of improving symbolic regression on real-life data by incorporating weights into the fitness function. We introduce four weighting schemes defining the importance of a point relative to proximity, surrounding, remoteness, and nonlinear deviation from k nearest-in-the-input-space neighbors. For enhanced analysis and modeling of large imbalanced data sets we introduce a simple multidimensional iterative technique for subsampling. This technique allows a sensible partitioning (and compression) of data to nested subsets of an arbitrary size in such a way that the subsets are balanced with respect to either of the presented weighting schemes. For cases where a given input-output data set contains some redundancy, we suggest an approach to considerably improve the effectiveness of regression by applying more modeling effort to a smaller subset of the data set that has a similar information content. Such improvement is achieved due to better exploration of the search space of potential solutions at the same number of function evaluations. We compare different approaches to regression on five benchmark problems with a fixed budget allocation. We demonstrate that the significant improvement in the quality of the regression models can be obtained either with the weighted regression, exploratory regression using a compressed subset with a similar information content, or exploratory weighted regression on the compressed subset, which is weighted with one of the proposed weighting schemes.

A Favorable Weight-Based Evolutionary Algorithm for Multiple Criteria Problems

In this paper, we present a favorable weight-based evolutionary algorithm for multiple criteria problems. The algorithm tries to both approximate the Pareto frontier and evenly distribute the solutions over the frontier. These two goals are common for …

In this paper, we present a favorable weight-based evolutionary algorithm for multiple criteria problems. The algorithm tries to both approximate the Pareto frontier and evenly distribute the solutions over the frontier. These two goals are common for many multiobjective evolutionary algorithms. To achieve these goals in our algorithm, each member selects its own weights for a weighted Tchebycheff distance function to define its fitness score. The fitness scores favor solutions that are closer to the Pareto frontier and that are located at underrepresented regions. We compare the performance of the algorithm with two leading evolutionary algorithms on various continuous test problems having different number of criteria.

Implicitly Controlling Bloat in Genetic Programming

During the evolution of solutions using genetic programming (GP) there is generally an increase in average tree size without a corresponding increase in fitness-a phenomenon commonly referred to as bloat. Although previously studied from theoretical an…

During the evolution of solutions using genetic programming (GP) there is generally an increase in average tree size without a corresponding increase in fitness-a phenomenon commonly referred to as bloat. Although previously studied from theoretical and practical viewpoints there has been little progress in deriving controls for bloat which do not explicitly refer to tree size. Here, the use of spatial population structure in combination with local elitist replacement is shown to reduce bloat without a subsequent loss of performance. Theoretical concepts regarding inbreeding and the role of elitism are used to support the described approach. The proposed system behavior is confirmed via extensive computer simulations on benchmark problems. The main practical result is that by placing a population on a torus, with selection defined by a Moore neighborhood and local elitist replacement, bloat can be substantially reduced without compromising performance.

A Novel Set-Based Particle Swarm Optimization Method for Discrete Optimization Problems

Particle swarm optimization (PSO) is predominately used to find solutions for continuous optimization problems. As the operators of PSO are originally designed in an n-dimensional continuous space, the advancement of using PSO to find solutions in a di…

Particle swarm optimization (PSO) is predominately used to find solutions for continuous optimization problems. As the operators of PSO are originally designed in an n-dimensional continuous space, the advancement of using PSO to find solutions in a discrete space is at a slow pace. In this paper, a novel set-based PSO (S-PSO) method for the solutions of some combinatorial optimization problems (COPs) in discrete space is presented. The proposed S-PSO features the following characteristics. First, it is based on using a set-based representation scheme that enables S-PSO to characterize the discrete search space of COPs. Second, the candidate solution and velocity are defined as a crisp set, and a set with possibilities, respectively. All arithmetic operators in the velocity and position updating rules used in the original PSO are replaced by the operators and procedures defined on crisp sets, and sets with possibilities in S-PSO. The S-PSO method can thus follow a similar structure to the original PSO for searching in a discrete space. Based on the proposed S-PSO method, most of the existing PSO variants, such as the global version PSO, the local version PSO with different topologies, and the comprehensive learning PSO (CLPSO), can be extended to their corresponding discrete versions. These discrete PSO versions based on S-PSO are tested on two famous COPs: the traveling salesman problem and the multidimensional knapsack problem. Experimental results show that the discrete version of the CLPSO algorithm based on S-PSO is promising.

Clustering of protein expression data: a benchmark of statistical and neural approaches

Abstract  Clustering issues are fundamental to exploratory analysis of bioinformatics data. This process may follow algorithms that
are reproducible but make assumptions about, for instance, the ability to estimate the global structure by su…

Abstract  

Clustering issues are fundamental to exploratory analysis of bioinformatics data. This process may follow algorithms that
are reproducible but make assumptions about, for instance, the ability to estimate the global structure by successful local
agglomeration or alternatively, they use pattern recognition methods that are sensitive to the initial conditions. This paper
reviews two clustering methodologies and highlights the differences that result from the changes in data representation, applied
to a protein expression data set for breast cancer (n = 1,076). The two clustering methodologies are a reproducible approach to model-free clustering and a probabilistic competitive
neural network. The results from the two methods are compared with existing studies of the same data set, and the preferred
clustering solutions are profiled for clinical interpretation.

  • Content Type Journal Article
  • DOI 10.1007/s00500-010-0596-9
  • Authors
    • I. H. Jarman, Liverpool John Moores University School of Computing and Mathematical Sciences Liverpool UK
    • T. A. Etchells, Liverpool John Moores University School of Computing and Mathematical Sciences Liverpool UK
    • D. Bacciu, University of Pisa Department of Computer Science Pisa Italy
    • J. M. Garibaldi, University of Nottingham School of Computer Science Nottingham UK
    • I. O. Ellis, University of Nottingham Department of Histopathology, School of Molecular Medical Sciences Nottingham UK
    • P. J. G. Lisboa, Liverpool John Moores University School of Computing and Mathematical Sciences Liverpool UK