Region-Focused Memetic Algorithms With Smart Initialization for Real-World Large-Scale Waste Collection Problems

Memetic algorithm (MA) is widely applied to optimize routing problems as it provides one way to combine local search with global search. However, the local search in MA needs to be carefully designed according to the problem’s characteristics. I…

Memetic algorithm (MA) is widely applied to optimize routing problems as it provides one way to combine local search with global search. However, the local search in MA needs to be carefully designed according to the problem’s characteristics. In this article, we consider a real-world large-scale waste collection problem with multiple depots, multiple disposal facilities, multiple trips, and working time constraints. Vehicles with a limited capacity and working time can start from different depots, collect waste at different sites, and make multiple trips to different disposal facilities to empty the waste and return to its origin. While the existing work considered problems with multiple trips and time constraints, none have tackled problems with multiple depots, multiple disposal facilities, multiple trips, as well as working time constraints. The change from “single-depot” to “multidepot” not only reflects better the situation in real life but also leads to a qualitative different and more complex problem. In this article, we first model this complex problem mathematically. Then, a novel region-focused MA is proposed to tackle this new challenge. Compared to classic MA, this region-focused one is enhanced by two major components: 1) a new heuristic-assisted solution initialization algorithm and 2) a region-focused local search with novel heuristics. Comprehensive computational studies show that our proposed approaches significantly outperform several state-of-the-arts on our real problem of thousands of tasks. The new local search procedure and solution initialization method significantly improve the search ability in combination with global search ability of MA.

TechRxiv: Share Your Preprint Research with the World!

Prospective authors are requested to submit new, unpublished manuscripts for inclusion in the upcoming event described in this call for papers.

Prospective authors are requested to submit new, unpublished manuscripts for inclusion in the upcoming event described in this call for papers.

An Online Prediction Approach Based on Incremental Support Vector Machine for Dynamic Multiobjective Optimization

Real-world multiobjective optimization problems usually involve conflicting objectives that change over time, which requires the optimization algorithms to quickly track the Pareto-optimal front (POF) when the environment changes. In recent years, evol…

Real-world multiobjective optimization problems usually involve conflicting objectives that change over time, which requires the optimization algorithms to quickly track the Pareto-optimal front (POF) when the environment changes. In recent years, evolutionary algorithms based on prediction models have been considered promising. However, most existing approaches only make predictions based on the linear correlation between a finite number of optimal solutions in two or three previous environments. These incomplete information extraction strategies may lead to low prediction accuracy in some instances. In this article, an incremental support vector machine (ISVM)-based dynamic multiobjective evolutionary algorithm, in short called ISVM-DMOEA, is proposed. We treat the solving of dynamic multiobjective optimization problems (DMOPs) as an online learning process, using the continuously obtained optimal solution to update an ISVM without discarding the solution information at earlier time. ISVM is then used to filter random solutions and generate an initial population for the next moment. To overcome the obstacle of insufficient training samples, a synthetic minority oversampling strategy is implemented before the training of ISVM. The advantage of this approach is that the nonlinear correlation between solutions can be explored online by ISVM, and the information contained in all historical optimal solutions can be exploited to a greater extent. The experimental results and comparison with the chosen state-of-the-art algorithms demonstrate that the proposed algorithm can effectively tackle DMOPs.

Automating Genetic Algorithm Mutations for Molecules Using a Masked Language Model

Inspired by the evolution of biological systems, genetic algorithms have been applied to generate solutions for optimization problems in a variety of scientific and engineering disciplines. For a given problem, a suitable genome representation must be …

Inspired by the evolution of biological systems, genetic algorithms have been applied to generate solutions for optimization problems in a variety of scientific and engineering disciplines. For a given problem, a suitable genome representation must be defined along with a mutation operator to generate subsequent generations. Unlike natural systems, which display a variety of complex rearrangements (e.g., mobile genetic elements), mutation for genetic algorithms commonly utilizes only random pointwise changes. Furthermore, generalizing beyond pointwise mutations poses a key difficulty as useful genome rearrangements depend on the representation and problem domain. To move beyond the limitations of manually defined pointwise changes, here we propose the use of techniques from masked language models to automatically generate mutations. As a first step, common subsequences within a given population are used to generate a vocabulary. The vocabulary is then used to tokenize each genome. A masked language model is trained on the tokenized data in order to generate possible rearrangements (i.e., mutations). In order to illustrate the proposed strategy, we use string representations of molecules and use a genetic algorithm to optimize for drug-likeness and synthesizability. Our results show that moving beyond random pointwise mutations accelerates genetic algorithm optimization.

A Kernel-Based Indicator for Multi/Many-Objective Optimization

How to evaluate Pareto front approximations generated by multi/many-objective optimizers is a critical issue in the field of multiobjective optimization. Currently, there exist two types of comprehensive quality indicators (i.e., volume-based and dista…

How to evaluate Pareto front approximations generated by multi/many-objective optimizers is a critical issue in the field of multiobjective optimization. Currently, there exist two types of comprehensive quality indicators (i.e., volume-based and distance-based indicators). Distance-based indicators, such as inverted generational distance (IGD), are usually computed by summing up the distance of each reference point to its nearest solution. Their high computational efficiency leads to their prevalence in many-objective optimization. However, in the existing distance-based indicators, the distributions of the solution sets are usually neglected, leading to their lack of ability to well distinguish between different solution sets. This phenomenon may become even more severe in high-dimensional space. To address such an issue, a kernel-based indicator (KBI) is proposed as a comprehensive indicator. Different from other distance-based indicators, a kernel-based maximum mean discrepancy is adopted in KBI for directly measuring the difference that can characterize the convergence, spread, and uniformity of two sets, i.e., the solution set and reference set, by embedding them in reproducing kernel Hilbert space (RKHS). As a result, KBI not only reflects the distance between the solution set and the reference set but also can reflect the distribution of the solution set itself. In addition, to maintain the desirable weak Pareto compliance property of KBI, a nondominated set reconstruction approach is also proposed to shift the original solution set. The detailed theoretical and experimental analysis of KBI is provided in this article. The properties of KBI have also been analyzed by the optimal $mu $ -distribution.

Clustering-Guided Particle Swarm Feature Selection Algorithm for High-Dimensional Imbalanced Data With Missing Values

Feature selection (FS) in data with class imbalance or missing values has received much attention from researchers due to their universality in real-world applications. However, for data with both the two characteristics above, there is still a lack of…

Feature selection (FS) in data with class imbalance or missing values has received much attention from researchers due to their universality in real-world applications. However, for data with both the two characteristics above, there is still a lack of the corresponding FS algorithm. Due to the complex coupling relationship between missing data and class imbalance, the need for better FS method becomes essential. To tackle high-dimensional imbalanced data with missing values, this article studies a new evolutionary FS method. First, an improved $F$ -measure based on filling risk (RF-measure) is defined to evaluate the influence of missing data on the performance of FS in the case of class imbalance. Following that taking the RF-measure as an objective function, a particle swarm optimization-based FS method with fuzzy clustering (PSOFS-FC) is proposed. Two new problem-specific operators or strategies, i.e., the swarm initialization strategy guided by fuzzy clustering and the local pruning operator based on feature importance, are developed to improve the performance of PSOFS-FC. Compared with state-of-the-art FS algorithms on several public datasets, experimental results show that PSOFS-FC can achieve excellent classification performance with relatively less running time, indicating its superiority on tackling high-dimensional imbalanced data with missing values.

An Ensemble Surrogate-Based Framework for Expensive Multiobjective Evolutionary Optimization

Surrogate-assisted evolutionary algorithms (SAEAs) have become very popular for tackling computationally expensive multiobjective optimization problems (EMOPs), as the surrogate models in SAEAs can approximate EMOPs well, thereby reducing the time cost…

Surrogate-assisted evolutionary algorithms (SAEAs) have become very popular for tackling computationally expensive multiobjective optimization problems (EMOPs), as the surrogate models in SAEAs can approximate EMOPs well, thereby reducing the time cost of the optimization process. However, with the increased number of decision variables in EMOPs, the prediction accuracy of surrogate models will deteriorate, which inevitably worsens the performance of SAEAs. To deal with this issue, this article suggests an ensemble surrogate-based framework for tackling EMOPs. In this framework, a global surrogate model is trained under the entire search space to explore the global area, while a number of surrogate submodels are trained under different search subspaces to exploit the subarea, so as to enhance the prediction accuracy and reliability. Moreover, a new infill sampling criterion is designed based on a set of reference vectors to select promising samples for training the models. To validate the generality and effectiveness of our framework, three state-of-the-art evolutionary algorithms [nondominated sorting genetic algorithm III (NSGA-III), multiobjective evolutionary algorithm based on decomposition with differential evolution (MOEA/D-DE) and reference vector-guided evolutionary algorithm (RVEA)] are embedded, which significantly improve their performance for solving most of the test EMOPs adopted in this article. When compared to some competitive SAEAs for solving EMOPs with up to 30 decision variables, the experimental results also validate the advantages of our approach in most cases.

Genetic Programming for Manifold Learning: Preserving Local Topology

Manifold learning (MaL) methods are an invaluable tool in today’s world of increasingly huge datasets. MaL algorithms can discover a much lower-dimensional representation (embedding) of a high-dimensional dataset through nonlinear transformation…

Manifold learning (MaL) methods are an invaluable tool in today’s world of increasingly huge datasets. MaL algorithms can discover a much lower-dimensional representation (embedding) of a high-dimensional dataset through nonlinear transformations that preserve the most important structure of the original data. State-of-the-art MaL methods directly optimize an embedding without mapping between the original space and the discovered embedded space. This makes interpretability—a key requirement in exploratory data analysis—nearly impossible. Recently, genetic programming has emerged as a very promising approach to MaL by evolving functional mappings from the original space to an embedding. However, genetic programming-based MaL has struggled to match the performance of other approaches. In this work, we propose a new approach to using genetic programming for MaL, which preserves local topology. This is expected to significantly improve performance on tasks where local neighborhood structure (topology) is paramount. We compare our proposed approach with various baseline MaL methods and find that it often outperforms other methods, including a clear improvement over previous genetic programming approaches. These results are particularly promising, given the potential interpretability and reusability of the evolved mappings.

A Voting-Mechanism-Based Ensemble Framework for Constraint Handling Techniques

Effective constraint handling techniques (CHTs) are of great significance for evolutionary algorithms (EAs) dealing with constrained optimization problems (COPs). To date, many CHTs, such as penalty function, superiority of feasible solutions, and $ep…

Effective constraint handling techniques (CHTs) are of great significance for evolutionary algorithms (EAs) dealing with constrained optimization problems (COPs). To date, many CHTs, such as penalty function, superiority of feasible solutions, and $epsilon $ -constraint (EC), have been designed. However, different CHTs are usually suited to different problems, even the most appropriate technique changes along with the stages of the optimization process. Motivated by this phenomenon, we propose a voting-mechanism-based ensemble framework, named voting mechanism for constraint handling (VMCH), to integrate multiple CHTs for solving various COPs. In this framework, each CHT acts as a voter, all voters vote for each pair of solutions, and the solution in each pair with the highest weighted votes is considered better. In addition, an adaptive strategy is developed to adjust the voter weights according to their historical voting performance. To investigate the performance of VMCH in improving existing algorithms, the proposed VMCH is embedded into the three best algorithms in the competition on constrained single objective real-parameter optimization at CEC 2018, namely, MAgES, iLSHADE $epsilon $ , and IUDE, to form three new algorithm versions, i.e., MAgES-VMCH, iLSHADE $epsilon $ -VMCH, and IUDE-VMCH. They are compared with seven state-of-the-art peer algorithms. Extensive experiments are conducted on 57 real-world COPs. The ranking results show that the new algorithm version MAgES-VMCH takes first place among the ten comparison algorithms. Moreover, all the new VMCH-enhanced versions of the three best algorithms are superior to their original versions. Therefore, the proposed VMCH framework can achieve competitive performance in solving COPs.