# Encyclopedia of Complexity and Systems Science

Living Edition
| Editors: Robert A. Meyers

# Advanced Evolutionary Algorithms in Data Mining

• Janez Brest
Living reference work entry
DOI: https://doi.org/10.1007/978-3-642-27737-5_650-1

## Keywords

Evolutionary Algorithm Differential Evolution Memetic Algorithm Differential Evolution Algorithm Artificial Immune System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

## Definition of the Subject

Evolutionary algorithms (EAs) are stochastic population-based methods inspired by nature. A population consists of several individuals usually encoded as vectors. During an evolutionary process, a population is transformed into a new population. After some such transformations, the algorithm stops and returns a best found individual as solution.

Differential Evolution (DE) is an evolutionary algorithm for global optimization over continuous spaces as well as for optimization over discrete spaces. Nowadays, it is used as a powerful global optimization method within a wide range of research areas.

As a plethora of data are generated in every possible means and data dimensionality increases on a large scale, it is imperative to increase power of methods in data mining, knowledge discovery, as well as in optimization methods that are dealing with high-dimensional massive data, uncertainty environments, and dynamic systems.

## Introduction

(Das and Suganthan 2011): To tackle complex computational problems, researchers have been looking into nature for years – both as model and as metaphor – for inspiration. Optimization is at the heart of many natural processes like Darwinian evolution itself. Through millions of years, every species had to adapt their physical structures to fit to the environments they were in. A keen observation of the underlying relation between optimization and biological evolution led to the development of an important paradigm of computational intelligence – the evolutionary computing techniques for performing very complex search and optimization.

Researchers and scientists from various disciplines such as bioinformatics, engineering, chemistry, among many others, often have to deal with the classical problem of optimization and/or search.

Evolutionary algorithms are population-based algorithms and they are suitable for solving continuous optimization problems as well as for discrete optimization. Evolutionary algorithms include Genetic Algorithms (GA), Evolution Strategies (ES), Evolutionary Programming (EP), Genetic Programming (GP), and DE. Evolutionary algorithms are under a wider group of nature-inspired algorithms, which include also swarm intelligence algorithms (like particle swarm optimization, ant colony optimization, bat algorithms, etc.) and other algorithms (like artificial immune systems, memetic algorithms, harmony search, etc.).

These algorithms usually have several control parameters that are responsible for tuning/controlling the algorithm itself during the optimization process. Good values of the control parameters might improve an algorithm’s performance.

### Objective and Scope of the Article

In this chapter, an overview of DE is presented with the emphasis on a variety of different evolutionary methods available today. The evolutionary algorithms are suitable mechanisms in various domains where an optimization problem can be formulated as a global optimization problem, in which derivatives of the objective function are unavailable, the objective function is multimodal, or it has many local minima. This chapter presents mechanisms that were recently proposed for control parameters and mutation strategies in the DE algorithm. Basic features, advantages, and drawbacks of DE as well as of other evolutionary methods are presented. For the readers not very familiar with the field of evolutionary computation, this article should be a good introduction into the topic, whereas for more experienced readers, it should deepen their knowledge.

## Background

In global optimization we want to solve a particular problem and techniques are more or less sophisticated and also adapted to a type of optimization. The types of optimization can be roughly classified in more classes: single objective optimization, constrained optimization, large-scale optimization, dynamic optimization, multimodal optimization, multiobjective optimization, etc. Each type of optimization has its own and particular challenges, and therefore, an EA algorithm should be adopted to satisfy these challenges. When solving the single objective real-parameter optimization problems, challenges which can arise are regarding properties of the problems:
• Dimensionality – problems may have high dimensions, e.g., big data in data mining (Wu et al. 2014) – in literature also known as “large-scale optimization.”

• Multimodality – problems that have only one optimum, i.e., unimodal, are easy to solve in many cases, while multimodal problems are more challenging.

• Constraints – problems have inequality and equality constraints, which are needed to be satisfied when solving constrained optimization problems. Constraints divide solutions from a whole search space into two groups of feasible and infeasible solutions.

• Dynamic environment – if problem changes its characteristics during the time (e.g., optimum point is moving around in the search space during optimization, etc.), an EA needs to follow these dynamic changes.

• Others.

Multiobjective EAs applied in the domain of data mining have mainly dealt with feature selection, classification, clustering, and association rule mining. Multiobjective EAs for data mining are presented in Mukhopadhyay et al. 2014a, b.

## Basics of Differential Evolution

DE is a floating-point encoding evolutionary algorithm for global optimization over continuous spaces (Storn and Price 1997; Price et al. 2005; Feoktistov 2006). It is a population based algorithm with a small number of control parameters.

Figure 1 depicts the main steps of DE. During an evolutionary process, the DE algorithm uses mutation, crossover, and selection operators to generate a next population from the current population. After some generations, the algorithm stops and returns a best found solution.

The DE and GA algorithms belong to the evolutionary algorithms but there are some differences between the DE and GA. GA uses a different order of operators, i.e., the order in the GA algorithm is crossover, mutation, and selection. The another important difference appears in a mutation phase. While GA usually uses mutation in order to introduce very small change(s) of an individual, DE applies much bigger change(s) of an individual during the mutation.

DE has three control parameters: mutation scale factor, F, crossover rate, Cr, and population size, Np. The original DE algorithm (Storn and Price 1995, 1997) keeps all three parameters fixed during the optimization process.

The population consists of Np vectors:
$${\overrightarrow{x}}_i=\left({x}_{i, 1},{x}_{i,2},\cdots, {x}_{i,D}\right),i=1,2,\dots, Np,$$
where D denotes the dimensionality of vectors. In the EA community the vectors are called individuals, while in swarm intelligence vectors present particles, ants, etc.

### Initialization

Domains of the vectors are defined by their lower and upper bounds x j ,low , x j ,upp , where j = 1, 2, … D. The initial population (at G = 0) should cover this range as much as possible and the vectors are initialized by randomly using uniform distribution:
$${x}_{i,j,0}={x}_{j, low}+\mathcal{U}\left(0,1\right)\left({x}_{j,upp}-{x}_{j, low}\right).$$
$$\mathcal{U}$$(0,1) denotes uniform distribution at interval [0,1].

### Mutation

In generation G, a mutant vector $${\overrightarrow{\upsilon}}_{i,G}$$ is calculated as follows:
$${\overrightarrow{\upsilon}}_{i,G}={\overrightarrow{x}}_{r_1,G}+F\left({\overrightarrow{x}}_{r_2,G}-{\overrightarrow{x}}_{r_3,G}\right),$$
(1)
where r 1, r 2, r 3 are randomly chosen indexes generated within the set {1,…, Np} and holds $${r}_1\ne {r}_2\ne {r}_3\ne i.$$ Scale factor F is a real parameter and, as proposed in the original DE, within the range [0, 2]. The scale factor is usually less than 1 in practical applications. Equation 1 is known as “DE/rand/1” mutation strategy. The other DE strategies exist (Das and Suganthan 2011; Neri and Tirronen 2010) and they can be described as follows:
$$\hbox{'}\mathrm{D}\mathrm{E}/\mathrm{rand}/2\hbox{'}: {\overrightarrow{\upsilon}}_{i,G}={\overrightarrow{x}}_{r_1,G}+F\left({\overrightarrow{x}}_{r_2,G}-{\overrightarrow{x}}_{r_3,G}\right)+F\left({\overrightarrow{x}}_{r_4,G}-{\overrightarrow{x}}_{r_5,G}\right),$$
(2)
$$\hbox{'}\mathrm{D}\mathrm{E}/\mathrm{best}/1\hbox{'}:{\overrightarrow{\upsilon}}_{i,G}={\overrightarrow{x}}_{best,G}+F\left({\overrightarrow{x}}_{r_1,G}-{\overrightarrow{x}}_{r_2,G}\right),$$
(3)
$$\hbox{'}\mathrm{D}\mathrm{E}/\mathrm{best}/2\hbox{'}: {\overrightarrow{\upsilon}}_{i,G}={\overrightarrow{x}}_{best,G}+F\left({\overrightarrow{x}}_{r_1,G}-{\overrightarrow{x}}_{r_2,G}\right)+F\left({\overrightarrow{x}}_{r_3,G}-{\overrightarrow{x}}_{r_4,G}\right),$$
(4)
$$\hbox{'}\mathrm{D}\mathrm{E}/\mathrm{current}-\mathrm{t}\mathrm{o}-\mathrm{best}/1\hbox{'}: {\overrightarrow{\upsilon}}_{i,G}={\overrightarrow{x}}_{i,G}+F\left({\overrightarrow{x}}_{best,G}-{\overrightarrow{x}}_{i,G}\right)+F\left({\overrightarrow{x}}_{r_1,G}-{\overrightarrow{x}}_{r_2,G}\right),$$
(5)
$$\hbox{'}\mathrm{D}\mathrm{E}/\mathrm{random}-\mathrm{t}\mathrm{o}-\mathrm{best}/1\hbox{'}: {\overrightarrow{\upsilon}}_{i,G}={\overrightarrow{x}}_{r_1,G}+F\left({\overrightarrow{x}}_{best,G}-{\overrightarrow{x}}_{r_1,G}\right)+F\left({\overrightarrow{x}}_{r_2,G}-{\overrightarrow{x}}_{r_3,G}\right),$$
(6)
where $${\overrightarrow{x}}_{best,G}$$denotes the best vector in generation G, and indexes r1–r5 denote the random and mutually different integers generated within the set {1,…, Np} and also different from index i.

Before next operation starts, a check if some components of the mutant vector are out of their lower and upper bounds is applied. The components are randomly generated once again until they are out of bounds or they are set on bounds, which are widely used approaches for repairing a mutant vector.

### Crossover

After mutation, a “binary” crossover operation forms the trial vector $${\overrightarrow{u}}_{i,G}$$, according to the i-th population vector and its corresponding mutant vector as follows:
$${u}_{i,j,G}=\left\{\begin{array}{l}{v}_{i,j,G}, \operatorname{if} j={j}_{rnd} \operatorname {or}\;rnd\left[0,1\right)\le Cr,\\ {}{x}_{i,j,G}, \operatorname{otherwise},\end{array}\right.$$
for i = 1, 2,…, Np and j = 1, 2,…, D. rnd[0,1) is a uniformly distributed random number lying between 0 and 1 and it is applied for each i and j. The crossover parameter Cr presents the probability of creating components for trial vector $${\overrightarrow{u}}_{i,G}$$ from a mutant vector $${\overrightarrow{\upsilon}}_{i,G}$$. Value of the crossover parameter Cr is within the range [0,1). Randomly chosen integer $${j}_{rnd}\in \left\{1,\dots, Np\right\}$$ is responsible for a trial vector containing at least one component from the mutant vector.

Beside the “binomial” crossover, DE has an exponential crossover variant (Price et al. 2005). The first one is used more times in practice.

### Selection

The selection operator compares the objective fitness values of the population vector $${\overrightarrow{x}}_{i,G}$$and its corresponding trial vector $${\overrightarrow{u}}_{i,G}$$ and makes a decision which vector will survive and become a member of the next generation. In case of a minimization problem, the selection operator is defined as follows:
$${\overrightarrow{x}}_{i,\left(G+1\right)}=\left\{\begin{array}{l}{\overrightarrow{u}}_{i,G}, \operatorname{if} f\left({\overrightarrow{u}}_{i,G}\right)\le f\left({\overrightarrow{x}}_{i,G}\right),\\ {}{\overrightarrow{x}}_{i,G}, \operatorname{otherwise}.\end{array}\right.$$

After selection, the DE algorithm continues once again with a mutation operation followed by crossover until some stopping conditions are met. The stopping condition is predefined with a maximal number of generations, or a maximum number of function evaluations, etc.

## Control Parameters

Globally, we distinguish between two major forms of setting parameter values: parameter tuning and parameter control. Parameter tuning means that an EA user finds good values of the parameters before running an algorithm and then running the algorithm using these (fixed) values. Parameter control means that parameter values are changing during the optimization process, by taking the actual search progress into account.

The original DE algorithm keeps all three control parameters fixed during the optimization process. However, there still exists a lack of knowledge about how to find reasonably good values for the control parameters of DE, for a given problem (Liu and Lampinen 2005). Thus, users are still faced with the problem of preliminary testing and hand tuning of its control parameters prior to commencing the actual optimization process (Teng et al. 2009). As a solution, approaches with adaptation and self-adaptation have proved to be highly beneficial for automatically and dynamically adjusting control parameters during the optimization process. Self-adaptation allows an evolutionary algorithm to adapt itself to any general class of problem, by reconfiguring itself accordingly, and does this without any user interaction (Bäck et al. 1997). On the other hand, when solving a particular problem using tuning, it is possible to find very good parameter values which are usually more competitive than a self-adaptive mechanism (Brest et al. 2015).

Types of parameter control in EA (Eiben et al. 1999; Eiben and Smith 2003; Karafotias et al. 2015) can be roughly classified in three classes:
• Deterministic – a deterministic rule is applied to change the value of a parameter.

• Adaptive – some form of feedback from the search is used for determining the change of parameter’s value.

• Self-adaptive – the idea of “evolution of the evolution” can be used to implement the self-adaptations of parameters. Parameters are encoded into the individuals and they undergo the actions of some operators. The better values of these encoded parameters lead to better individuals which, in turn, are more likely to survive and produce offspring and, hence, propagate these better parameter values (Eiben et al. 1999).

The original DE algorithm has been modified not only in the field of control parameters (i.e., adaptive and/self-adaptive) but also in the following directions:
• New mutation operator(s) (Feoktistov 2006; Das et al. 2009)

• Added local search heuristics (memetic algorithms) (Neri and Tirronen 2009)

• Combined strategies or strategy adaptation (Elsayed et al. 2014)

• Ensembles (Mallipeddi and Suganthan 2010)

• Population size reduction (Brest and Maučec 2008) and adaptation (Teng et al. 2009)

• Others

## A Brief Overview of DE Algorithms

Since DE was introduced in 1995, several improvements were proposed. DE-based algorithms have showed a great performance and win or are ranked top among all algorithms on several evolutionary computation competitions. For excellent DE overview, its related work and applicability within various domains see surveys Das and Suganthan 2011; Neri and Tirronen 2010. Here, we outline some state-of-the-art DE-based algorithms for single optimization.

Qin and Suganthan (Qin et al. 2009) proposed a Self-adaptive Differential Evolution algorithm (SaDE), where the choice of a mutation strategy and control parameters F and Cr are gradually self-adapted according to the learning experience. The algorithm combines two mutation strategies DE/rand/1 and DE/current-to-best/1. The parameter F is approximated by a normal distribution N (0.5, 0.3). A set of F values are randomly sampled from such normal distribution and applied to each target vector within the current population. The SaDE algorithm gradually adjusts the range of Cr values for a given problem according to previous Cr values which have generated trial vectors for successfully entering the next generation. Cr is approximated by a normal distribution with mean value Cr m and standard deviation Std = 0.1, denoted by $$\mathcal{N}$$(Cr m , Std), where Cr m is initialized as 0.5.

The self-adaptive jDE (Brest et al. 2006) “DE/rand/1/bin” strategy is applied on the control parameters F and Cr as follows:
$${F}_{i,\left(G+1\right)}=\left\{\begin{array}{l}{F}_l+rn{d}_1\ast {F}_u, \operatorname{if} rn{d}_2<{\tau}_1,\\ {}{F}_{i,G}, \operatorname{otherwise},\end{array}\right.$$
$$C{r}_{i,\left(G+1\right)}=\left\{\begin{array}{l}rn{d}_3, \operatorname{if} rn{d}_4<{\tau}_2,\\ {}C{r}_{i,G}, \operatorname{otherwise},\end{array}\right.$$
where rnd j , for j ∈ {1, 2, 3, 4} are random values with $$\mathcal{U}$$(0,1) distribution, and parameters τ 1, τ 2, F l , F u have values 0.1, 0.1, 0.1, 0.9, respectively.

Zhang and Sanderson (2009) proposed self-adaptive DE (JADE) with anew mutation strategy, which utilizes information not only from the best individual but also from other good solutions.

Wang et al. in Wang et al. 2013 proposed a version of DE with generalized opposition-based learning and self-adapting control parameters.

A distributed DE with several subpopulations and two migration selection approaches to maintaining a high diversity in the subpopulations are presented in Cheng et al. (2013).

An improved version of JADE is Success-History based Adaptive DE (SHADE) (Tanabe and Fukunaga 2013). SHADE uses a historical memory in order to adapt the control parameters F and CR, current-to-pbest/l mutation strategy (i.e., the same as used JADE), and external archive. L-SHADE (Tanabe and Fukunaga 2014) is an improved version of the SHADE using linear population size reduction.

## Applications

A short outline of applications where the jDE algorithm and other DE-based algorithms are applied: image-based modeling and computer vision – reconstruction of procedural tree models (Zamuda et al. 2011; Zamuda and Brest 2014), tunning chess evaluation function (Bošković et al. 2011), dynamic optimization (Brest et al. 2013), underwater glider path planning (Zamuda and Sosa 2014), and hydrothermal optimization (Glotić and Zamuda 2015).

## Future Directions

In the Differential Evolution algorithm that belongs to evolutionary algorithm, the adaptive and self-adaptive mechanisms of control parameters could improve performance of the algorithm. More mutation strategies and different crossover schemes can also be applied in the algorithm in order that the algorithm can adapt to a particular optimization problem.

Evolutionary algorithms find their applications in many domains of data mining such as bioinformatics, image processing, computer vision, pattern recognition, text categorization, information retrieval, and extraction, among many others. When solving real world problems, challenges are to deal with large-scale optimization and how to allow an expert to interject her/his domain specific expert knowledge.

### Codes and Other Sources

DE home page: www1.icsi.berkeley.edu/~storn/code.html (source codes of DE in many programming languages).

Benchmarks for Evaluation of Evolutionary Algorithms: www.ntu.edu.sg/home/epnsugan/indexfiles/cec-benchmarking.htm

## Bibliography

1. Bäck T, Fogel DB, Michalewicz Z (eds) (1997) Handbook of evolutionary computation, IOP Publishing Ltd., Bristol, UKGoogle Scholar
2. Bošković B, Brest J, Zamuda A, Greiner S, Žumer V (2011) History mechanism supported differential evolution for chess evaluation function tuning. Soft Comput Fusion Found Methodol Appl 15:667–682Google Scholar
3. Brest J, Maučec MS (2008) Population size reduction for the differential evolution algorithm. Appl Intell 29(3):228–247
4. Brest J, Greiner S, Bošković B, Mernik M, Žumer V (2006) Self-adapting control parameters in differential evolution: a comparative study on numerical benchmark problems. IEEE Trans Evol Comput 10(6):646–657
5. Brest J, Korošec P, Šilc J, Zamuda A, Bošković B, Maučec MS (2013) Differential evolution and differential ant-stigmergy on dynamic optimisation problems. Int J Syst Sci 44:663–679
6. Brest J, Zamuda A, Bošković B (2015) Adaptation in the differential evolution. In: Fister I, Fister I Jr (eds) Adaptation and hybridization in computational intelligence. Adaptation, learning, and optimization, vol 18. Springer International Publishing, Cham, CH. pp 53–68Google Scholar
7. Cheng J, Zhang G, Neri F (2013) Enhancing distributed differential evolution with multicultural migration for global numerical optimization. Inform Sci 247:72–93
8. Das S, Suganthan P (2011) Differential evolution: a survey of the state-of-the-art. IEEE Trans Evol Comput 15(1):27–54Google Scholar
9. Das S, Abraham A, Chakraborty U, Konar A (2009) Differential evolution using a neighborhood-based mutation operator. IEEE Trans Evol Comput 13(3):526–553
10. Eiben AE, Smith JE (2003) Introduction to evolutionary computing, Natural computing. Springer, Berlin
11. Eiben AE, Hinterding R, Michalewicz Z (1999) Parameter control in evolutionary algorithms. IEEE Trans Evol Comput 3(2):124–141
12. Elsayed S, Sarker R, Essam D (2014) A self-adaptive combined strategies algorithm for constrained optimization using differential evolution. Appl Math Comput 241:267–282
13. Feoktistov V (2006) Differential evolution: in search of solutions (Springer optimization and its applications). Springer, New York/SecaucusGoogle Scholar
14. Glotić A, Zamuda A (2015) Short-term combined economic and emission hydrothermal optimization by surrogate differential evolution. Appl Energy 141:42–56
15. Karafotias G, Hoogendoorn M, Eiben A (2015) Parameter control in evolutionary algorithms: trends and challenges. IEEE Trans Evol Comput 19(2):167–187
16. Liu J, Lampinen J (2005) A fuzzy adaptive differential evolution algorithm. Soft Comput 9(6):448–462
17. Mallipeddi R, Suganthan P (2010) Ensemble of constraint handling techniques. IEEE Trans Evol Comput 14(4):561–579
18. Mukhopadhyay A, Maulik U, Bandyopadhyay S, Coello C (2014a) Survey of multiobjective evolutionary algorithms for data mining: part II. IEEE Trans Evol Comput 18(1):20–35
19. Mukhopadhyay A, Maulik U, Bandyopadhyay S, Coello Coello C (2014b) A survey of multiobjective evolutionary algorithms for data mining: part I. IEEE Trans Evol Comput 18(1):4–19
20. Neri F, Tirronen V (2009) Scale factor local search in differential evolution. Memetic Comp 1(2):153–171
21. Neri F, Tirronen V (2010) Recent advances in differential evolution: a survey and experimental analysis. Artif Intell Rev 33(1–2):61–106
22. Price KV, Storn RM, Lampinen JA (2005) Differential evolution, a practical approach to global optimization. Springer, Berlin Heidelberg, Germany.Google Scholar
23. Qin AK, Huang VL, Suganthan PN (2009) Differential evolution algorithm with strategy adaptation for global numerical optimization. IEEE Trans Evol Comput 13(2):398–417
24. Storn R, Price K (1995) Differential evolution – a simple and efficient adaptive scheme for global optimization over continuous spaces. Technical report TR-95-012, BerkeleyGoogle Scholar
25. Storn R, Price K (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11:341–359
26. Tanabe R, Fukunaga A (2013) Evaluating the performance of shade on cec 2013 benchmark problems. In: 2013 IEEE congress on evolutionary computation (CEC), 20–23 june, 2013, Cancun, Mexico. IEEE, pp 1952–1959Google Scholar
27. Tanabe R, Fukunaga A (2014) Improving the search performance of SHADE using linear population size reduction. In: 2014 IEEE congress on evolutionary computation (CEC2014), 6–11 July 2014, Beijing, China. IEEE, pp 1658–1665Google Scholar
28. Teng NS, Teo J, Hijazi MHA (2009) Self-adaptive population sizing for a tune-free differential evolution. Soft Comput Fusion Found Methodol Appl 13(7):709–724Google Scholar
29. Wang H, Rahnamayan S, Wu Z (2013) Parallel differential evolution with self-adapting control parameters and generalized opposition-based learning for solving high-dimensional optimization problems. J Parallel Distrib Comput 73(1):62–73
30. Wu X, Zhu X, Wu G-Q, Ding W (2014) Data mining with big data. IEEE Trans Knowl Data Eng 26(1):97–107
31. Zamuda A, Brest J (2014) Vectorized procedural models for animated trees reconstruction using differential evolution. Inform Sci 278:1–21
32. Zamuda A, Sosa JDH (2014) Differential evolution and underwater glider path planning applied to the short-term opportunistic sampling of dynamic mesoscale ocean structures. Appl Soft Comput 24:95–108
33. Zamuda A, Brest J, Bošković B, Žumer V (2011) Differential evolution for parameterized procedural woody plant models reconstruction. Appl Soft Comput 11:4904–4912
34. Zhang J, Sanderson A (2009) JADE: adaptive differential evolution with optional external archive. IEEE Trans Evol Comput 13(5):945–958