Several artificial intelligence (AI) methods have emerged in the recent past that help solve optimization problems which were previously difficult or impossible to solve. These techniques include differential evolution (DE), particle swarm optimization (PSO), evolutionary programming (EP), the genetic algorithm (GA), and simulated annealing (SA). Applications of each of these techniques have been widely reported. The most important advantage of AI techniques is that they are not limited by restrictive assumptions about the search space, such as continuity or the existence of derivatives of the objective function. These methods share some similarities. DE is introduced first, followed by PSO, EP, GA, and SA.
Differential evolution
Differential evolution (DE) [14,15,16] is a type of evolutionary algorithm originally proposed by Price and Storn [14] for optimization problems over a continuous domain. DE is exceptionally simple, fast, and robust. The basic idea of DE is to adapt the search during the evolutionary process. At the start of the evolution, the perturbations are large, since the parent populations are far away from each other. As the evolutionary process matures, the population converges to a small region and the perturbations adaptively become small. As a result, the algorithm performs a global exploratory search during the early stages of the evolutionary process and local exploitation during the mature stage of the search. In DE, each offspring competes one-to-one with its corresponding parent, which differs from other evolutionary algorithms. This one-to-one competition gives rise to a faster convergence rate. Price and Storn gave the working principle of DE with a simple strategy in [14]. Later, they suggested ten different strategies of DE [16]. Strategy 7 (DE/rand/1/bin) is the most successful and widely used. The key control parameters of DE are the population size (\(N_{P}\)), scaling factor (\(S_{F}\)), and crossover constant (\(C_{R}\)). The optimization process in DE is carried out with three basic operations: mutation, crossover, and selection. The DE algorithm is described as follows:
Initialization
The initial population of \(N_{P}\) vectors is selected randomly, based on a uniform probability distribution for all variables, to cover the entire search space uniformly. Each individual \(X_{i}\) is a vector containing as many parameters as there are problem decision variables \(D\). Random values are assigned to each decision parameter in every vector according to:
$$X_{ij}^{0} \sim U\left( {X_{j}^{\text{min} } ,X_{j}^{\text{max} } } \right),$$
(1)
where \(i = 1, \ldots ,N_{P}\) and \(j = 1, \ldots ,D\); \(X_{j}^{\text{min} }\) and \(X_{j}^{\text{max} }\) are the lower and upper bounds of the \(j\)th decision variable; \(U\left( {X_{j}^{\text{min} } ,X_{j}^{\text{max} } } \right)\) denotes a uniform random variable ranging over \(\left[ {X_{j}^{\text{min} } ,X_{j}^{\text{max} } } \right]\); and \(X_{ij}^{0}\) is the initial \(j\)th variable of the \(i\)th population member. All vectors should satisfy the constraints. The cost function \(f\left( {X_{i}^{0} } \right)\) is then evaluated for each vector.
Mutation
DE generates new parameter vectors by adding the weighted difference vector between two population members to a third member. For each target vector \(X_{i}^{g}\) at gth generation, the noisy vector \(X_{i}^{/g}\) is obtained by
$$X_{i}^{/g} = X_{a}^{g} + S_{F} \left( {X_{b}^{g} - X_{c}^{g} } \right),\quad i \in N_{P} ,$$
(2)
where \(X_{a}^{g}\), \(X_{b}^{g}\), and \(X_{c}^{g}\) are mutually distinct vectors selected randomly from the \(N_{P}\) vectors at the \(g\)th generation, with \(a \ne b \ne c \ne i\). The scaling factor (\(S_{F}\)), in the range \(0 < S_{F} \le 1.2\), controls the amount of perturbation added to the parent vector. The noisy vectors should satisfy the constraints.
Crossover
Perform crossover for each target vector \(X_{i}^{g}\) with its noisy vector \(X_{i}^{/g}\) and create a trial vector \(X_{i}^{//g}\), such that
$$X_{i}^{//g} = \left\{ {\begin{array}{*{20}ll} {X_{i}^{/g} ,} & \quad{{\text{if}}\; \rho \le C_{R} } \\ {X_{i}^{g} ,} & \quad {\text{otherwise}} \\ \end{array} } \right.,\quad i \in N_{P} ,$$
(3)
where \(\rho\) is a uniformly distributed random number within [0, 1]. The crossover constant (\(C_{R}\)), in the range \(0 \le C_{R} \le 1\), controls the diversity of the population and helps the algorithm escape from local optima.
Selection
Perform selection for each target vector \(X_{i}^{g}\) by comparing its cost with that of the trial vector \(X_{i}^{//g}\). The vector with the lower cost of the two survives to the next generation:
$$X_{i}^{g + 1} = \left\{ {\begin{array}{*{20}ll} {X_{i}^{//g} ,} & \quad {{\text{if}}\;f\left( {X_{i}^{//g} } \right) \le f\left( {{\rm X}_{i}^{g} } \right)} \\ {X_{i}^{g} ,} & \quad {\text{otherwise}} \\ \end{array} ,} \right.\quad i \in N_{P}$$
(4)
The process is repeated until the maximum number of generations is reached or no improvement is seen in the best individual over many generations.
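The complete DE loop, combining Eqs. (1)–(4), can be sketched as follows. This is a minimal NumPy illustration, not an implementation from [14–16]; the function name, parameter defaults, and the handling of bound violations by clipping are illustrative choices:

```python
import numpy as np

def de_minimize(f, bounds, n_pop=20, s_f=0.8, c_r=0.9, max_gen=200, seed=0):
    """Minimal DE/rand/1/bin sketch: f maps a 1-D array to a scalar cost,
    bounds is a (D, 2) list of [min, max] per decision variable."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = len(lo)
    # Eq. (1): uniform random initialization over the search space
    pop = rng.uniform(lo, hi, size=(n_pop, d))
    cost = np.array([f(x) for x in pop])
    for _ in range(max_gen):
        for i in range(n_pop):
            # Eq. (2): mutation with three mutually distinct random vectors
            a, b, c = rng.choice([k for k in range(n_pop) if k != i],
                                 size=3, replace=False)
            noisy = np.clip(pop[a] + s_f * (pop[b] - pop[c]), lo, hi)
            # Eq. (3): binomial crossover between target and noisy vectors
            trial = np.where(rng.random(d) <= c_r, noisy, pop[i])
            # Eq. (4): one-to-one selection against the parent
            trial_cost = f(trial)
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = int(np.argmin(cost))
    return pop[best], cost[best]
```

On a simple test such as the sphere function over \([-5, 5]^3\), this sketch drives the best cost close to zero within the default 200 generations.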
Particle swarm optimization
Particle swarm optimization (PSO) [17, 18] was developed within the scope of artificial life, inspired by the natural phenomena of fish schooling and bird flocking. In PSO, a population of particles searches a multidimensional space for the optimum of a cost function to be minimized, with each particle's present coordinates determining its cost. After each iteration, the new velocity and position of each particle are updated based on the combined influence of the particle's present velocity, its distance from the best position it has achieved so far during the search, and its distance from the leading particle, i.e., the particle that has produced the globally best performance so far.
Usually, \(x\) and \(v\) denote the position and the velocity of a particle in a multidimensional solution space. In a \(d\)-dimensional space, the position and velocity of particle \(i\) are represented as \(d \times 1\) vectors, \(x_{i} = \left( {x_{i1} ,x_{i2} , \ldots ,x_{id} } \right)\) and \(v_{i} = \left( {v_{i1} ,v_{i2} , \ldots ,v_{id} } \right)\), respectively. For each particle \(i\), the best position found so far is stored as another \(d \times 1\) vector, \(p{\text{best}}_{i} = \left( {p{\text{best}}_{i1} ,p{\text{best}}_{i2} , \ldots ,p{\text{best}}_{id} } \right)\). The best particle among all particles is denoted \(g{\text{best}}\), and its coordinate in the \(d\)th dimension is \(g{\text{best}}_{d}\). Hence, the velocity and position update equations for the \(i\)th particle in the \(d\)th dimension in the \(\left( {k + 1} \right)\)th iteration, based on the performance in the \(k\)th iteration, are given as:
$$v_{id}^{{\left( {k + 1} \right)}} = w \times v_{id}^{k} + c_{1} \times rand\left( {} \right) \times \left( {p{\text{best}}_{id} - x_{id}^{k} } \right) + c_{2} \times rand\left( {} \right) \times \left( {g{\text{best}}_{d} - x_{id}^{k} } \right)$$
(5)
$$x_{id}^{{\left( {k + 1} \right)}} = x_{id}^{k} + v_{id}^{{\left( {k + 1} \right)}} ,\quad i \in N_{P} ,\quad d \in D,$$
(6)
where \(D\) is the total number of dimensions of the search problem and \(N_{P}\) is the population size. \(c_{1}\) and \(c_{2}\) are acceleration constants that provide relative stochastic weighting (implemented by the \(rand\left( {} \right)\) function, which generates a value in \(\left[ {0,1} \right]\)) of a particle's deviation from its own best position so far and from the best position of the group as a whole in the \(d\)th dimension. The velocity of the particle in the \(k\)th iteration in the \(d\)th dimension is limited as \(v_{d}^{\text{min} } \le v_{id}^{k} \le v_{d}^{\text{max} }\). Here, \(v^{\text{max} }\) determines the resolution with which the regions between the present position and the target position are searched; its value should be chosen so that it is neither too high nor too low.
The present system employs the PSO algorithm with an adaptable inertia weight \(w\) throughout the search process, so that a suitable balance between global and local exploration can be obtained. In this work, the inertia weight \(w\) is set according to the following equation:
$$w = w_{\text{max} } - \frac{{w_{\text{max} } - w_{\text{min} } }}{{{\text{iter}}_{\text{max} } }} \times {\text{iter,}}$$
(7)
where \({\text{iter}}_{\text{max} }\) is the maximum number of iterations and \({\text{iter}}\) is the current iteration number. We start with a high value \(w_{\text{max} }\) to perform an aggressive global search initially, in quest of potentially good solutions, and gradually reduce \(w\) to fine-tune the search locally as we move closer to the minimum point.
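Equations (5)–(7) can be combined into a short sketch. The following NumPy code is illustrative only; the velocity limit of 20% of each variable's range and the values \(c_1 = c_2 = 2.0\), \(w_{\max} = 0.9\), \(w_{\min} = 0.4\) are common choices rather than values prescribed in the text:

```python
import numpy as np

def pso_minimize(f, bounds, n_pop=30, c1=2.0, c2=2.0,
                 w_max=0.9, w_min=0.4, iter_max=200, seed=0):
    """Minimal PSO sketch following Eqs. (5)-(7) for minimization."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = len(lo)
    v_max = 0.2 * (hi - lo)                 # illustrative velocity clamp
    x = rng.uniform(lo, hi, size=(n_pop, d))
    v = rng.uniform(-v_max, v_max, size=(n_pop, d))
    pbest, pcost = x.copy(), np.array([f(p) for p in x])
    g = int(np.argmin(pcost))
    gbest, gcost = pbest[g].copy(), pcost[g]
    for it in range(iter_max):
        # Eq. (7): linearly decreasing inertia weight
        w = w_max - (w_max - w_min) * it / iter_max
        r1, r2 = rng.random((n_pop, d)), rng.random((n_pop, d))
        # Eq. (5): velocity update, clamped to [-v_max, v_max]
        v = np.clip(w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x),
                    -v_max, v_max)
        # Eq. (6): position update, kept within the search bounds
        x = np.clip(x + v, lo, hi)
        cost = np.array([f(p) for p in x])
        better = cost < pcost
        pbest[better], pcost[better] = x[better], cost[better]
        g = int(np.argmin(pcost))
        if pcost[g] < gcost:
            gbest, gcost = pbest[g].copy(), pcost[g]
    return gbest, gcost
```

The linearly decreasing \(w\) makes the early iterations explore widely while the later iterations refine the neighborhood of \(g{\text{best}}\).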
Evolutionary programming
Evolutionary programming (EP) [20] is a technique in the field of evolutionary computation. It seeks the optimal solution by evolving a population of candidate solutions over a number of generations or iterations. During each iteration, a new population is formed from the existing population through the use of a mutation operator, which produces a new solution by perturbing each component of an existing solution by a random amount. The degree of optimality of each candidate solution, or individual, is measured by its fitness, which can be defined as a function of the objective function of the problem. Through a competition scheme, the individuals in each population compete with each other; the winning individuals form a resultant population, which is regarded as the next generation. For optimization to occur, the competition scheme must give better solutions a greater chance of survival than poorer ones. In this way, the population evolves towards the global optimum. The algorithm is described as follows:
- 1.
Initialization: The initial population of control variables is selected randomly from the set of uniformly distributed control variables ranging over their upper and lower limits. The fitness score \(f_{i}\) is obtained according to the objective function and the environment.
- 2.
Statistics: The maximum fitness \(f_{\text{max} }\), minimum fitness \(f_{\text{min} }\), the sum of fitness \(\sum f\), and average fitness \(f_{\text{avg}}\) of this generation are calculated.
- 3.
Mutation: Each selected parent, for example, \(X_{i}\), is mutated and added to its population with the following rule:
$$X_{i + m,j} = X_{ij} + N\left( {0,\gamma \left( {\overline{x}_{j} - \underline{x}_{j} } \right)\frac{{f_{i} }}{{f_{\text{max} } }}} \right),\quad j \in D,\;i \in N_{P} ,$$
(8)
where \(D\) is the number of decision variables in an individual, \(N_{P}\) is the population size, and \(X_{ij}\) denotes the \(j\)th element of the \(i\)th individual; \(N\left( {\mu ,\sigma^{2} } \right)\) represents a Gaussian random variable with mean \(\mu\) and variance \(\sigma^{2}\); \(f_{\text{max} }\) is the maximum fitness of the old generation, obtained in the statistics step; \(\overline{x}_{j}\) and \(\underline{x}_{j}\) are, respectively, the maximum and minimum limits of the \(j\)th element; and \(\gamma\) is the mutation scale, \(0 < \gamma \le 1\), which can be adaptively decreased over the generations. If any mutated value exceeds its limit, it is set to the limit value. The mutation process allows an individual with larger fitness to produce more offspring for the next generation.
- 4.
Competition: The \(k\) individuals with the best fitness are kept as parents for the next generation. The other individuals in the combined population of size \(2N_{P} - k\) have to compete with each other for a place in the next generation. A weight value \(w_{i}\) of the \(i\)th individual is calculated through the following competition:
$$w_{i} = \sum\limits_{t = 1}^{{{\rm N}_{t} }} {w_{i,t} } ,$$
(9)
where \(N_{t}\) is the competition number generated randomly; \(w_{i,t}\) is either 0 for loss or 1 for win as the \(i\)th individual competes with a randomly selected (\(r\)th) individual in the combined population. The value of \(w_{i,t}\) is given in the following equation:
$$w_{i,t} = \left\{ {\begin{array}{*{20}ll} 1 & \quad {{\text{if}}\;f_{i} < f_{r} } \\ 0 & \quad {\text{otherwise}}, \\ \end{array} } \right.$$
(10)
where \(f_{r}\) is the fitness of the randomly selected \(r\)th individual, and \(f_{i}\) is the fitness of the \(i\)th individual. When all \(2N_{P}\) individuals have received their competition weights, they are ranked in descending order according to \(w_{i}\). The first \(m\) individuals are selected, along with their corresponding fitness \(f_{i}\), to be the basis for the next generation. The maximum, minimum, and average fitness and the sum of fitness of the current generation are then calculated in the statistics step.
- 5.
Convergence test: If the convergence condition is not met, the mutation and competition steps are run again. The maximum generation number can be used as the convergence condition. Alternatively, other criteria, such as the ratio of the average to the maximum fitness of the population, can be computed, with generations repeated until
$$\frac{{f_{\text{avg}} }}{{f_{\text{max} } }} \ge \delta ,$$
(11)
where \(\delta\), representing the degree of satisfaction, should be very close to 1. If the convergence criterion is satisfied to the given accuracy, an optimal solution has been found for the optimization problem.
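Steps 1–5 can be sketched as follows. This illustration assumes a minimization problem, uses the raw objective value as the fitness \(f_i\) (assumed non-negative), and folds the elitism of step 4 into the tournament ranking rather than keeping the \(k\) best individuals explicitly; all names and defaults are illustrative:

```python
import numpy as np

def ep_minimize(f, bounds, n_pop=30, gamma=0.2, n_t=10, max_gen=200, seed=0):
    """Minimal EP sketch following Eqs. (8)-(10) for minimization."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(n_pop, len(lo)))
    fit = np.array([f(x) for x in pop])
    for _ in range(max_gen):
        f_max = fit.max() if fit.max() > 0 else 1.0
        # Eq. (8): Gaussian mutation with variance gamma*(range)*(f_i/f_max),
        # so worse individuals are perturbed more strongly
        var = gamma * (hi - lo) * (fit / f_max)[:, None]
        offspring = np.clip(
            pop + rng.normal(0.0, np.sqrt(np.maximum(var, 1e-12))), lo, hi)
        comb = np.vstack([pop, offspring])
        comb_fit = np.concatenate([fit, np.array([f(x) for x in offspring])])
        # Eqs. (9)-(10): each individual meets N_t random opponents;
        # a win (w_{i,t} = 1) is scored when its cost is lower
        w = np.array([np.sum(comb_fit[i] < comb_fit[rng.integers(0, 2 * n_pop, n_t)])
                      for i in range(2 * n_pop)])
        keep = np.argsort(-w)[:n_pop]       # rank by wins, descending
        pop, fit = comb[keep], comb_fit[keep]
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

Because parents and offspring compete in the same pool, good solutions tend to survive even without an explicit elitism step.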
Genetic algorithm
The genetic algorithm (GA) [21] is based on the mechanics of natural selection. An initial population of candidate solutions is created randomly. Each of these candidate solutions is termed an individual and is assigned a fitness value that measures its quality. During each generation of the evolutionary process, individuals with higher fitness are favored and have a higher probability of being selected as parents. After parents are selected for reproduction, they produce children via the processes of crossover and mutation. The individuals formed during reproduction explore different areas of the solution space and replace less fit individuals of the existing population.
Due to the difficulties of binary representation when dealing with continuous search spaces of large dimension, the proposed approach has been implemented using a real-coded genetic algorithm (RCGA) [22, 23]. The simulated binary crossover (SBX) and polynomial mutation operators are explained as follows.
Simulated binary crossover (SBX) operator
The procedure for computing the child populations \(c_{1}\) and \(c_{2}\) from two parent populations \(y_{1}\) and \(y_{2}\) under the SBX operator is as follows:
- 1.
Create a random number \(u\) between 0 and 1.
- 2.
Find a parameter \(\gamma\) using a polynomial probability distribution as follows:
$$\gamma = \left\{ {\begin{array}{*{20}ll} {\left( {u\alpha } \right)^{{1/\left( {\eta_{c} + 1} \right)}} ,} & \quad {{\text{if}}\;u \le \frac{1}{\alpha }} \\ {\left( {\frac{1}{{2 - u\alpha }}} \right)^{{1/\left( {\eta_{c} + 1} \right)}} ,} & \quad {\text{otherwise}}, \\ \end{array} } \right.$$
(12)
where \(\alpha = 2 - \delta^{{ - \left( {\eta_{c} + 1} \right)}}\) and \(\delta = 1 + \frac{2}{{y_{2} - y_{1} }}\text{min} \left[ {\left( {y_{1} - y_{l} } \right),\left( {y_{u} - y_{2} } \right)} \right]\), assuming \(y_{1} < y_{2}\).
Here, the parameter \(y\) is assumed to vary in \(\left[ {y_{l} ,y_{u} } \right]\), and the parameter \(\eta_{c}\) is the distribution index for SBX, which can take any non-negative value. A small value of \(\eta_{c}\) allows child populations to be created far away from the parents, whereas a large value restricts the child populations to the vicinity of the parents.
- 3.
The intermediate populations are calculated as follows:
$$\begin{aligned} c_{p1} & = 0.5\left[ {\left( {y_{1} + y_{2} } \right) - \gamma \left( {\left| {y_{2} - y_{1} } \right|} \right)} \right] \\ c_{p2} & = 0.5\left[ {\left( {y_{1} + y_{2} } \right) + \gamma \left( {\left| {y_{2} - y_{1} } \right|} \right)} \right] \\ \end{aligned}$$
(13)
Each variable is chosen with a probability \(p_{c}\), and the above SBX operator is applied variable by variable.
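For a single variable pair, steps 1–3 can be sketched as below (illustrative code; it assumes \(y_1 \ne y_2\) and returns the intermediate children \(c_{p1}\) and \(c_{p2}\) of Eq. (13)):

```python
import numpy as np

def sbx_pair(y1, y2, y_l, y_u, eta_c=2.0, rng=None):
    """Bounded SBX for one variable, per Eqs. (12)-(13); assumes y1 != y2."""
    rng = rng if rng is not None else np.random.default_rng()
    if y1 > y2:                                  # order parents so y1 < y2
        y1, y2 = y2, y1
    u = rng.random()                             # step 1
    # delta and alpha confine both children to [y_l, y_u]
    delta = 1.0 + 2.0 * min(y1 - y_l, y_u - y2) / (y2 - y1)
    alpha = 2.0 - delta ** (-(eta_c + 1.0))
    if u <= 1.0 / alpha:                         # step 2, Eq. (12)
        gamma = (u * alpha) ** (1.0 / (eta_c + 1.0))
    else:
        gamma = (1.0 / (2.0 - u * alpha)) ** (1.0 / (eta_c + 1.0))
    # step 3, Eq. (13): children symmetric about the parents' midpoint
    c_p1 = 0.5 * ((y1 + y2) - gamma * abs(y2 - y1))
    c_p2 = 0.5 * ((y1 + y2) + gamma * abs(y2 - y1))
    return c_p1, c_p2
```

Because the children are placed symmetrically about the parents' midpoint, \(c_{p1} + c_{p2} = y_1 + y_2\) always holds, and the \(\delta\)-based confinement keeps both children within \([y_l, y_u]\).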
Polynomial mutation operator
A polynomial probability distribution is used to create a child population in the vicinity of a parent population under the mutation operator. The following procedure is used:
- 1.
Create a random number \(u\) between 0 and 1.
- 2.
Calculate the parameter \(\delta\) as follows:
$$\delta = \left\{ {\begin{array}{*{20}ll} {\left[ {2u + \left( {1 - 2u} \right)\left( {1 - \phi } \right)^{{\left( {\eta_{m} + 1} \right)}} } \right]^{{\frac{1}{{\left( {\eta_{m} + 1} \right)}}}} - 1,} & \quad {{\text{if}}\;u \le 0.5} \\ {1 - \left[ {2\left( {1 - u} \right) + 2\left( {u - 0.5} \right)\left( {1 - \phi } \right)^{{\left( {\eta_{m} + 1} \right)}} } \right]^{{\frac{1}{{\left( {\eta_{m} + 1} \right)}}}} ,} & \quad {\text{otherwise}}, \\ \end{array} } \right.$$
(14)
where \(\phi = \frac{{\text{min} \left[ {\left( {c_{p} - y_{l} } \right),\left( {y_{u} - c_{p} } \right)} \right]}}{{\left( {y_{u} - y_{l} } \right)}}\)
The parameter \(\eta_{m}\) is the distribution index for mutation and takes any non-negative value.
- 3.
Calculate the mutated child as follows:
$$\begin{aligned} c_{1} & = c_{p1} + \delta \left( {y_{u} - y_{l} } \right) \\ c_{2} & = c_{p2} + \delta \left( {y_{u} - y_{l} } \right) \\ \end{aligned}$$
The perturbation in the population can be adjusted by varying \(\eta_{m}\) and \(p_{m}\) with generations as given below:
$$\eta_{m} = \eta_{m\text{min} } + {\text{gen}}$$
(15)
$$p_{m} = \frac{1}{D} + \frac{\text{gen}}{{{\text{gen}}_{\text{max} } }}\left( {1 - \frac{1}{D}} \right),$$
(16)
where \(\eta_{m\text{min} }\) is the user-defined minimum value for \(\eta_{m}\), \(p_{m}\) is the probability of mutation, and \(D\) is the number of decision variables.
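Steps 1–3 of the mutation operator can be sketched for a single variable as follows (illustrative code; \(\eta_m = 20\) is a typical default, not a value prescribed above):

```python
import numpy as np

def polynomial_mutation(c_p, y_l, y_u, eta_m=20.0, rng=None):
    """Polynomial mutation of one variable c_p in [y_l, y_u], per Eq. (14).
    The child c_p + delta * (y_u - y_l) always stays within the bounds."""
    rng = rng if rng is not None else np.random.default_rng()
    u = rng.random()                             # step 1
    phi = min(c_p - y_l, y_u - c_p) / (y_u - y_l)
    if u <= 0.5:                                 # step 2, Eq. (14)
        delta = (2.0 * u + (1.0 - 2.0 * u)
                 * (1.0 - phi) ** (eta_m + 1.0)) ** (1.0 / (eta_m + 1.0)) - 1.0
    else:
        delta = 1.0 - (2.0 * (1.0 - u) + 2.0 * (u - 0.5)
                       * (1.0 - phi) ** (eta_m + 1.0)) ** (1.0 / (eta_m + 1.0))
    return c_p + delta * (y_u - y_l)             # step 3
```

At \(u = 0\) the child is pulled to \(c_p - \phi(y_u - y_l)\) and at \(u \to 1\) to \(c_p + \phi(y_u - y_l)\), so the mutated value never leaves \([y_l, y_u]\).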
Simulated annealing
Simulated annealing (SA) [25, 26] is a powerful optimization technique that exploits the resemblance between a minimization process and the cooling of molten metal. The physical annealing process is simulated in the SA technique to determine global or near-global optimum solutions of optimization problems. In this algorithm, a parameter \(T_{0}\), called the temperature, is defined. Starting from a high temperature, the molten metal is cooled slowly until it solidifies at a low temperature. The iteration number in the SA technique is analogous to the temperature level. During each iteration, a candidate solution is generated. If it is a better solution, it is accepted and used to generate the next candidate solution. If it is a deteriorated solution, it is accepted when its probability of acceptance \(\Pr \left( \Delta \right)\), as given by Eq. (17), is greater than a randomly generated number between 0 and 1:
$$\Pr \left( \Delta \right) = \frac{1}{{1 + \exp \left( {{\Delta / {T_{v} }}} \right)}},$$
(17)
where \(\Delta\) is the amount of deterioration between the new and the current solutions, and \(T_{v}\) is the temperature at which the new solution is generated. Accepting deteriorated solutions in this manner enables the SA algorithm to 'jump' out of local optima and seek the global optimum. In forming the new solution, the current solution is perturbed [28] according to a Gaussian probability distribution function (GPDF). The mean of the GPDF is taken to be the current solution, and its standard deviation is given by the product of the temperature and a scaling factor \(\sigma\). The value of \(\sigma\) is less than one and, together with the temperature, it governs the size of the neighborhood of the current solution and hence the amount of perturbation; when \(\sigma\) is kept constant, the amount of perturbation depends on the temperature alone. In each iteration, the procedure for generating and testing candidate solutions is repeated for a specified number of trials, so that thermal equilibrium is reached at each temperature. The last accepted candidate solution is then taken as the starting solution for generating candidate solutions in the next iteration. Simulated annealing with a slow cooling schedule usually has a greater capacity to find the optimal solution than one with a fast cooling schedule. The reduction of temperature in successive iterations is governed by the following geometric function [25]:
$$T_{v} = r^{{\left( {v - 1} \right)}} T_{0} ,$$
(18)
where \(v\) is the iteration number, \(r\) is the temperature reduction factor, and \(T_{0}\) is the initial temperature, whose value can be set arbitrarily or estimated using the method described in Ref. [25]. The iterative process is terminated when there is no significant improvement in the solution after a prespecified number of iterations, or when the maximum number of iterations is reached.
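The SA procedure can be sketched for a one-dimensional problem as follows. This is an illustrative sketch, not the implementation of [25, 26, 28]; the parameter defaults, the clamping of candidates to the bounds, and the overflow guard on the acceptance test are assumptions:

```python
import math
import random

def sa_minimize(f, x0, lo, hi, t0=1.0, r=0.95, sigma=0.5,
                n_trials=30, max_iter=100, seed=0):
    """Minimal 1-D SA sketch following Eqs. (17)-(18)."""
    rnd = random.Random(seed)
    x, fx = x0, f(x0)
    best, f_best = x, fx
    for v in range(1, max_iter + 1):
        t_v = (r ** (v - 1)) * t0              # Eq. (18): geometric cooling
        for _ in range(n_trials):              # approach equilibrium at T_v
            # GPDF perturbation: mean = current solution, std = sigma * T_v
            cand = min(max(rnd.gauss(x, sigma * t_v), lo), hi)
            f_cand = f(cand)
            delta = f_cand - fx                # deterioration if positive
            if delta <= 0:
                accept = True                  # better solutions always accepted
            else:
                ratio = delta / t_v
                # Eq. (17): accept a worse move with probability 1/(1+exp(delta/T_v))
                accept = ratio < 500 and rnd.random() < 1.0 / (1.0 + math.exp(ratio))
            if accept:
                x, fx = cand, f_cand
                if fx < f_best:
                    best, f_best = x, fx
    return best, f_best
```

As \(T_v\) decays, both the perturbation size and the acceptance probability of worse moves shrink, turning the early random walk into a local refinement.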