**By R. Belmans**

### Simulated annealing

**The technical process of annealing solids** inspires simulated annealing algorithms. A slow and controlled cooling of a heated solid ensures proper solidification (highly ordered crystalline structure), the state of the minimal internal energy. Rapid cooling causes defects; the system is not in a state of a thermal equilibrium (Rao w). In terms of the optimisation theory, a sequence of steps can be formulated, defining a minimisation process by random changes of the design variables and a probability acceptance criterion for them. An evaluated design vector is accepted or rejected following the Metropolis criterion (Metropolis et al. n):

**• accept design if **

• otherwise accept with the probability

with the scaling factorcalled the Boltzmann constant and T the temperature. Algorithms applying the above probability distribution are called Boltzmann annealing algorithms. It has been proven (Gernan & Geman41) that such algorithms are guaranteed to find the global optimum if the reduction of the temperature is taken to be not faster than:

withthe temperature at iteration k. Applying such a logarithmic cooling schedule results in practice in an infinite lasting optimisation process. Therefore, other cooling schedules are chosen, the majority being based on experimental studies for selected types of problems, rather than being mathematically derived (Ingber 37, Hajek 4 ). These faster schedules are sometimes called simulated quenching to express the fact that they do not satisfy the sufficiency condition (7.7) to converge to the global minimum. However, they are usually a good trade-off between a fast convergence and high global convergence probability. The most common temperature schedule is a simulated quenching algorithm with the exponential cooling schedule:

**However,** the optimal choice of the annealing factor c is not obvious and depends on the problem type. Typical values vary between 0.98 -0.80, with decreasing probability of finding a global optimum in non-convex feasible spaces.

**The simulated annealing starts at a high initial temperature**It follows the evaluation of a sequence of design vectors, either purely randomly generated or following a scheme, as for instance the evolution strategy. This is continued until equilibrium is reached: the average value of/converges towards a stable value (e.g. by means of an error bound on the standard deviation of f) as / increases. During such a period of constant temperature, a step length vector 8 is adjusted periodically. The best design vector is stored. If equilibrium is reached, the temperature is lowered and the above sequence is repeated starting from the active optimal point. This process is stopped if a sufficiently low temperature is reached and no further improvement of the objective function can be expected.

**Except for the definition of the scaling factor k and the scheme** to lower the temperature, also the formulation of the probability to accept or reject the evaluated designs have undergone investigations in the past few years in order to accelerate the algorithm by ‘maintaining the property of global convergence. Codes such as Adaptive Simulated Annealing must be named here (Ingber "). From the viewpoint here, however, it must be stated that these improvements of the global convergence are of a theoretical nature and have a significant effect only on very large dimensional optimisation problems (100 and more design variables). Therefore, the simulated quenching methods are preferred here.

**The advantages of simulated annealing are:**

**• simple implementation **

**• slow cooling guarantees high probability of global convergence, **

**• possibility of solving mixed integer, discrete or continuous problems **

**■ insensitive to stochastic disturbances of the objective function **

**• simple combination with other direct search algorithms.**

**The disadvantages are:**

**• no optimal scheme to determine a sufficiently high****without knowledge of the feasible region **

**• high number of function evaluations.**

**The implemented version of the simulated annealing algorithm** tries to tackle one of the disadvantages. By adjusting the step length vector 8 of the parameter variation automatically in such a way that approximately 50% of all designs will be accepted during one temperature step, the first disadvantage is relaxed. During each temperature step, a number of j step-length adjustment cycles are performed. Each cycle with constant step length consists of //-design variations, changing one design parameter i at a time. A new design is generated by applying a random change (based on the step length vector 5) to the previously accepted design vector

**with****a uniformly distributed random number of the interval**** The adjustment of the step length vector per parameter i in (7.9) is based on the design acceptance ratio r per cycle:**

**with****the number of accepted designs based on changes of variable i and****the predefined number of cycles. The step length****is adjusted after each cycle to:**

with a the step length factor, usually equal to 2.0.

**To illustrate the performance** of the implemented simulated annealing algorithm, the optimisation problem (7.5), (7.6) is solved. The strategy parameters of the SA are chosen as: step length factor two iterations with two step length adjustment cycles before temperature reduction (trials per temperature step). The optimisation is terminated if the difference of the best function value of the present temperature step to the best overall value is less then and this is true for two successive temperature steps. A typical optimisation history is shown in Fig. 7.7. The optimisation was repeated thirty times, and the algorithm always converged to the global minimum inrequiring typically 600-850 function evaluations.

**Fig. 7.7. Path of the connected best trials per temperature step. The path illustrates one strength of the algorithm, as it is able to escape from the local optimum 02 that is reached first.**

**A parallel implementation of SA is possible,** but not attempted here, as the implemented SA mainly serves as a comparison tool. Furthermore, a simple parallelisation of the above implementation of the algorithm is not likely to be efficient, as reported in (Ingber "). A better efficiency is reported for clustered algorithms, where an annealing is performed at different temperatures per computational node.

### Self-adaptive evolution strategy

**Evolution strategies imitate in a simplified** way biological mutation and selection. Evolution strategies slowly started to become popular with the work of Rechenberg 95 and Schwefel 10°. Especially during the past decade, accelerated by the rapid increase of the available computer power, a vast number of developments have been reported (Back et al. 5). In contrast to genetic algorithms, evolution strategies are directly based on real valued vectors when dealing with continuous parameter optimisation problems. While Rechenberg 95 developed a theory for the simple two member (single recombinant) evolution scheme, the theoretic framework of multimember multirecombinant (also called multiparent) evolution strategies, including self-adaptation, is still weak. Back, Hammel and Schwefel comment on this in Back et al.5: "We know that they work, but we do not know why". However, first attempts have been made (Beyer ’12). Even though in these publications the superior progress rate of multirecombinant evolution strategies is highlighted, the single recombinant schemes are still present in engineering research. This is mainly due to their simple implementation, but they are certainly no longer justified when seeking a high performance algorithm.

**The implemented evolution** strategy is basically the multirecombinant scheme developed in (Schwefel |0°). An evolution scheme is based on a population of designs of sizeThe members of this population are created by recombination and mutation from a set of theparent design vectorsThese parent design vectors are selected following their fitness from theoffsprings of the previous iteration. Instead of randomly selecting only one of these parents to generate one offspring, multirecombinant schemes are based on the contribution ofparent vectors. The factoris therefore termed the sexuality. Our experiments and recent publications (Beyer ") have indicated that a sexualitypromises the highest progress rate of the algorithm. Mutation is independently applied to each design vector element as:

with k the generation index, p the parent index, m denoting a population member index of the intervalthe random search direction.

The parental designs can be selected either from the present population onlyor from the population and the previous parent setWhile in the comma strategies, individual parameter sets are mortal (existing only for one generation), they may survive for the whole optimisation in a plus strategy. Plus strategies are more likely to get trapped in local optima. On the other hand, they feature a higher progress rate (Beyer 12). Using an intermediate recombination scheme, the step lengthfor each population member is constructed on the basis ofstep lengths of theparents. As indicated above, it is advisable to choose

witha uniform distributed random integer from the interval being newly generated for each j.

Furthermore, instead of constructing an individual step length for each design parameter, only one step length is used for the whole set. This requires a normalisation of the parameter. Tests have shown that this single step length speeds up the algorithm, provided there are no strong dependencies among the object parameters. It must be noted, that (7.13) actually defines this evolution scheme to be self-adaptive of (stands foradaptive, withthe mutation strength) as defined in Beyer12. In self adaptive schemes, the mutation strengthas a strategy parameter is individually coupled with each set of object parameters. Thus the individual step length is stored with the object parameters. To evolve through the course of the optimisation, an additional learning parameter is specified. It determines how quickly and accurately the self adaptation is performed. Beyer12 reports this self adaptive scheme to achieve optimal convergence velocity for (if the mutation strength is mutated following either a lognormal distribution or a symmetrical two-point distribution. Here, the intermediate step length (mutation strength) is adjusted following a two-point distribution:

withsampling from the random uniformdistribution and a the step length adjustment factor. The optimal adjustment factor a is problem dependent, usually chosen between 1 and 1.7 with 1.2-1.3 being a good choice for problematic multiminima problems. The adjustment by (7.14) helps in exploring the feasible space in two distinct directions: locally by narrowing the search space for approximately 50% of the offsprings and globally by widening it for the remaining sets. The latter by enforcing the final step length to follow the Maxwell distribution. This makes larger step lengths more likely than smaller ones, and should avoid trapping in local optima.

**As already indicated above**, the choice of the strategy parameters is problem dependent, Based on a large number of tests, involving numerous technical optimisation problems, a possible choice of the strategy parameters is collected in Fig. 7.8. It should be clear, however, that these are empirical data and not necessarily the optimal choice for a particular optimisation problem, nor do they indicate the bounding lines any limits. The choice of the number of parents depending on the population size for optimal progress is supported by the data given in Beyer ".

**Fig. 7.8. Typical choice of the strategy parameters for the multirecombinant evolution strategy (population size X related to the dimensionality of the optimisation problem N, and number of parents related to population size). These empirical data are valid only for typical technical problems, featuring locally smooth, non-convex feasible spaces.**

**Selecting the strategy parameter according** to Fig. 7.8 must be accompanied by an appropriate initial step length. The initial step length should be chosen to be larger than the Euclidean distance of the object parameter space. If the design parameters are normalised to the interval this isIf one assumes a well-conditioned optimisation problem, where small changes to the object parameters cause small changes in the value of the objective function, then the change of the step length can be taken as a stopping criterion.

**To illustrate the global convergence of the** self-adaptive evolution scheme, a sequence of optimisations with different strategies (30 runs each) are performed on the optimisation problem (7.5),(7.6). While the (2/2,8)-strategy globally converges in only 37% of the tests, a (4/4,15)-strategy does so in 95% of the runs. For this theoretical model, the step length adjustment factor a is 1.6.

**Fig. 7.9. Convergence history {best offspring per iteration) for a successful run with (2/2+8)-strategy.**

**Fig. 7.10. Convergence history of a successful but slow test with a (2/2+8) strategy. The algorithm is trapped in a local optimum (04) for a considerable number of iterations.**

**Fig. 7.11. Convergence history of a successful test with a (2/2,8)-strategy. This strategy globally converges in only 37% of the test runs.**

**Fig. 7.12. Convergence history of a test with a (4/4,15)-strategy. This strategy globally converged in 97% of the test runs.**

The evolution strategies are well suited for parallel implementation, as all individuals in the population can be constructed and evaluated independently. One important property of the evolution strategy must be pointed out, as it forms the base of a sequencing of these algorithms with other algorithms: the evolution strategies converge very fast during the initial steps of an optimisation but they are poor in fine tuning the parameter set. Such an algorithm is well suited for finding the most interesting region of the feasible space at first, being followed by a faster local algorithm to fine-tune the parameter set.