Methods (Electrical Machine) Part 2

Simulated annealing

The technical process of annealing solids inspires simulated annealing algorithms. A slow and controlled cooling of a heated solid ensures proper solidification into a highly ordered crystalline structure, the state of minimal internal energy. Rapid cooling causes defects, because the system is not in a state of thermal equilibrium (Rao). In terms of optimisation theory, a sequence of steps can be formulated that defines a minimisation process by random changes of the design variables and a probabilistic acceptance criterion for them. An evaluated design vector is accepted or rejected following the Metropolis criterion (Metropolis et al.):

• accept design if

Δf = f(x_new) − f(x_old) ≤ 0

• otherwise accept with the probability

 

P(Δf) = exp(−Δf / (k_B · T)),

with the scaling factor k_B, called the Boltzmann constant, and T the temperature. Algorithms applying this acceptance probability are called Boltzmann annealing algorithms. It has been proven (Geman & Geman) that such algorithms are guaranteed to find the global optimum if the reduction of the temperature is taken to be no faster than:

T_k ≥ T_0 / ln(1 + k),   (7.7)

with T_k the temperature at iteration k. Applying such a logarithmic cooling schedule results in practice in an optimisation process of infinite duration. Therefore, other cooling schedules are chosen, the majority being based on experimental studies for selected types of problems rather than being mathematically derived (Ingber; Hajek). These faster schedules are sometimes called simulated quenching, to express the fact that they do not satisfy the sufficiency condition (7.7) for convergence to the global minimum. However, they are usually a good trade-off between fast convergence and a high probability of global convergence. The most common temperature schedule is a simulated quenching algorithm with the exponential cooling schedule:

T_{k+1} = c · T_k.   (7.8)

However, the optimal choice of the annealing factor c is not obvious and depends on the problem type. Typical values vary between 0.98 and 0.80, with the probability of finding a global optimum in non-convex feasible spaces decreasing as c decreases.

The simulated annealing starts at a high initial temperature T_0. There follows the evaluation of a sequence of design vectors, either purely randomly generated or following a scheme such as the evolution strategy. This is continued until equilibrium is reached: the average value of f converges towards a stable value (e.g. by means of an error bound on the standard deviation of f) as the number of trials increases. During such a period of constant temperature, a step length vector δ is adjusted periodically. The best design vector is stored. Once equilibrium is reached, the temperature is lowered and the above sequence is repeated, starting from the current optimal point. This process is stopped when a sufficiently low temperature is reached and no further improvement of the objective function can be expected.
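The loop described above can be sketched as follows. This is a minimal illustration of the Metropolis acceptance rule with exponential cooling, not the implemented code; the function name and all default values are assumptions chosen for the example:

```python
import math
import random

def simulated_annealing(f, x0, t0, c=0.9, delta=0.5,
                        trials_per_temp=50, t_min=1e-6):
    """Minimal simulated-quenching sketch: Metropolis acceptance,
    exponential cooling T_{k+1} = c * T_k; parameters illustrative."""
    x, fx = list(x0), f(x0)
    best_x, best_f = list(x), fx
    t = t0
    while t > t_min:
        for _ in range(trials_per_temp):
            # random change of one design variable at a time
            i = random.randrange(len(x))
            cand = list(x)
            cand[i] += random.uniform(-1.0, 1.0) * delta
            fc = f(cand)
            df = fc - fx
            # Metropolis criterion: accept improvements always,
            # deteriorations with probability exp(-df / T)
            if df <= 0 or random.random() < math.exp(-df / t):
                x, fx = cand, fc
                if fx < best_f:
                    best_x, best_f = list(x), fx
        t *= c  # exponential cooling schedule (simulated quenching)
    return best_x, best_f
```

On a simple convex test function such as the sphere function, this sketch reliably approaches the minimum; the escape from local optima relies on the finite acceptance probability at high temperature.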

Apart from the definition of the scaling factor k_B and the scheme to lower the temperature, the formulation of the probability to accept or reject the evaluated designs has also undergone investigation in the past few years, in order to accelerate the algorithm while maintaining the property of global convergence. Codes such as Adaptive Simulated Annealing must be named here (Ingber). From the viewpoint taken here, however, it must be stated that these improvements of the global convergence are of a theoretical nature and have a significant effect only on very high dimensional optimisation problems (100 and more design variables). Therefore, the simulated quenching methods are preferred here.

The advantages of simulated annealing are:

• simple implementation,

• slow cooling guarantees a high probability of global convergence,

• possibility of solving mixed-integer, discrete or continuous problems,

• insensitivity to stochastic disturbances of the objective function,

• simple combination with other direct search algorithms.

The disadvantages are:

• no optimal scheme to determine a sufficiently high initial temperature T_0 without knowledge of the feasible region,

• high number of function evaluations.

The implemented version of the simulated annealing algorithm tries to tackle the first of these disadvantages: it is relaxed by automatically adjusting the step length vector δ of the parameter variation in such a way that approximately 50% of all designs are accepted during one temperature step. During each temperature step, a number of j step-length adjustment cycles is performed. Each cycle with constant step length consists of N design variations, changing one design parameter i at a time. A new design is generated by applying a random change (based on the step length vector δ) to the previously accepted design vector x:

x_i^new = x_i + r · δ_i,   (7.9)

with r a uniformly distributed random number from the interval [−1, 1]. The adjustment of the step length vector per parameter i in (7.9) is based on the design acceptance ratio r_i per cycle:

r_i = n_i / j,   (7.10)

with n_i the number of accepted designs based on changes of variable i and j the predefined number of cycles. The step length δ_i is adjusted after each cycle to:

 

δ_i^new = δ_i · a    if r_i > 0.5
δ_i^new = δ_i / a    if r_i < 0.5   (7.11)
δ_i^new = δ_i        otherwise

with a the step length factor, usually equal to 2.0.
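The adjustment rule (7.10), (7.11) can be illustrated by a short sketch; the function name and the exact 0.5 thresholds are assumptions made for this example, chosen to match the stated target of roughly 50% accepted designs:

```python
def adjust_step_lengths(delta, accepted, cycles, a=2.0):
    """Per-variable step-length adjustment after one cycle (sketch of
    (7.10)/(7.11)): widen delta_i when more than half of the trials
    that changed variable i were accepted, narrow it when fewer were.
    `accepted[i]` is n_i, `cycles` is j, `a` is the step length factor."""
    new_delta = []
    for d, n in zip(delta, accepted):
        r = n / cycles            # acceptance ratio r_i, eq. (7.10)
        if r > 0.5:
            new_delta.append(d * a)   # too many acceptances: widen
        elif r < 0.5:
            new_delta.append(d / a)   # too few acceptances: narrow
        else:
            new_delta.append(d)       # on target: keep step length
    return new_delta
```

For example, with j = 10 cycles and acceptance counts of 8, 2 and 5 for three variables, the step lengths are doubled, halved and kept, respectively.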

To illustrate the performance of the implemented simulated annealing algorithm, the optimisation problem (7.5), (7.6) is solved. The strategy parameters of the SA are chosen as: the step length factor a and the initial temperature T_0 as given above, with two iterations of two step length adjustment cycles before each temperature reduction. The optimisation is terminated if the difference between the best function value of the present temperature step and the best overall value is less than a prescribed tolerance for two successive temperature steps. A typical optimisation history is shown in Fig. 7.7. The optimisation was repeated thirty times, and the algorithm always converged to the global minimum, typically requiring 600-850 function evaluations.

Fig. 7.7. Path of the connected best trials per temperature step. The path illustrates one strength of the algorithm, as it is able to escape from the local optimum 02 that is reached first.

A parallel implementation of SA is possible but is not attempted here, as the implemented SA mainly serves as a comparison tool. Furthermore, a simple parallelisation of the above implementation of the algorithm is not likely to be efficient, as reported in (Ingber). A better efficiency is reported for clustered algorithms, where annealing is performed at a different temperature per computational node.

Self-adaptive evolution strategy

Evolution strategies imitate biological mutation and selection in a simplified way. They slowly started to become popular with the work of Rechenberg and Schwefel. Especially during the past decade, accelerated by the rapid increase of the available computer power, a vast number of developments have been reported (Bäck et al.). In contrast to genetic algorithms, evolution strategies are directly based on real-valued vectors when dealing with continuous parameter optimisation problems. While Rechenberg developed a theory for the simple two-member (single recombinant) evolution scheme, the theoretical framework of multimember, multirecombinant (also called multiparent) evolution strategies, including self-adaptation, is still weak. Bäck, Hammel and Schwefel comment on this in (Bäck et al.): "We know that they work, but we do not know why". However, first attempts have been made (Beyer). Even though these publications highlight the superior progress rate of multirecombinant evolution strategies, the single recombinant schemes are still present in engineering research. This is mainly due to their simple implementation, but they are certainly no longer justified when seeking a high performance algorithm.

The implemented evolution strategy is basically the multirecombinant scheme developed in (Schwefel). An evolution scheme is based on a population of designs of size λ. The members of this population are created by recombination and mutation from a set of μ parent design vectors. These parent design vectors are selected according to their fitness from the λ offspring of the previous iteration. Instead of randomly selecting only one of these parents to generate one offspring, multirecombinant schemes are based on the contribution of ρ parent vectors. The factor ρ is therefore termed the sexuality. Our experiments and recent publications (Beyer) have indicated that a sexuality ρ = μ promises the highest progress rate of the algorithm. Mutation is independently applied to each design vector element as:

x_i^(k+1,m) = x_i^(k,p) + δ_m · z_i,   (7.12)

with k the generation index, p the parent index, m a population member index from the interval [1, λ], and z the random search direction.
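A mutation step per (7.12) can be sketched as below; the function name is an assumption, and a standard normal search direction z is assumed here as one common choice:

```python
import random

def mutate(parent_x, delta_m):
    """Sketch of (7.12): each element of the selected parent design
    vector is perturbed independently along a random search direction
    z_i, scaled by the offspring's step length delta_m.
    A standard-normal z_i is assumed for this illustration."""
    return [x_i + delta_m * random.gauss(0.0, 1.0) for x_i in parent_x]
```

Note that a single scalar step length delta_m scales all components, matching the single-step-length variant described below.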

The parental designs can be selected either from the present population only (a comma or (μ,λ)-strategy) or from the population together with the previous parent set (a plus or (μ+λ)-strategy). While in the comma strategies individual parameter sets are mortal (existing only for one generation), they may survive for the whole optimisation in a plus strategy. Plus strategies are more likely to get trapped in local optima; on the other hand, they feature a higher progress rate (Beyer). Using an intermediate recombination scheme, the step length δ_m for each population member is constructed on the basis of ρ step lengths of the μ parents. As indicated above, it is advisable to choose ρ = μ:

δ_m = (1/ρ) Σ_{j=1..ρ} δ_{m_j},   (7.13)

with m_j a uniformly distributed random integer from the interval [1, μ], newly generated for each j.
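The intermediate recombination (7.13) as read here can be sketched in a few lines; the function name is illustrative, and drawing the parent indices with replacement is an assumption consistent with a fresh m_j for each j:

```python
import random

def recombine_step_length(parent_deltas, rho=None):
    """Sketch of (7.13): the offspring step length is the average of
    rho parent step lengths, where each contributing parent index m_j
    is drawn uniformly (with replacement) from [1, mu], newly for each
    j. rho = mu, the advisable choice from the text, is the default."""
    mu = len(parent_deltas)
    rho = mu if rho is None else rho
    picks = [parent_deltas[random.randrange(mu)] for _ in range(rho)]
    return sum(picks) / rho
```

The recombined value always lies between the smallest and largest parent step length, which keeps the mutation strength of the offspring within the range spanned by the parents.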

Furthermore, instead of constructing an individual step length for each design parameter, only one step length is used for the whole set. This requires a normalisation of the parameters. Tests have shown that this single step length speeds up the algorithm, provided there are no strong dependencies among the object parameters. It must be noted that (7.13) actually defines this evolution scheme to be self-adaptive of the σSA type (σSA stands for σ self-adaptation, with σ the mutation strength) as defined in (Beyer). In self-adaptive schemes, the mutation strength as a strategy parameter is individually coupled with each set of object parameters; thus the individual step length is stored with the object parameters. To let it evolve through the course of the optimisation, an additional learning parameter is specified. It determines how quickly and how accurately the self-adaptation is performed. Beyer reports this self-adaptive scheme to achieve optimal convergence velocity for (μ/μ, λ)-strategies if the mutation strength is mutated following either a lognormal distribution or a symmetrical two-point distribution. Here, the intermediate step length (mutation strength) is adjusted following a two-point distribution:

δ_m^new = δ_m · a    if u ≤ 0.5
δ_m^new = δ_m / a    if u > 0.5   (7.14)

with u sampled from the uniform [0, 1] distribution and a the step length adjustment factor. The optimal adjustment factor a is problem dependent, usually chosen between 1 and 1.7, with 1.2-1.3 being a good choice for problematic multiminima problems. The adjustment by (7.14) helps in exploring the feasible space in two distinct directions: locally, by narrowing the search space for approximately 50% of the offspring, and globally, by widening it for the remaining sets. The widening enforces the final step length to follow the Maxwell distribution, which makes larger step lengths more likely than smaller ones and should avoid trapping in local optima.
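The symmetrical two-point mutation of the step length per (7.14) reduces to a one-line sketch; the function name is an assumption, and a = 1.2 is taken from the range suggested above:

```python
import random

def mutate_strength(delta, a=1.2):
    """Sketch of (7.14): symmetrical two-point mutation of the step
    length. With probability 1/2 the step length is widened by the
    adjustment factor a, otherwise narrowed by it. The default
    a = 1.2 reflects the range suggested for multiminima problems."""
    return delta * a if random.random() < 0.5 else delta / a
```

Because widening and narrowing are equally likely, the median step length is preserved while repeated application spreads the population over a range of mutation strengths.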

As already indicated above, the choice of the strategy parameters is problem dependent. Based on a large number of tests involving numerous technical optimisation problems, a possible choice of the strategy parameters is collected in Fig. 7.8. It should be clear, however, that these are empirical data and not necessarily the optimal choice for a particular optimisation problem, nor do the bounding lines indicate strict limits. The choice of the number of parents depending on the population size for optimal progress is supported by the data given in (Beyer).

Fig. 7.8. Typical choice of the strategy parameters for the multirecombinant evolution strategy (population size X related to the dimensionality of the optimisation problem N, and number of parents related to population size). These empirical data are valid only for typical technical problems, featuring locally smooth, non-convex feasible spaces.

Selecting the strategy parameters according to Fig. 7.8 must be accompanied by an appropriate initial step length. The initial step length should be chosen larger than the Euclidean diameter of the object parameter space; if the design parameters are normalised to the interval [0, 1], this is √N. If one assumes a well-conditioned optimisation problem, where small changes of the object parameters cause small changes in the value of the objective function, then the change of the step length can be taken as a stopping criterion.

To illustrate the global convergence of the self-adaptive evolution scheme, a sequence of optimisations with different strategies (30 runs each) is performed on the optimisation problem (7.5), (7.6). While the (2/2,8)-strategy globally converges in only 37% of the tests, a (4/4,15)-strategy does so in 95% of the runs. For this theoretical model, the step length adjustment factor a is 1.6.

Fig. 7.9. Convergence history (best offspring per iteration) for a successful run with a (2/2+8)-strategy.

Fig. 7.10. Convergence history of a successful but slow test with a (2/2+8) strategy. The algorithm is trapped in a local optimum (04) for a considerable number of iterations.

Fig. 7.11. Convergence history of a successful test with a (2/2,8)-strategy. This strategy globally converges in only 37% of the test runs.

Fig. 7.12. Convergence history of a test with a (4/4,15)-strategy. This strategy globally converged in 97% of the test runs.

The evolution strategies are well suited for parallel implementation, as all individuals in the population can be constructed and evaluated independently. One important property of the evolution strategy must be pointed out, as it forms the basis of sequencing these algorithms with others: evolution strategies converge very fast during the initial steps of an optimisation, but they are poor at fine-tuning the parameter set. Such an algorithm is well suited to finding the most interesting region of the feasible space first, followed by a faster local algorithm to fine-tune the parameter set.