{short description of image}



Backgrounds of Genetic Algorithms

Introduction

In order to understand genetic algorithms, it is important to have a basic understanding of natural genetics, which were used as a model for the algorithms. The observation that descendants show similarities with their parents has led to the assumption that parents carry over properties to their children. In 1865, Gregor Mendel, the founder of the theories of genetics, showed how traits from parents are carried over to their descendants in experiments with peas. His work was ignored for a considerable time, and rediscovered in the early 1900's. Later, the mechanism of heritability was used in populations, to improve the traits of domesticated species. This mechanism of improvement of populations by means of genetic principles is used in Genetic Algorithms. They can be used in many types of optimization problems.

Mechanism of heritability

The live organism has the ability to multiply. This propagation process can be both sexual and asexual or vegetative. In asexual multiplication (cloning), a part of the organism is separated, and develops into a new individual (rhizomes in strawberries, bulbs in patatoes, tubers in tulips etc.). The new individuals are genetically identical to the parent. Many higher organisms and animals dispose of sexual propagation. In animals, this process is as follows. The gametes (oocytes and sperm cells) of two individuals are, during fertilization, recombined into a new individual. The newly formed cell is called the zygote. The zygote consists of one cell, and is, in mammals, nurtured in the womb (uterus) of the mother.
The zygote grows, and undergoes a number of cellular cleavages. Initially, the cells remain identical, but because of subtle chemical influences, cells tend to differentiate, and form different organs, making up the individual. The individual which is born or hatched, has the properties of the species towhich the parents belong, and hence the parents have carried over their properties to their offspring.
The individual consists of billions of cells. When viewed under a microscope, it can be seen that the body cell consists of a viscous fluid the cytoplasm. The outside of the cell is a semi-permeable membrane. In the cytoplasm, a large number of particles can be found. The most important one is the nucleus, which is also surrounded by a membrane. In the nucleus, a number of small bars can be found: the chromosomes. These are built of nucleo-proteins. The most important one is DesoxyRiboNucleicAcid (DNA). DNA is considered as the carrier of heritability.

Figure 1. Schematic representation of a cell.

In diploid organisms, the chromosomes in the nuclues are always paired, eg there is always a set of two chromosomes which belong to each other (homologous chromosomes). The number of pairs of chromosomes is different for different species, but constant within a specie. For example, man has 23 homologous chromosomes, cows have 30, horses 32 etc.

Figure 2. Example of a diploid chromosome section with two homologous chromosomes.

In haploid organisms, there is only one single set of unpaired chromosomes. The cells multiply by cleavage. There are two types of cellular cleavage: the mitosis, which is for replication of cells within the body, and the meiosis, which is the cleavage of cells to generate gametes. Because the mitosis is not of importance to understand genetic algorithms, only a description of meiosis is given here.

Figure 3. Example of a haploid chromosome with a single DNA string.

The homologous chromosomes are split accross the length axis of the chromosomes. The chromosomes migrate to the center of the cell. The cell gets two poles, and the separate homologous chromosomes are pulled to either pole of the cell. That way, each half of the cell has a set of single chromosome strings.

Figure 4. The four stages of meiosis. I. Original Cell. II. Chromosomes are to be split. III. Chromosomes are pulled to the two poles. IV. Resulting cell with set of single chromosomes.

During the separation of homologous chromosomes, the chromosomes can break, and be "glued " to the opposite chromosome (crossing over). This occurs very regularly, and it is an important concept also in the genetic algorithms.

Figure 5. Crossing over of DNA strings.

Which chromosome migrates to which half of the cell is a totally random process. Around the chromosomes in the different halves, a membrane is formed, and hence two gametes are created. So each daughter cell now has half of the chromosomes of the original cell. In the second pass of the meiosis, each cell divided again. So four gametes are formed from one cell. In females, one of these four cells remains as the oocyte. In males, all four cells remain as sperm cells. When fertilization occurs, the gametes of both parents are recombined, and the zygote has a double set of homologous chromosomes again. The coding of traits are located on the chromosomes, and the coding of one property is called a gene or allele. This means that in a diploid organism there are two genes for every trait: one originating from each parent. The set of alleles that represents a property of an individual is always on a fixed place on a chromosome, and is called a locus. Both alleles on the same locus influence the same trait, for example hair or eye color in animals.

Figure 6. Alleles on two loci on a chromosome. A and B represent the alleles.

Heritability of qualitative traits

Based on the above, the heritability of different traits of parents to children can be described. The simplest situation is, that a trait is coded on one locus. For example, hair color of cattle. There are two colors: black and red. Each of these colors is based on one allele. The allele for black is represented by the upper case letter B. The color red is represented by the lower case letter b. There are three possible combinations of these alleles on homologous chromosomes: BB, Bb and bb. These combinations, are the genotype of the animal, eg the genetic composition. The expression of the genes determines how the animal looks like. This is called the phenotype of the animal. In this case there are two possible phenotypes and three possible genotypes. The allele b will be overriden if the complementary one is B. Therefore, the genotypes Bb and BB can not be distinguished. They both have the black phenotype. In this case, black is dominant over red, or B is dominant over b. Vice versa, red is recessive relative to black. When both alleles are the same, the individual is called homozygote. If they are different, then the individual is called heterozygote.

Figure 7. I. Homozygote dominant. Black. II. Heterozygote. Black. III. Homozygote recessive. Red.

Now, in practice, a lot of traits are not so clearly dominant or recessive, but somehow intermediate. In other words, both alleles on the homologous chromosomes are expressed somehow. Another possibility is that different alleles on the same locus are expressed anyway. In that case, all three possible genotypes are phenotypically different. In that case the term co-dominant is applicable. It is also possible, that loci (plural for locus) for different traits are located on the same chromosome, very close together, and that only fixed combinations of the different alleles on the loci occur. In that case, the traits are linked.
During the phase of nuclear cleavage, the chromosomes are replicated, and exact copies of the original are created. However, it does occur from time to time, that the DNA string is not exactly replicated. In that case, a mutation has occurred. This means, that a new allele has been formed, on the same locus, and that a new genetic code has been developed.Mutations play an important role in the emergence of new traits and species. Mutations can have a dramatic effect on an individual, but in many cases mutations have no significant effect. Alleles can mutate in more that one way. When multiple types of alleles exist, the term multiple allelic series is used. Mutation of one allele to another one may occur multiple times. Some alleles are more sensitive than others. They have different mutation frequencies.
Mutations can be seen as changes in the complex architecture at the locus of a chromosome. The structure of chromosomes is very complicated, and all chromosomes together determine how the individual is. If a change occurs in this structure, usually the functionality of the individual deteriorates. However, in a small number of cases, these changes represent an improved functionality. If a mutation causes a favorable effect on the fitness of an individual, the gene frequency of that mutated allele will be favored in the population (evolution).

Population dynamics

The focus will be shifted from the individual to the population. A population is a group of individuals, from which the individuals mate with other members of the population, but not with individuals outside the group. That way, a population is a genetically closed group. In natural populations, some individuals have a higher chance of reproducing than others. The probability of reproducing depends on the fitness of the individual. Generically, the fitter the individual, the higher the chance of reproducing. For example, animals with a genetic deficiency in their locomotion will have a higher chance to be caught by a predator, and will therefore have a lower probability of mating with another member of the population. This mechanism of favoring the fitter individuals in natural populations is called natural selection. In breeding, man decides what the fittest individuals are. These are the individuals, which man can take best advantage of. These are also the individuals, which are used for reproduction, and they determine what the gene frequencies in the next population are. In demesticated animals and plants, this is called artificial selection. In artificial selection, a proportion of the individuals is excluded from reproduction. The proportion of individuals that is used for reproduction is called the selection proportion. The smaller the proportion, the faster the genetic progress. There is a disadvantage however. When a small selection proportion is chosen, a number of genes in the population is more likely to disappear, and they may be favorable in combination with other genes. It is important to choose the right selection proportion, to balance between the advantages and disadvantages.

Figure 8. Artificial Selection. Subpopulations with the best individuals are used to generate the next generation


Genetic algorithms

Genetic algorithms are search algorithms, based on the mechanics of biological genetics. Genetic algorithms borrow' these principles from nature, to search for optima. In genetic algorithms, an individual is represented by a chromosome. A chromosome is defined as an array of bytes, and every position in the chromosome is an allele. The programmer decides how these "chromosomes" are to be interpreted, and how the fitness of an individual can be evaluated. The individuals with a higher fitness have a higher chance of reproducing than the individuals with a lower fitness. In populations of individuals, each individual is evaluated for its fitness. The key to understanding Genetic Algorithms is, that an individual is a representation of a real world optimization problem, where the "chromosome" represents a solution. This chromosome (In GA's equal to individual) is evaluated for fitness. So every value of every allele can represent a value in a factorial problem. The optimization problem can be represented as:

Total = a1 * X1 + a2 * X2 + ... + an * Xn

The total is the value that needs to be optimized, the a coefficients are weighting factors, and the X's are factors. Traditionally, linear programming is used to calculate the weighting factors to find the optimum solution for the total, for example a cost. The changes occur in a population of individuals by Mutation, Crossing Over and Selection. The fitness of an individual is calculated, and used to determine whether or not it will be used in the next generation (selection). The mechanism of mutation (spontaneous change of an allele) is used, to create new individuals. Also the mechanism of crossing over is used, to create new, genetically different individuals. Because of these two mechanisms, a certain genetic variation within a population is maintained. In every generation, every individual is evaluated for its fitness, and those with the highest fitness, are used to create the next generation. On these individuals, crossing over and mutation are applied, to make sure that new individuals with different values for fitness emerge.

Figure 9. Representation of two chromosomes in a GA.

Figure 10. The same chromosomes as in figure 9, after a mutation.

Figure 11. The same chromosomes as in ingure 9, after crossing over occurred.


The Genetic Algorithm is an algorithm to search for the best individual, eg the individual with the highest fitness. Therefore, the following steps are to be taken:


Two external functions have to be defined in order to make the Genetic Algorithm run. The first one is the MutateFunction. The MutateFunction determines how a mutation takes place, so which value is assigned to a gene. The genetic algorithm makes a call to this function every time when a mutation has to take place. In this function, a number in the range of valid values can be assigned to the gene.
The second one is the FitnessFunction. The FitnessFunction determines how fit an individual is. Depending on the criteria that need to be met, a fitness value is assigned to the individual with a given chromosome. In this function, a call to a simulation routine could be made in order to calculate an outcome value, which then is used as the fitness value. Also, in a diploid Genetic Algorithm, the effect of dominance is to be taken into account. In other words, which of the two genes on the allele is to come to expression.

Properties of the Genetic Algorithm

Both the Haploid and Diploid genetic algorithms have the following properties:

There are no fixed rules of what the best values are for which problem. It is a matter of experimentation to find out how good results can be obtained. The differences between the Haploid and Diploid genetic algorithm are in the datastructures of individuals of the classes, and in the way the Fitness functions are programmed. For a haploid genetic algorithm:

  type

    Allele      = byte;                 {Allele -  bit position }

    Chromosome  = array[1..MaxChromLength] of Allele; { String of bytes }

    Individual  =

      record

        Chrom : Chromosome;             {Genotype - bit string }

        Fitness : real;                 {Objective function value }

        Parent1, Parent2,               {parents and cross point }

        XSite : integer;

        Objective : real;

      end;

    Population  = array[1..MaxPopSize] of Individual;


For a diploid genetic algorithm:

  type

    Allele      = byte;                 {Allele -  bit position }

    Chromosome  = array[1..MaxChromLength] of Allele; { String of bytes }

    ChromPack   = array[1..2] of Chromosome;

    ParentId    = record

      XSite : integer;

      Parent : integer;

    end;

    IdPack      = array[1..2] of ParentId;

    Individual  =

      record

        Chrom : ChromPack;              {Genotype - bit string }

        EChrom : Chromosome;            {expressed chromosome }

        Fitness : real;                 {Objective function value }

        Parents : IdPack;               {parents and cross point }

        X : real;

        XSite : integer;

        Objective : real;

      end;

    Population  = array[1..MaxPopSize] of Individual;



For the Diploid Genetic Algorithm it is important to make sure that the value for the individuals Echrom is assigned in the Fitness function. The fitness function determines which gene of the two chromosomes is used, and that is the value that should be assigned to the according gene of the expression chromosome, Echrom. If this is omitted, it will be impossible to recover the value of the individuals expressed genes, unless it is worked out from the individuals Chromosome (of type ChromPack).

Building the Traveling Salesman program (Project TravSal)

In this section you will be guided through the process of building a program, that can handle the traveling salesman problem. There are 5 cities, that need to be visited, and the program has to find the shortest possible route. In order to do this, you have to open a new project. Save the project with a desired name, for example TravSal, and the form as TSP. First drop a haploid genetic algorithm on the form. In the object inspector, you can see a number of values for the genetic parameters. They can all be left as they are, except the chromosome length. Because we are dealing with 5 cities, we set the chromosome length to 5. In the program, the value for each allele on the chromosome will represent the index number of a city. We then drop a button on the form. Also, to display the results, drop a memo on the form. We decide, that every value of a gene represents an index value of a city, as defined in the array Citynames. This means that every number of 1 to 5 may only occur once in the chromosome. If they occur more often, we will have the fitness function discard these values. The chromosome that represents the shortest total distance between cities willbe considered as the fittest. Go to the Object pascal editor, and add the necessary datastructures, as shown below.

type

  City = (Amsterdam, Rotterdam, TheHague,

          Utrecht, Nijmegen);

const

  CityNames : array[1..5] of string =

         ('Amsterdam', 'Rotterdam', 'The Hague',

          'Utrecht', 'Nijmegen');

var

  DistanceTable : array[1..5,1..5] of word;

Move the cursor to the form, and press F11 to go to the object inspector. Select the events' tab, and double click on OnActivate', so that the cursor goes to the Object pascal editor. The procedure FormActivate will be used to create a distance table of the five cities. In the example you can see how that is done.

procedure TForm1.FormActivate(Sender: TObject);

  procedure SetDistanceTable(Row, Col : City; Value : word);

    var

      R, C : word;

    begin

      R := word(Row) + 1;

      C := word(Col) + 1;

      DistanceTable[R, C] := Value;

      DistanceTable[C, R] := Value;

    end;

begin

  SetDistanceTable(Amsterdam, Amsterdam, 0);

  SetDistanceTable(Amsterdam, Rotterdam, 73);

  SetDistanceTable(Amsterdam, TheHague, 84);

  SetDistanceTable(Amsterdam, Utrecht, 37);

  SetDistanceTable(Amsterdam, Nijmegen, 122);

  SetDistancetable(Rotterdam, Rotterdam, 0);

  SetDistancetable(Rotterdam, TheHague, 21);

  SetDistancetable(Rotterdam, Utrecht, 57);

  SetDistancetable(Rotterdam, Nijmegen, 114);

  SetDistanceTable(TheHague, TheHague, 0);

  SetDistanceTable(TheHague, Utrecht, 62);

  SetDistanceTable(TheHague, Nijmegen, 135);

  SetDistancetable(Utrecht, Utrecht, 0);

  SetDistanceTable(Utrecht, Nijmegen, 85);

end;


Then, go back to the form, and double click on the button. The cursor goes to the object pascal editor, to program the event handler for the button click. The code below shows how that should be done.

procedure TForm1.RunButtonClick(Sender: TObject);

var

  TmpStr : string;

  Index : byte;

begin

  HaploidGeneticAlgorithm1.

       FitnessFunction := TotalDistance;

  HaploidGeneticAlgorithm1.

       MutateFunction := ChangeGene;

  HaploidGeneticAlgorithm1.

       ChromosomeLength := 5;

  HaploidGeneticAlgorithm1.Execute;

  Memo1.Clear;

  with HaploidGeneticAlgorithm1 do

    begin

      Index := FittestEver[1];

      Memo1.Lines.Add(Citynames[Index]);

      Index := FittestEver[2];

      Memo1.Lines.Add(Citynames[Index]);

      Index := FittestEver[3];

      Memo1.Lines.Add(Citynames[Index]);

      Index := FittestEver[4];

      Memo1.Lines.Add(Citynames[Index]);

      Index := FittestEver[5];

      Memo1.Lines.Add(Citynames[Index]);

      Str(-HighestEverFitness : 2 : 0, TmpStr);

      TmpStr := 'Total Distance ' + TmpStr;

      Memo1.Lines.Add(TmpStr);

    end;

end;



There are three calls to the haploidGenetic Algorithm. The first two to set the Fitness and Mutation function, the third to actually execute the genetic algorithm. After the algorithm has run, we want to display the results. The order in which the cities appear in the chromosome of the fittest individual found, needs to be displayed in the memo. Also, the total distance is shown in the memo. The only thing that hasn't been done yet is the programming of the Mutation function and the fitness function. Let's do the Mutation function first. Because there are only 5 cities, we need a random value between 1 and 5. The Fitness function will have to evaluate first, if all cities are visited. If there is one missing, there must also be one that is visited twice. In that case, a very low fitness value is assigned. If all city numbers on the chromosome are unique, then we proceed, and count the total traveling distance. The negative of this value will be considered as the individuals fitness.

{$F+}

function ChangeGene(AllelePos : word;

                    OldValue : byte) : Allele;

  begin

    Result := 1 + Random(5);

  end;



function TotalDistance(var Indiv : Individual) : real;

  var

    Loop : byte;

    Loop1 : byte;

    Value : byte;

    Unique : boolean;

    Distance : word;

  begin

    Unique := TRUE;                     {first check if all city numbers are }

    for Loop := 1 to 5 do               {unique. If not, then the individual is}

      begin                             {unfit }

        Value := Indiv.Chrom[Loop];

        for Loop1 := 1 to 5 do

          if (Loop <> Loop1) and

             (Indiv.Chrom[Loop1] = Value) then

            Unique := FALSE;

      end;

    if not Unique then                  {if one city occurs twice or more then }

      Result := -9999                   {set fitness value to very low }

    else                                {otherwise, calculate the negative of }

      begin                             {the total distance }

        Distance := 0;                  {initialize distance }

        for Loop := 1 to 4 do           {add distances between cities }

          Distance := Distance +

                         DistanceTable[Indiv.Chrom[Loop],

                                    Indiv.Chrom[Loop + 1]];

        Distance := Distance +          {for the trip back to start city }

                         DistanceTable[Indiv.Chrom[5],

                                       Indiv.Chrom[1]];

        Result := -Distance;

      end;

  end;

{$F-}



Example of Translation of Chromosome to feed formulation problem (Project SimpFd)

The following example is used to make clear how we can translate the chromosome values to a real-world problem. Assume we have two feeds, each with a value for protein, carbohydrate and fat. We have to feed an individual, that needs to minimally have 60 grams of protein, 24 grams of fat and 30 grams of carbohydrate. Feed A contains per kg 20 grams of protein, 12 grams of fat and 30 grams of carbohydrates. Feed B contains 30 grams of protein, 6 grams of fat and 15 grams of carbohydrate. Feed A costs 60 cents per kg, feed B costs 40 cents per kg. What is the cheapest way to feed this individual, while meeting his/her nutritional requirements? The problem can be represented by Cost = a1 * Feed1 + a2 * Feed2 In this example, we can have a chromosome with two alleles, one for feed A and one for feed B. Each allele represents a quantity of the feed, and can have values between 0 and 100. Every increment represents 0.1 kg of the feed, so in reality we can range both feeds between 0 and 10 kg. The MutateFunction assigns this value to a gene. The fitness of an individual is determined by the cost of the ration. Because the cost has to be minimised, the negative value of cost is assigned to Fitness.The ration has to meet the requirements. If the requirements aren't met, a very low value is assigned to the fitness. If the requirements are met, the cost of the diet is calculated, and returned as the fitness value.

unit FdForm;



interface



uses

  Windows, Messages, SysUtils, Classes, Controls, Forms, Dialogs,

  StdCtrls, Mask, TeEngine, ExtCtrls, GenAlg;



type

  TForm1 = class(TForm)

    ExecuteButton: TButton;

    ResultLabel: TLabel;

    HaploidGeneticAlgorithm1: THaploidGeneticAlgorithm;

    FeedALabel: TLabel;

    FeedBLabel: TLabel;

    procedure ExecuteButtonClick(Sender: TObject);

    procedure FormActivate(Sender: TObject);

  private

    { Private declarations }

  public

    { Public declarations }

  end;



var

  Form1: TForm1;



implementation



{$R *.DFM}

function MyFitnessFunction(var Indiv : Individual) : real;

  var

    minReqProtein : real;

    MinReqFat : real;

    MinReqCarbohydrate : real;

    Protein : real;

    Fat : real;

    carbohydrate : real;

    ReturnValue : real;

  begin                                 {*** MyFitnessFunction ***}

    MinReqProtein := 0.06;              {minimum requirements in kg }

    MinReqFat := 0.024;

    MinReqCarboHydrate := 0.030;

    Protein := ((Indiv.Chrom[1] * 0.020) +

                (Indiv.Chrom[2] * 0.030)) / 10;

    Fat := ((Indiv.Chrom[1] * 0.012) +

            (Indiv.Chrom[2] * 0.006)) / 10;

    Carbohydrate := ((Indiv.Chrom[1] * 0.030) +

                     (Indiv.Chrom[2] * 0.015)) / 10;

    ReturnValue := - (Indiv.Chrom[1] * 0.60 / 10 +

                      Indiv.Chrom[2] * 0.40 / 10);

    if Protein < MinReqProtein then

      ReturnValue := -5;

    if Fat < MinReqFat then

      ReturnValue := -5;

    if carbohydrate < MinReqCarbohydrate then

      ReturnValue := -5;

    Result := ReturnValue;

  end;                                  {*** MyFitnessFunction ***}



function MyMutationFunction(AllelePos : word;

                            OldValue : byte) : Allele;

  begin                                 {*** MyMutationFunction ***}

    Result := Random(101);

  end;                                  {*** MyMutationFunction ***}





procedure TForm1.ExecuteButtonClick(Sender: TObject);

var

  TmpStr : string;

begin

  with HaploidGeneticAlgorithm1 do

    begin

      PopulationSize := 30;

      ChromosomeLength := 2;

      NumGenerations := 50;

      ProbCrossingOver := 0.1;

      ProbMutation := 0.6;

      SelectionIntensity := 0.3;

    end;

  HaploidGeneticAlgorithm1.Execute;

  Str(- HaploidGeneticAlgorithm1.

          HighestEverFitness : 10 : 3,

      TmpStr);

  ResultLabel.Caption := 'Lowest Cost: ' + TmpStr;

  Str(HaploidGeneticAlgorithm1.FittestEver[1] / 10 : 2 : 2, TmpStr);

  FeedALabel.Caption := 'Feed A: ' + TmpStr + ' kg';

  Str(HaploidGeneticAlgorithm1.FittestEver[2] / 10 : 2 : 2, TmpStr);

  FeedBLabel.Caption := 'Feed B: ' + TmpStr + ' kg';

end;



procedure TForm1.FormActivate(Sender: TObject);

begin

  HaploidGeneticAlgorithm1.FitnessFunction := MyFitnessFunction;

  HaploidGeneticAlgorithm1.MutateFunction := MyMutationFunction;

end;



end.



The project SimpFeed.dpr is the same program, except that it has a graph which shows the development of the ration cost over the generations. Because it contains a Tchart, it can only be used in Delphi 3 Professional. The figures below show the design and runtime screen of this program.



Figure 12. Design form for feed formulation example.


Figure 13. Run-time screen for feed formulation example.