Backgrounds of Genetic Algorithms
In order to understand genetic algorithms, it is important to have a basic
understanding of natural genetics, which were used as a model for the
algorithms. The observation that descendants show similarities with their
parents has led to the assumption that parents carry over properties to their
children. In 1865, Gregor Mendel, the founder of the theories of genetics,
showed how traits from parents are carried over to their descendants in
experiments with peas. His work was ignored for a considerable time, and
rediscovered in the early 1900's. Later, the mechanism of heritability was used
in populations, to improve the traits of domesticated species. This mechanism
of improvement of populations by means of genetic principles is used in Genetic
Algorithms. They can be used in many types of optimization problems.
The live organism has the ability to multiply. This propagation process can
be both sexual and asexual or vegetative. In asexual multiplication (cloning),
a part of the organism is separated, and develops into a new individual
(rhizomes in strawberries, bulbs in patatoes, tubers in tulips etc.). The new
individuals are genetically identical to the parent. Many higher organisms and
animals dispose of sexual propagation. In animals, this process is as follows.
The gametes (oocytes and sperm cells) of two individuals are, during
fertilization, recombined into a new individual. The newly formed cell is
called the zygote. The zygote consists of one cell, and is, in mammals,
nurtured in the womb (uterus) of the mother.
The zygote grows, and undergoes a number of cellular cleavages. Initially, the
cells remain identical, but because of subtle chemical influences, cells tend
to differentiate, and form different organs, making up the individual. The
individual which is born or hatched, has the properties of the species towhich
the parents belong, and hence the parents have carried over their properties to
their offspring.
The individual consists of billions of cells. When viewed under a microscope,
it can be seen that the body cell consists of a viscous fluid the cytoplasm.
The outside of the cell is a semi-permeable membrane. In the cytoplasm, a large
number of particles can be found. The most important one is the nucleus, which
is also surrounded by a membrane. In the nucleus, a number of small bars can be
found: the chromosomes. These are built of nucleo-proteins. The most important
one is DesoxyRiboNucleicAcid (DNA). DNA is considered as the carrier of
heritability.
Figure 1. Schematic
representation of a cell.
In diploid organisms, the chromosomes in the nuclues are always paired, eg
there is always a set of two chromosomes which belong to each other (homologous
chromosomes). The number of pairs of chromosomes is different for different
species, but constant within a specie. For example, man has 23 homologous
chromosomes, cows have 30, horses 32 etc.
Figure 2. Example of a diploid
chromosome section with two homologous chromosomes.
In haploid organisms, there is only one single set of unpaired chromosomes. The
cells multiply by cleavage. There are two types of cellular cleavage: the
mitosis, which is for replication of cells within the body, and the meiosis,
which is the cleavage of cells to generate gametes. Because the mitosis is not
of importance to understand genetic algorithms, only a description of meiosis
is given here.
Figure 3. Example of a haploid
chromosome with a single DNA string.
The homologous chromosomes are split accross the length axis of the
chromosomes. The chromosomes migrate to the center of the cell. The cell gets
two poles, and the separate homologous chromosomes are pulled to either pole of
the cell. That way, each half of the cell has a set of single chromosome
strings.
Figure 4. The four stages of
meiosis. I. Original Cell. II. Chromosomes are to be split. III. Chromosomes
are pulled to the two poles. IV. Resulting cell with set of single
chromosomes.
During the separation of homologous chromosomes, the chromosomes can break, and
be "glued " to the opposite chromosome (crossing over). This occurs
very regularly, and it is an important concept also in the genetic algorithms.
Figure 5. Crossing over of DNA
strings.
Which chromosome migrates to which half of the cell is a totally random
process. Around the chromosomes in the different halves, a membrane is formed,
and hence two gametes are created. So each daughter cell now has half of the
chromosomes of the original cell. In the second pass of the meiosis, each cell
divided again. So four gametes are formed from one cell. In females, one of
these four cells remains as the oocyte. In males, all four cells remain as
sperm cells. When fertilization occurs, the gametes of both parents are
recombined, and the zygote has a double set of homologous chromosomes again.
The coding of traits are located on the chromosomes, and the coding of one
property is called a gene or allele. This means that in a diploid organism
there are two genes for every trait: one originating from each parent. The set
of alleles that represents a property of an individual is always on a fixed
place on a chromosome, and is called a locus. Both alleles on the same locus
influence the same trait, for example hair or eye color in animals.
Figure 6. Alleles on two loci
on a chromosome. A and B represent the alleles.
Based on the above, the heritability of different traits of parents to
children can be described. The simplest situation is, that a trait is coded on
one locus. For example, hair color of cattle. There are two colors: black and
red. Each of these colors is based on one allele. The allele for black is
represented by the upper case letter B. The color red is represented by the
lower case letter b. There are three possible combinations of these alleles on
homologous chromosomes: BB, Bb and bb. These combinations, are the genotype of
the animal, eg the genetic composition. The expression of the genes determines
how the animal looks like. This is called the phenotype of the animal. In this
case there are two possible phenotypes and three possible genotypes. The allele
b will be overriden if the complementary one is B. Therefore, the genotypes Bb
and BB can not be distinguished. They both have the black phenotype. In this
case, black is dominant over red, or B is dominant over b. Vice versa, red is
recessive relative to black. When both alleles are the same, the individual is
called homozygote. If they are different, then the individual is called
heterozygote.
Figure 7. I. Homozygote
dominant. Black. II. Heterozygote. Black. III. Homozygote recessive. Red.
Now, in practice, a lot of traits are not so clearly dominant or recessive, but
somehow intermediate. In other words, both alleles on the homologous
chromosomes are expressed somehow. Another possibility is that different
alleles on the same locus are expressed anyway. In that case, all three
possible genotypes are phenotypically different. In that case the term
co-dominant is applicable. It is also possible, that loci (plural for locus)
for different traits are located on the same chromosome, very close together,
and that only fixed combinations of the different alleles on the loci occur. In
that case, the traits are linked.
During the phase of nuclear cleavage, the chromosomes are replicated, and exact
copies of the original are created. However, it does occur from time to time,
that the DNA string is not exactly replicated. In that case, a mutation has
occurred. This means, that a new allele has been formed, on the same locus, and
that a new genetic code has been developed.Mutations play an important role in
the emergence of new traits and species. Mutations can have a dramatic effect
on an individual, but in many cases mutations have no significant effect.
Alleles can mutate in more that one way. When multiple types of alleles exist,
the term multiple allelic series is used. Mutation of one allele to another one
may occur multiple times. Some alleles are more sensitive than others. They
have different mutation frequencies.
Mutations can be seen as changes in the complex architecture at the locus of a
chromosome. The structure of chromosomes is very complicated, and all
chromosomes together determine how the individual is. If a change occurs in
this structure, usually the functionality of the individual deteriorates.
However, in a small number of cases, these changes represent an improved
functionality. If a mutation causes a favorable effect on the fitness of an
individual, the gene frequency of that mutated allele will be favored in the
population (evolution).
The focus will be shifted from the individual to the population. A population is a group of individuals, from which the individuals mate with other members of the population, but not with individuals outside the group. That way, a population is a genetically closed group. In natural populations, some individuals have a higher chance of reproducing than others. The probability of reproducing depends on the fitness of the individual. Generically, the fitter the individual, the higher the chance of reproducing. For example, animals with a genetic deficiency in their locomotion will have a higher chance to be caught by a predator, and will therefore have a lower probability of mating with another member of the population. This mechanism of favoring the fitter individuals in natural populations is called natural selection. In breeding, man decides what the fittest individuals are. These are the individuals, which man can take best advantage of. These are also the individuals, which are used for reproduction, and they determine what the gene frequencies in the next population are. In demesticated animals and plants, this is called artificial selection. In artificial selection, a proportion of the individuals is excluded from reproduction. The proportion of individuals that is used for reproduction is called the selection proportion. The smaller the proportion, the faster the genetic progress. There is a disadvantage however. When a small selection proportion is chosen, a number of genes in the population is more likely to disappear, and they may be favorable in combination with other genes. It is important to choose the right selection proportion, to balance between the advantages and disadvantages.
Figure 8. Artificial Selection. Subpopulations with the best individuals are used to generate the next generation
Genetic algorithms are search algorithms, based on the mechanics of
biological genetics. Genetic algorithms borrow' these principles from nature,
to search for optima. In genetic algorithms, an individual is represented by a
chromosome. A chromosome is defined as an array of bytes, and every position in
the chromosome is an allele. The programmer decides how these
"chromosomes" are to be interpreted, and how the fitness of an
individual can be evaluated. The individuals with a higher fitness have a
higher chance of reproducing than the individuals with a lower fitness. In
populations of individuals, each individual is evaluated for its fitness. The
key to understanding Genetic Algorithms is, that an individual is a
representation of a real world optimization problem, where the
"chromosome" represents a solution. This chromosome (In GA's equal to
individual) is evaluated for fitness. So every value of every allele can
represent a value in a factorial problem. The optimization problem can be
represented as:
Total = a1 * X1 + a2 * X2 + ...
+ an * Xn
The total is the value that needs to be optimized, the a coefficients are
weighting factors, and the X's are factors. Traditionally, linear programming
is used to calculate the weighting factors to find the optimum solution for the
total, for example a cost. The changes occur in a population of individuals by
Mutation, Crossing Over and Selection. The fitness of an individual is
calculated, and used to determine whether or not it will be used in the next
generation (selection). The mechanism of mutation (spontaneous change of an
allele) is used, to create new individuals. Also the mechanism of crossing over
is used, to create new, genetically different individuals. Because of these two
mechanisms, a certain genetic variation within a population is maintained. In
every generation, every individual is evaluated for its fitness, and those with
the highest fitness, are used to create the next generation. On these
individuals, crossing over and mutation are applied, to make sure that new
individuals with different values for fitness emerge.
Figure 9. Representation of two chromosomes in a GA.
Figure 10. The same chromosomes as in figure 9, after a mutation.
Figure 11. The same chromosomes as in ingure 9, after crossing over occurred.
The Genetic Algorithm is an algorithm to search for the best individual, eg the
individual with the highest fitness. Therefore, the following steps are to be
taken:
Two external functions have to be defined in order to make the Genetic
Algorithm run. The first one is the MutateFunction. The MutateFunction
determines how a mutation takes place, so which value is assigned to a gene.
The genetic algorithm makes a call to this function every time when a mutation
has to take place. In this function, a number in the range of valid values can
be assigned to the gene.
The second one is the FitnessFunction. The FitnessFunction determines how fit
an individual is. Depending on the criteria that need to be met, a fitness
value is assigned to the individual with a given chromosome. In this function,
a call to a simulation routine could be made in order to calculate an outcome
value, which then is used as the fitness value. Also, in a diploid Genetic
Algorithm, the effect of dominance is to be taken into account. In other words,
which of the two genes on the allele is to come to expression.
Both the Haploid and Diploid genetic algorithms have the following properties:
There are no fixed rules of what the best values are for which problem. It is a matter of experimentation to find out how good results can be obtained. The differences between the Haploid and Diploid genetic algorithm are in the datastructures of individuals of the classes, and in the way the Fitness functions are programmed. For a haploid genetic algorithm:
type Allele = byte; {Allele - bit position } Chromosome = array[1..MaxChromLength] of Allele; { String of bytes } Individual = record Chrom : Chromosome; {Genotype - bit string } Fitness : real; {Objective function value } Parent1, Parent2, {parents and cross point } XSite : integer; Objective : real; end; Population = array[1..MaxPopSize] of Individual;
For a diploid genetic algorithm:
type Allele = byte; {Allele - bit position } Chromosome = array[1..MaxChromLength] of Allele; { String of bytes } ChromPack = array[1..2] of Chromosome; ParentId = record XSite : integer; Parent : integer; end; IdPack = array[1..2] of ParentId; Individual = record Chrom : ChromPack; {Genotype - bit string } EChrom : Chromosome; {expressed chromosome } Fitness : real; {Objective function value } Parents : IdPack; {parents and cross point } X : real; XSite : integer; Objective : real; end; Population = array[1..MaxPopSize] of Individual;
For the Diploid Genetic Algorithm it is important to make sure that the
value for the individuals Echrom is assigned in the Fitness function. The
fitness function determines which gene of the two chromosomes is used, and that
is the value that should be assigned to the according gene of the expression
chromosome, Echrom. If this is omitted, it will be impossible to recover the
value of the individuals expressed genes, unless it is worked out from the
individuals Chromosome (of type ChromPack).
In this section you will be guided through the process of building a program, that can handle the traveling salesman problem. There are 5 cities, that need to be visited, and the program has to find the shortest possible route. In order to do this, you have to open a new project. Save the project with a desired name, for example TravSal, and the form as TSP. First drop a haploid genetic algorithm on the form. In the object inspector, you can see a number of values for the genetic parameters. They can all be left as they are, except the chromosome length. Because we are dealing with 5 cities, we set the chromosome length to 5. In the program, the value for each allele on the chromosome will represent the index number of a city. We then drop a button on the form. Also, to display the results, drop a memo on the form. We decide, that every value of a gene represents an index value of a city, as defined in the array Citynames. This means that every number of 1 to 5 may only occur once in the chromosome. If they occur more often, we will have the fitness function discard these values. The chromosome that represents the shortest total distance between cities willbe considered as the fittest. Go to the Object pascal editor, and add the necessary datastructures, as shown below.
type City = (Amsterdam, Rotterdam, TheHague, Utrecht, Nijmegen); const CityNames : array[1..5] of string = ('Amsterdam', 'Rotterdam', 'The Hague', 'Utrecht', 'Nijmegen'); var DistanceTable : array[1..5,1..5] of word;
Move the cursor to the form, and press F11 to go to the object inspector. Select the events' tab, and double click on OnActivate', so that the cursor goes to the Object pascal editor. The procedure FormActivate will be used to create a distance table of the five cities. In the example you can see how that is done.
procedure TForm1.FormActivate(Sender: TObject); procedure SetDistanceTable(Row, Col : City; Value : word); var R, C : word; begin R := word(Row) + 1; C := word(Col) + 1; DistanceTable[R, C] := Value; DistanceTable[C, R] := Value; end; begin SetDistanceTable(Amsterdam, Amsterdam, 0); SetDistanceTable(Amsterdam, Rotterdam, 73); SetDistanceTable(Amsterdam, TheHague, 84); SetDistanceTable(Amsterdam, Utrecht, 37); SetDistanceTable(Amsterdam, Nijmegen, 122); SetDistancetable(Rotterdam, Rotterdam, 0); SetDistancetable(Rotterdam, TheHague, 21); SetDistancetable(Rotterdam, Utrecht, 57); SetDistancetable(Rotterdam, Nijmegen, 114); SetDistanceTable(TheHague, TheHague, 0); SetDistanceTable(TheHague, Utrecht, 62); SetDistanceTable(TheHague, Nijmegen, 135); SetDistancetable(Utrecht, Utrecht, 0); SetDistanceTable(Utrecht, Nijmegen, 85); end;
Then, go back to the form, and double click on the button. The cursor goes to the object pascal editor, to program the event handler for the button click. The code below shows how that should be done.
procedure TForm1.RunButtonClick(Sender: TObject); var TmpStr : string; Index : byte; begin HaploidGeneticAlgorithm1. FitnessFunction := TotalDistance; HaploidGeneticAlgorithm1. MutateFunction := ChangeGene; HaploidGeneticAlgorithm1. ChromosomeLength := 5; HaploidGeneticAlgorithm1.Execute; Memo1.Clear; with HaploidGeneticAlgorithm1 do begin Index := FittestEver[1]; Memo1.Lines.Add(Citynames[Index]); Index := FittestEver[2]; Memo1.Lines.Add(Citynames[Index]); Index := FittestEver[3]; Memo1.Lines.Add(Citynames[Index]); Index := FittestEver[4]; Memo1.Lines.Add(Citynames[Index]); Index := FittestEver[5]; Memo1.Lines.Add(Citynames[Index]); Str(-HighestEverFitness : 2 : 0, TmpStr); TmpStr := 'Total Distance ' + TmpStr; Memo1.Lines.Add(TmpStr); end; end;
There are three calls to the haploidGenetic Algorithm. The first two to set the Fitness and Mutation function, the third to actually execute the genetic algorithm. After the algorithm has run, we want to display the results. The order in which the cities appear in the chromosome of the fittest individual found, needs to be displayed in the memo. Also, the total distance is shown in the memo. The only thing that hasn't been done yet is the programming of the Mutation function and the fitness function. Let's do the Mutation function first. Because there are only 5 cities, we need a random value between 1 and 5. The Fitness function will have to evaluate first, if all cities are visited. If there is one missing, there must also be one that is visited twice. In that case, a very low fitness value is assigned. If all city numbers on the chromosome are unique, then we proceed, and count the total traveling distance. The negative of this value will be considered as the individuals fitness.
{$F+} function ChangeGene(AllelePos : word; OldValue : byte) : Allele; begin Result := 1 + Random(5); end; function TotalDistance(var Indiv : Individual) : real; var Loop : byte; Loop1 : byte; Value : byte; Unique : boolean; Distance : word; begin Unique := TRUE; {first check if all city numbers are } for Loop := 1 to 5 do {unique. If not, then the individual is} begin {unfit } Value := Indiv.Chrom[Loop]; for Loop1 := 1 to 5 do if (Loop <> Loop1) and (Indiv.Chrom[Loop1] = Value) then Unique := FALSE; end; if not Unique then {if one city occurs twice or more then } Result := -9999 {set fitness value to very low } else {otherwise, calculate the negative of } begin {the total distance } Distance := 0; {initialize distance } for Loop := 1 to 4 do {add distances between cities } Distance := Distance + DistanceTable[Indiv.Chrom[Loop], Indiv.Chrom[Loop + 1]]; Distance := Distance + {for the trip back to start city } DistanceTable[Indiv.Chrom[5], Indiv.Chrom[1]]; Result := -Distance; end; end; {$F-}
The following example is used to make clear how we can translate the chromosome values to a real-world problem. Assume we have two feeds, each with a value for protein, carbohydrate and fat. We have to feed an individual, that needs to minimally have 60 grams of protein, 24 grams of fat and 30 grams of carbohydrate. Feed A contains per kg 20 grams of protein, 12 grams of fat and 30 grams of carbohydrates. Feed B contains 30 grams of protein, 6 grams of fat and 15 grams of carbohydrate. Feed A costs 60 cents per kg, feed B costs 40 cents per kg. What is the cheapest way to feed this individual, while meeting his/her nutritional requirements? The problem can be represented by Cost = a1 * Feed1 + a2 * Feed2 In this example, we can have a chromosome with two alleles, one for feed A and one for feed B. Each allele represents a quantity of the feed, and can have values between 0 and 100. Every increment represents 0.1 kg of the feed, so in reality we can range both feeds between 0 and 10 kg. The MutateFunction assigns this value to a gene. The fitness of an individual is determined by the cost of the ration. Because the cost has to be minimised, the negative value of cost is assigned to Fitness.The ration has to meet the requirements. If the requirements aren't met, a very low value is assigned to the fitness. If the requirements are met, the cost of the diet is calculated, and returned as the fitness value.
unit FdForm; interface uses Windows, Messages, SysUtils, Classes, Controls, Forms, Dialogs, StdCtrls, Mask, TeEngine, ExtCtrls, GenAlg; type TForm1 = class(TForm) ExecuteButton: TButton; ResultLabel: TLabel; HaploidGeneticAlgorithm1: THaploidGeneticAlgorithm; FeedALabel: TLabel; FeedBLabel: TLabel; procedure ExecuteButtonClick(Sender: TObject); procedure FormActivate(Sender: TObject); private { Private declarations } public { Public declarations } end; var Form1: TForm1; implementation {$R *.DFM} function MyFitnessFunction(var Indiv : Individual) : real; var minReqProtein : real; MinReqFat : real; MinReqCarbohydrate : real; Protein : real; Fat : real; carbohydrate : real; ReturnValue : real; begin {*** MyFitnessFunction ***} MinReqProtein := 0.06; {minimum requirements in kg } MinReqFat := 0.024; MinReqCarboHydrate := 0.030; Protein := ((Indiv.Chrom[1] * 0.020) + (Indiv.Chrom[2] * 0.030)) / 10; Fat := ((Indiv.Chrom[1] * 0.012) + (Indiv.Chrom[2] * 0.006)) / 10; Carbohydrate := ((Indiv.Chrom[1] * 0.030) + (Indiv.Chrom[2] * 0.015)) / 10; ReturnValue := - (Indiv.Chrom[1] * 0.60 / 10 + Indiv.Chrom[2] * 0.40 / 10); if Protein < MinReqProtein then ReturnValue := -5; if Fat < MinReqFat then ReturnValue := -5; if carbohydrate < MinReqCarbohydrate then ReturnValue := -5; Result := ReturnValue; end; {*** MyFitnessFunction ***} function MyMutationFunction(AllelePos : word; OldValue : byte) : Allele; begin {*** MyMutationFunction ***} Result := Random(101); end; {*** MyMutationFunction ***} procedure TForm1.ExecuteButtonClick(Sender: TObject); var TmpStr : string; begin with HaploidGeneticAlgorithm1 do begin PopulationSize := 30; ChromosomeLength := 2; NumGenerations := 50; ProbCrossingOver := 0.1; ProbMutation := 0.6; SelectionIntensity := 0.3; end; HaploidGeneticAlgorithm1.Execute; Str(- HaploidGeneticAlgorithm1. HighestEverFitness : 10 : 3, TmpStr); ResultLabel.Caption := 'Lowest Cost: ' + TmpStr; Str(HaploidGeneticAlgorithm1.FittestEver[1] / 10 : 2 : 2, TmpStr); FeedALabel.Caption := 'Feed A: ' + TmpStr + ' kg'; Str(HaploidGeneticAlgorithm1.FittestEver[2] / 10 : 2 : 2, TmpStr); FeedBLabel.Caption := 'Feed B: ' + TmpStr + ' kg'; end; procedure TForm1.FormActivate(Sender: TObject); begin HaploidGeneticAlgorithm1.FitnessFunction := MyFitnessFunction; HaploidGeneticAlgorithm1.MutateFunction := MyMutationFunction; end; end.
The project SimpFeed.dpr is the same program, except that it has a graph
which shows the development of the ration cost over the generations. Because it
contains a Tchart, it can only be used in Delphi 3 Professional. The figures
below show the design and runtime screen of this program.
Figure 12. Design form for feed formulation example.
Figure 13. Run-time screen for feed formulation example.