CS44 W00: Lecture 6

Topics:

Genetic Algorithms

Genetic Algorithms

Genetic algorithms (GAs) offer an intelligent way of conducting a random search. The metaphor for GAs comes from the analogy between representing a complex structure as a vector of components and the chromosome makeup of biological creatures. In biological breeding, the characteristics of offspring are determined at the genetic level by the way chromosomes combine, and offspring with desirable characteristics are sought. In a similar way, when seeking better solutions to complex problems, pieces of existing solutions may be combined.

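The component vector for a GA application is a string, usually of 0s and 1s, although other alphabets are often used. The genetic operators are selection (choosing parents), crossover (combining pieces of two parents), and mutation (randomly altering individual positions, for example flipping a bit).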
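A minimal sketch of the mutation operator on binary strings, assuming the common bit-flip variant in which each bit is flipped independently with a per-bit probability pm. The function name mutate and the parameter pm are illustrative, not taken from these notes.

```python
import random


def mutate(bits: str, pm: float, rng: random.Random) -> str:
    """Bit-flip mutation (assumed variant): each position flips independently with probability pm."""
    out = []
    for b in bits:
        if rng.random() < pm:
            out.append("1" if b == "0" else "0")  # flip this bit
        else:
            out.append(b)                         # keep this bit unchanged
    return "".join(out)


rng = random.Random(0)
print(mutate("011", 0.0, rng))  # pm = 0 never flips: prints 011
print(mutate("011", 1.0, rng))  # pm = 1 flips every bit: prints 100
```

In practice pm is kept small so that mutation only occasionally perturbs a string, leaving most of the search to selection and crossover.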

To find the optimal solution to a problem, a GA maintains a population of strings (also called chromosomes or potential parents) whose fitness values can be calculated. Each string encodes a solution to the problem, and its fitness value measures how good that solution is. A generation of strings evolves into a new generation by selecting parents: one parent is selected according to its fitness value and the other is selected at random. Parents are mated by choosing a crossover point and swapping the portions of the two strings that follow that point.
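The mating step just described can be sketched as follows. This assumes "selected according to its fitness value" means roulette-wheel (fitness-proportional) selection with non-negative fitness values; the names select_by_fitness, one_point_crossover, and mate are hypothetical.

```python
import random


def select_by_fitness(population, fitness, rng):
    """Roulette-wheel selection: probability of picking s is fitness(s) / total fitness.
    Assumes non-negative fitness values with a positive total."""
    total = sum(fitness(s) for s in population)
    spin = rng.random() * total          # uniform point in [0, total)
    running = 0
    for s in population:
        running += fitness(s)
        if spin < running:
            return s
    return population[-1]                # guard against floating-point round-off


def one_point_crossover(a, b, point):
    """Swap the portions of a and b that follow the crossover point."""
    return a[:point] + b[point:], b[:point] + a[point:]


def mate(population, fitness, rng):
    p1 = select_by_fitness(population, fitness, rng)   # parent chosen by fitness
    p2 = rng.choice(population)                        # parent chosen at random
    point = rng.randint(1, len(p1) - 1)                # cut strictly inside the string
    return one_point_crossover(p1, p2, point)


print(one_point_crossover("110", "001", 1))  # ('101', '010')
```

For example, mate(["011", "001", "110", "010"], lambda s: int(s, 2), random.Random(42)) returns two 3-bit offspring. Choosing the cut point in 1..n-1 guarantees each child takes at least one bit from each parent.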

Example: Suppose you want to open a hamburger restaurant. There are three attributes on which you can base a business decision, and the goal is to make the most profitable decision. The attributes are price (high or low), drink (wine or coke), and speed of service (fast or slow). You can encode this problem as a 3-bit string in which each bit gives the value of one of the attributes. For example, 011 encodes a restaurant with high prices, coke drinks, and fast service. You can use the decimal value of the string as a fitness measure; for our example, f(011) = 3. You can establish an initial population by randomly selecting a set of individuals, for instance 011, 001, 110, 010. For each of these individuals you can compute the fitness value (here 3, 1, 6, 2) and from these establish the best fitness of the generation (6), the worst (1), and the average (3). This information can be used to select a parent pool. In our example the total fitness is 12, and individual 011 contributes 3/12 = 1/4 of the total fitness, so in a population of 4 we expect 4 * 1/4 = 1 such individual; we therefore select one copy of 011 for reproduction. Reproduction happens by crossover and mutation.
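The arithmetic of the restaurant example can be checked with a short sketch. It uses the population 011, 001, 110, 010, whose fitness values 3, 1, 6, 2 (decimal values of the strings) total 12; the helper names generation_stats and expected_copies are illustrative.

```python
def fitness(bits: str) -> int:
    # Fitness as in the example: the decimal value of the bit string.
    return int(bits, 2)


def generation_stats(population):
    """Return (best, worst, average) fitness of a generation."""
    values = [fitness(s) for s in population]
    return max(values), min(values), sum(values) / len(values)


def expected_copies(population):
    """Expected copies of each individual in a mating pool of the same size,
    under fitness-proportional selection: N * f(s) / total fitness."""
    total = sum(fitness(s) for s in population)
    n = len(population)
    return [(s, n * fitness(s) / total) for s in population]


population = ["011", "001", "110", "010"]
print(generation_stats(population))  # (6, 1, 3.0)
for s, e in expected_copies(population):
    print(s, round(e, 2))
```

The expected counts (1, about 0.33, 2, about 0.67) sum to the population size; rounding them to whole copies is one simple way to build the mating pool.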

There are several decisions to make before running a GA:

determine a representation scheme
determine a fitness measure
determine parameters for controlling the algorithm
determine a termination criterion

The GA works as follows:

select a random initial population of individuals (fixed-length strings)
Repeat
1. evaluate fitness for each individual
2. create new population by creating mating pool, doing crossover, and mutation
until stopping condition

The best (according to the fitness measure) individual is the result of the GA.
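The loop above can be sketched end to end on the 3-bit restaurant encoding. The population size, mutation probability, and fixed generation count used as the termination criterion are assumed values. This sketch is a generational variant in which both parents come from a fitness-proportional mating pool, a common textbook choice that differs from Holland's one-fitness-parent, one-random-parent scheme.

```python
import random

N_BITS = 3          # string length (the 3-bit restaurant encoding)
POP_SIZE = 4        # population size (assumed; must be even for pairing)
P_MUTATION = 0.05   # per-bit mutation probability (assumed value)
GENERATIONS = 20    # termination criterion: fixed number of generations (assumed)


def fitness(bits):
    return int(bits, 2)  # decimal value, as in the restaurant example


def roulette(population, rng):
    total = sum(fitness(s) for s in population)
    if total == 0:
        return rng.choice(population)  # all fitness zero: fall back to uniform choice
    spin = rng.random() * total
    running = 0
    for s in population:
        running += fitness(s)
        if spin < running:
            return s
    return population[-1]


def mutate(bits, rng):
    return "".join(("1" if b == "0" else "0") if rng.random() < P_MUTATION else b for b in bits)


def run_ga(seed=0):
    rng = random.Random(seed)
    # Random initial population of fixed-length strings.
    population = ["".join(rng.choice("01") for _ in range(N_BITS)) for _ in range(POP_SIZE)]
    best = max(population, key=fitness)
    for _ in range(GENERATIONS):
        # 1. fitness is evaluated inside roulette while building the mating pool.
        pool = [roulette(population, rng) for _ in range(POP_SIZE)]
        # 2. crossover consecutive pairs from the pool, then mutate the offspring.
        children = []
        for i in range(0, POP_SIZE, 2):
            a, b = pool[i], pool[i + 1]
            point = rng.randint(1, N_BITS - 1)
            children += [a[:point] + b[point:], b[:point] + a[point:]]
        population = [mutate(c, rng) for c in children]
        best = max([best] + population, key=fitness)  # remember the best seen so far
    return best


print(run_ga())
```

Keeping the best individual seen across all generations matches the rule that the best individual is the result, even if a later generation happens to lose it.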

This material was not covered in class, but you should read about it anyway.

The first genetic algorithms are due to Holland, whose theory uses schemas. A schema is a string in which some positions are fixed and some can be anything. For instance, in a binary representation the schema 111***000 describes all individuals whose first three bits are 1 and whose last three bits are 0, while the middle three bits can be any combination of 0s and 1s. Schemas can be thought of as defining hyperplanes in an n-dimensional space, where n is the string length. The length l(S) of a schema S (also called its defining length) is the distance between its first and last fixed positions; for 111***000 it is 9 - 1 = 8. The order k(S) of a schema is its number of fixed positions; for 111***000 it is 6.
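The schema definitions can be made concrete with a few helpers. The names matches, order, and defining_length are illustrative; order follows the usual definition of k(S) as the number of fixed (non-*) positions.

```python
def matches(schema: str, bits: str) -> bool:
    """True if bits is an instance of schema ('*' matches either 0 or 1)."""
    return len(schema) == len(bits) and all(c == "*" or c == b for c, b in zip(schema, bits))


def order(schema: str) -> int:
    """k(S): number of fixed (non-'*') positions."""
    return sum(c != "*" for c in schema)


def defining_length(schema: str) -> int:
    """l(S): distance between the first and last fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


S = "111***000"
print(matches(S, "111010000"))       # True
print(matches(S, "011010000"))       # False: first bit must be 1
print(order(S), defining_length(S))  # 6 8
```

A schema with k fixed positions in strings of length n has 2^(n-k) instances; 111***000 has 2^3 = 8.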

Holland's original reproductive plan works as follows: (1) choose one parent according to fitness; (2) choose the next parent at random; (3) perform crossover at a randomly chosen crossover point; (4) choose a random member of the population and replace it with the offspring. The following results quantify the performance of this plan.
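One step of this plan can be sketched as below. It assumes fitness-proportional choice for step (1) and keeps only one of the two crossover products as "the offspring"; the decimal-value fitness and the names pick_by_fitness and holland_step are assumptions for illustration.

```python
import random


def fitness(bits):
    return int(bits, 2)  # assumed fitness: decimal value, as in the restaurant example


def pick_by_fitness(population, rng):
    # Fitness-proportional choice: random.choices weights each individual by its fitness.
    return rng.choices(population, weights=[fitness(s) for s in population], k=1)[0]


def holland_step(population, rng):
    """One step of the reproductive plan; returns a new list with one member replaced."""
    p1 = pick_by_fitness(population, rng)          # (1) parent chosen according to fitness
    p2 = rng.choice(population)                    # (2) second parent chosen at random
    point = rng.randint(1, len(p1) - 1)            # (3) random crossover point
    child = p1[:point] + p2[point:]                # keep one of the two offspring (assumption)
    new_pop = list(population)
    new_pop[rng.randrange(len(new_pop))] = child   # (4) replace a random member
    return new_pop


rng = random.Random(1)
pop = ["011", "001", "110", "010"]
for _ in range(10):
    pop = holland_step(pop, rng)
print(pop)
```

Because only one member changes per step, this is a steady-state scheme rather than one that rebuilds the whole generation at once.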

Theorem: under a reproductive plan in which a parent is selected in proportion to its fitness, the expected number of instances of a schema S at time t+1 is E(S, t+1) = f(S, t) * N(S, t), where N(S, t) is the number of instances of S at time t and f(S, t) is the fitness ratio of S, that is, the average fitness of the instances of S divided by the average fitness of the population.
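The selection-only prediction E(S, t+1) = f(S, t) * N(S, t) can be computed for a concrete population. This sketch assumes the fitness ratio is the schema's average fitness over the population's average fitness, and reuses the restaurant population with decimal-value fitness; expected_next_count is a hypothetical name.

```python
def matches(schema, bits):
    return all(c == "*" or c == b for c, b in zip(schema, bits))


def expected_next_count(schema, population, fitness):
    """E(S, t+1) = f(S, t) * N(S, t), considering fitness-proportional selection only."""
    instances = [s for s in population if matches(schema, s)]
    n_s = len(instances)                                        # N(S, t)
    if n_s == 0:
        return 0.0
    avg_pop = sum(fitness(s) for s in population) / len(population)
    avg_s = sum(fitness(s) for s in instances) / n_s
    ratio = avg_s / avg_pop                                     # f(S, t)
    return ratio * n_s


pop = ["011", "001", "110", "010"]
print(expected_next_count("1**", pop, lambda s: int(s, 2)))  # 2.0
print(expected_next_count("0**", pop, lambda s: int(s, 2)))
```

The single instance of 1** (fitness 6, twice the average of 3) is expected to double, while 0** shrinks from 3 instances to about 2; the two expectations together still sum to the population size of 4.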

Theorem: if crossover is applied at time t with probability Pc to a schema S of length l(S), in strings of length n, the probability that S will be represented in the population at time t+1 is bounded below by

Pr(S, t+1) >= 1 - Pc * l(S)/(n - 1) * (1 - Pr(S, t))
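Evaluating the bound shows why long schemas are fragile under crossover. This sketch assumes Pr(S, t) can be read as the fraction of the population that is an instance of S (so that the random second parent is itself an instance with that probability); the helper name crossover_survival_bound and the example values are illustrative.

```python
def crossover_survival_bound(pc, defining_len, n, p_s):
    """Lower bound 1 - Pc * l(S)/(n - 1) * (1 - Pr(S, t))."""
    return 1 - pc * defining_len / (n - 1) * (1 - p_s)


# 9-bit strings, crossover always applied (Pc = 1), S currently in half the population.
print(crossover_survival_bound(1.0, 8, 9, 0.5))  # long schema 111***000: 0.5
print(crossover_survival_bound(1.0, 1, 9, 0.5))  # short schema 11*******: 0.9375
```

The factor l(S)/(n - 1) is the chance that a uniformly chosen cut point falls between the schema's first and last fixed positions, and the factor 1 - Pr(S, t) is the chance that the other parent does not carry S and so can actually break it.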

Theorem: the representation of S in the population will increase, to a first approximation, provided that f(S, t) >= 1 + l(S)/(n - 1) + Pm * k(S), where Pm is the per-bit mutation probability and k(S) is the order of the schema (its number of fixed positions).
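To see how the condition favors short, low-order schemas, the right-hand side can be evaluated directly. This is a small sketch; the helper name growth_threshold and the mutation probability 0.01 are assumptions chosen for illustration.

```python
def growth_threshold(defining_len, n, pm, order):
    """Fitness ratio needed for S to increase: 1 + l(S)/(n - 1) + Pm * k(S)."""
    return 1 + defining_len / (n - 1) + pm * order


# 9-bit strings, per-bit mutation probability 0.01 (an assumed value).
print(growth_threshold(8, 9, 0.01, 6))  # 111***000: about 2.06
print(growth_threshold(1, 9, 0.01, 2))  # 11*******: about 1.145
```

The schema 111***000 must be roughly twice as fit as the population average to spread, while 11******* needs only about a 15% advantage.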

vasilis@cs.dartmouth.edu