Part 2
As promised, here we are for our second part. In this part, we will address the following topics:
- Terminology
- The No Free Lunch Theorem
- Search
- Genetic Algorithm (GA) Schema/Structure
In this part, we won’t yet dive into hands-on activities. Yes, I know you want to start implementing a GA (and you can; the internet has no limits, seek knowledge), and I promise that in the next part, we will begin implementing our GA.
Terminology
Let’s define some terms that we will use extensively throughout this series. You’ll see that, because GAs are inspired by natural processes, there is a strong analogy between the terms used in the field of GAs and biological terms.
In natural systems, chromosomes combine to form the various genetic characteristics of individuals. In GAs, we use a chromosome, or individual, to represent a candidate solution to our optimization problem. The population is the set of candidate solutions, that is, the set of individuals/chromosomes.
The genes that make up a chromosome encode characteristics, the alleles are the values a gene can take, and the locus is the position of a gene within the chromosome.
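To make the analogy concrete, here is a minimal Python sketch (assuming the binary representation we will adopt later in the series) that maps each term onto a simple data structure; the sizes used are hypothetical and chosen only for illustration:

```python
import random

# A minimal sketch mapping the terminology onto Python structures.
# A chromosome is a list of binary genes; 0 and 1 are the possible alleles,
# and the index of a gene within the chromosome is its locus.
# All sizes below are hypothetical, chosen only for illustration.

CHROMOSOME_LENGTH = 8
POPULATION_SIZE = 4

def random_chromosome(length=CHROMOSOME_LENGTH):
    """One individual: a candidate solution encoded as a list of genes."""
    return [random.randint(0, 1) for _ in range(length)]

# The population is simply the collection of individuals.
population = [random_chromosome() for _ in range(POPULATION_SIZE)]

chromosome = population[0]
locus = 3                   # the position of a gene within the chromosome
allele = chromosome[locus]  # the value (0 or 1) that this gene holds
print(population, allele)
```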
These are the main terms. If we need to introduce another term later, we will provide a brief description as well, so don’t worry.
The No Free Lunch Theorem
Wolpert’s result, known as the No Free Lunch (NFL) theorem, states that all search algorithms have the same performance when averaged over the set of all possible problems. In other words, if algorithm A performs better than algorithm B on a set of k problems, there will be another set of problems where algorithm B performs better than algorithm A.
This happens because algorithms make prior assumptions. That is, there is a set of problems where the assumptions adopted by one type of algorithm align with the structure of the problem at hand. The fact that an algorithm works very well for one problem does not imply that it will work for a different problem.
In our case, to have a good GA, we must embed as much knowledge about the problem as possible in the representation, genetic operators, and evaluation function (we will see this later).
Why is this important? We must understand that our decisions are based on premises, and the more tools you have, i.e., the more algorithms you know, the more precise your premises will be. But, at the end of the day, they are still premises: they may be wrong, and you may not be using the best tool to solve your problem. There is no free lunch.
Search
We can say that search is the basic problem of computing. This means that every problem aims to achieve a goal, to reach a state where a condition is satisfied.
The state space is the set of all states of the universe that are relevant to the problem, that is, the states the problem can assume. Let’s look at the example Professor Linden uses in his book: if my goal is to earn a million dollars, my state is how much money I have earned so far.
Actions, when executed, change the system’s state. Still in the cited example, if I sell 1,000 products worth X, then I am closer to achieving my goal. Restating what was said: solving a problem consists of finding a state where all our conditions are satisfied, and that is search.
The search in state space can be defined by a quadruple {E, A, I, O} (illustrated in the sketch after this list):
- E: the states the problem can assume;
- A: the actions that cause the state to change;
- I: the initial state of the problem;
- O: the target state(s) of the problem.
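As a deliberately toy illustration, the sketch below models the quadruple for the million-dollar example; all names and numbers here are assumptions made only for this example:

```python
from dataclasses import dataclass

# A toy sketch of the quadruple {E, A, I, O} for the million-dollar example.
# All names and numbers here are assumptions made only for illustration.

@dataclass
class MoneySearchProblem:
    initial_state: float = 0.0     # I: how much money I have earned so far
    goal: float = 1_000_000.0      # defines O, the target states

    # E is implicit: any non-negative amount of money is a valid state.

    def actions(self, state):
        """A: actions that change the state (here, a single hypothetical sale)."""
        return [("sell_1000_products", 1_000 * 100)]  # 1,000 products at 100 each

    def result(self, state, action):
        """Applying an action produces a new state."""
        _, revenue = action
        return state + revenue

    def is_goal(self, state):
        """O: has a target state been reached?"""
        return state >= self.goal

problem = MoneySearchProblem()
state = problem.initial_state
while not problem.is_goal(state):
    state = problem.result(state, problem.actions(state)[0])
print(state)  # 1,000,000.0 after ten "sales"
```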
Besides the GA that we will discuss in this series, other search techniques might interest you. Some are:
- Numerical methods:
- Bisection
- Newton-Raphson
- Constraint problems:
- Simplex
- Quadratic programming
- State space search methods:
- Blind search
- Informed search
- Other intelligent methods:
- Simulated annealing
- Tabu search
GA Schema
GAs process elements of the search space; these elements form our population, the candidate solutions to our problem. The population evolves over generations, i.e., our candidate solutions are transformed during the iterations. This process repeats until we find an individual of sufficiently high quality, and that individual is our solution (ideally, the optimal one).
The initial population is generated randomly, and to this population the basic mechanisms of natural evolution are applied. These mechanisms/transformations work as follows:
- Selection chooses the fittest individuals to be the progenitors of the next generation; these individuals are chosen probabilistically.
- Genetic operators act on these individuals, generating new solutions.
A high-level representation of a GA (a code sketch follows these steps):
1. Initialize the population of chromosomes (individuals)¹;
2. Evaluate each chromosome;
3. Select the fittest chromosomes to generate new ones;
4. Apply transformation operators (genetic operators) to generate new chromosomes;
5. Delete the old chromosomes, replacing them with the new ones;
6. Evaluate the new chromosomes;
7. Stopping criterion: if the time is up or a solution meets the requirements, stop the algorithm; otherwise, continue from step 3.
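To tie the steps together, here is a minimal Python sketch of the loop above. It is not the implementation we will build in the next parts, only an illustration under simple assumptions: a binary chromosome, a placeholder OneMax fitness, and commonly used operators (tournament selection, one-point crossover, bit-flip mutation).

```python
import random

# A minimal, generic sketch of the loop described above, assuming a binary
# representation and a placeholder fitness function. The operators shown here
# (tournament selection, one-point crossover, bit-flip mutation) are common
# choices used only for illustration, not the only possibilities.

def fitness(chromosome):
    """Placeholder fitness: count the 1s (the classic OneMax toy problem)."""
    return sum(chromosome)

def tournament_selection(population, k=2):
    """Probabilistically favor fitter individuals as progenitors."""
    return max(random.sample(population, k), key=fitness)

def crossover(parent_a, parent_b):
    """One-point crossover: combine two parents into a new chromosome."""
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rate=0.01):
    """Bit-flip mutation applied gene by gene."""
    return [1 - gene if random.random() < rate else gene for gene in chromosome]

def genetic_algorithm(pop_size=20, length=16, generations=100):
    # Step 1: initialize the population of chromosomes.
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 2, 6, and 7: evaluate the chromosomes and check the stop criterion.
        best = max(population, key=fitness)
        if fitness(best) == length:  # a solution meets the requirements
            return best
        # Steps 3-5: select progenitors, apply the genetic operators,
        # and replace the old population with the new chromosomes.
        population = [mutate(crossover(tournament_selection(population),
                                       tournament_selection(population)))
                      for _ in range(pop_size)]
    return max(population, key=fitness)

print(genetic_algorithm())
```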
To represent a chromosome, an appropriate encoding is used. The original proposal is to use binary values (0s and 1s) to represent chromosomes.
To assess the quality of a chromosome, a fitness function is used: the function that measures how well a chromosome solves our problem.
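As a quick illustration of these two ideas, here is a small sketch assuming an 8-bit encoding and a toy objective f(x) = x², both chosen only for this example:

```python
# A small sketch of how a binary chromosome can encode a number and how a
# fitness function evaluates it. The objective f(x) = x**2 over the integers
# 0..255 is an assumption made only for illustration.

def decode(chromosome):
    """Interpret the bits as an unsigned integer (8 bits -> 0..255)."""
    return int("".join(map(str, chromosome)), 2)

def fitness(chromosome):
    """Quality of the candidate solution: here, the value of x**2."""
    x = decode(chromosome)
    return x ** 2

print(fitness([1, 0, 0, 1, 1, 0, 1, 1]))  # decodes to 155 -> fitness 24025
```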
¹ Sometimes, I will use individuals or chromosomes interchangeably.
What Will We See in Part 3?
In the third part, we will start implementing some things in Python, beginning with the chromosome representation that we will adopt throughout the series: binary.