Genetic Algorithm Intro

technical

ChatGPT

A walkthrough of genetic algorithms generated using ChatGPT

Author

Kevin Bird

Published

December 21, 2022

import random
from typing import List, Callable
from copy import deepcopy

Introduction

Have you ever faced a problem that seemed impossible to solve? Perhaps you were trying to optimize a manufacturing process, find the shortest path between two points, or design a new product. These types of problems can be difficult to solve using traditional algorithms, especially when the solution space is large or the problem is unstructured. This is where genetic algorithms come in.

A genetic algorithm is a heuristic optimization technique that is inspired by the process of natural evolution. It is a metaheuristic, which means that it is a high-level algorithm that is used to guide other algorithms.

Genetic algorithms are used to solve a wide range of problems, from simple optimization problems to complex machine learning tasks. They are particularly useful for solving problems that have a large solution space, such as finding the shortest path in a graph or optimizing the parameters of a machine learning model.

So, how do genetic algorithms work? In a nutshell, they generate a population of candidate solutions, evaluate the fitness of each individual, select the fittest individuals for reproduction, generate offspring through crossover and mutation, and replace the weakest individuals. This process is repeated until a satisfactory solution is found or a predetermined number of iterations has been reached.

In this article, we will look at the key components of a genetic algorithm and see how they work together to solve complex problems. We will also explore some of the key considerations when designing a genetic algorithm, such as representation, fitness function, selection, crossover, and mutation.

Initialize the population

A population of candidate solutions (also known as individuals or chromosomes) is generated randomly. Each individual represents a potential solution to the problem. For example, if the problem is to find the maximum value of a mathematical function, each individual in the population could be a set of input values for the function.

The size of the population and the representation of the solutions depend on the problem being solved. For example, if the problem is a simple mathematical optimization problem, each individual could be a single floating-point number. If the problem is more complex, such as finding the shortest path in a graph, each individual could be a list of integers or a string of characters.

initialize_population in code

initialize_population takes two integer parameters: solution_size, which represents the size of a single solution (i.e., the number of genes), and population_size, which represents the size of the population (i.e., the number of solutions). It returns a list of lists containing random integers, representing the initial population.

def initialize_population(solution_size: int, population_size: int) -> List[List[int]]:
    """
    Generates a random initial population.

    Parameters:
    - solution_size: the size of a single solution (number of genes)
    - population_size: the size of the population (number of solutions)

    Returns:
    - A list of lists containing random integers, representing the initial population.
    """
    return [[random.randint(0, 1) for _ in range(solution_size)] for _ in range(population_size)]

def test_initialize_population(solution_size, population_size):
    population = initialize_population(solution_size, population_size)

    # Check that the returned value is a list of lists
    assert isinstance(population, list)
    assert all(isinstance(x, list) for x in population)

    # Check that the list contains 5 lists
    assert len(population) == population_size

    # Check that each list contains 10 integers
    assert all(len(x) == solution_size for x in population)

    # Check that each integer is either 0 or 1
    assert all(all(x == 0 or x == 1 for x in solution) for solution in population)

test_initialize_population(5,10)
test_initialize_population(10,5)
test_initialize_population(0,5)
test_initialize_population(5,0)

solution_size = 5
population_size = 10

population = initialize_population(solution_size, population_size)

Evaluate the fitness of each individual

The fitness of each individual is evaluated using a fitness function. This function takes an individual as input and returns a score that reflects the quality of the solution represented by that individual.

The fitness function should be designed to reflect the goals of the problem. For example, if the problem is to maximize a mathematical function, the fitness function could return the value of the function for a given set of input values. If the problem is to find the shortest path in a graph, the fitness function could return the length of the path.

fitness in code

fitness takes a single parameter, solution, which is a list of integers representing a solution. It returns an integer representing the fitness of the solution. In this particular implementation, the fitness is calculated by counting the number of 1s in the solution.

def fitness(solution: List[int]) -> int:
    """
    Calculates the fitness of a solution.
    
    Parameters:
    - solution: a list of integers representing a solution
    
    Returns:
    - An integer representing the fitness of the solution.
    """
    # Calculate the fitness of the solution
    # For example, you could count the number of 1s in the solution
    return solution.count(1)

# Test with a solution containing all 0s
solution = [0, 0, 0, 0, 0]
assert fitness(solution) == 0

# Test with a solution containing all 1s
solution = [1, 1, 1, 1, 1]
assert fitness(solution) == 5

# Test with a solution containing a mix of 0s and 1s
solution = [0, 1, 1, 0, 0]
assert fitness(solution) == 2

# Test with a solution containing no 1s
solution = [0, 0, 0, 'blah', 0]
assert fitness(solution) == 0

Select the fittest individuals

The fittest individuals in the population are selected for reproduction based on their fitness scores. There are many ways to select the fittest individuals, such as using a ranking selection, tournament selection, or roulette wheel selection.

In ranking selection, the individuals are ranked based on their fitness and a probability of selection is assigned to each individual based on its rank. The fittest individuals have a higher probability of being selected.

In tournament selection, a group of individuals is selected at random and the fittest individual from the group is chosen for reproduction.

In roulette wheel selection, the probability of selection is proportional to the fitness of the individual. The higher the fitness, the higher the probability of selection.

selection in code

selection takes two parameters: population, which is a list of lists representing the population, and fitness_fn, which is a function that calculates the fitness of a solution. It returns a list of the fittest individuals in the population.

It does this by sorting the population by fitness using the fitness_fn function as the key, and then selecting the top half of the population (i.e., the individuals with the highest fitness).

def selection(population: List[List[int]], fitness_fn: Callable[[List[int]], int]) -> List[List[int]]:
    """
    Selects the fittest individuals from the population.
    
    Parameters:
    - population: a list of lists representing the population
    - fitness_fn: a function that calculates the fitness of a solution
    
    Returns:
    - A list of the fittest individuals in the population.
    """
    # Sort the population by fitness
    sorted_population = sorted(population, key=fitness_fn, reverse=True)
    # Select the top individuals
    return sorted_population[:int(len(sorted_population) / 2)]

# Test with a population of size 10
population = [[1, 1, 1, 1, 0], 
              [1, 1, 1, 0, 0], 
              [0, 1, 0, 0, 0], 
              [0, 1, 0, 0, 1], 
              [0, 0, 1, 0, 0], 
              [0, 0, 0, 0, 1], 
              [0, 0, 0, 0, 1],
              [1, 1, 0, 1, 0], 
              [1, 0, 1, 0, 1], 
              [0, 1, 1, 1, 0]
             ]

fittest = selection(population, fitness)

# Check that the returned value is a list of lists
assert isinstance(fittest, list)
assert all(isinstance(x, list) for x in fittest)

# Check that the list contains 5 lists
assert len(fittest) == 5

# Check that the first list has 3 1s
assert fitness(fittest[0]) == 4

# Check that the last list has 2 1s
assert fitness(fittest[-1]) == 3

fittest

[[1, 1, 1, 1, 0],
 [1, 1, 1, 0, 0],
 [1, 1, 0, 1, 0],
 [1, 0, 1, 0, 1],
 [0, 1, 1, 1, 0]]

Generate offspring

Offspring are generated through crossover and mutation. Crossover involves combining the genetic material of two parents to produce offspring. There are many ways to perform crossover, such as single-point crossover, two-point crossover, and uniform crossover.

In single-point crossover, a random crossover point is chosen and the offspring inherit the genetic material from one parent before the crossover point and from the other parent after the crossover point

In two-point crossover, two crossover points are chosen and the offspring inherit the genetic material from one parent in the first segment, from the other parent in the second segment, and so on.

In uniform crossover, each gene in the offspring has an equal probability of inheriting the corresponding gene from either parent.

Mutation involves randomly flipping the value of one or more genes in the offspring. The mutation rate determines the probability that a gene will be mutated. A high mutation rate can introduce more diversity into the population, but it can also decrease the quality of the solutions.

crossover in code

crossover takes two parameters: parent1, which is a list of integers representing the first parent, and parent2, which is a list of integers representing the second parent. It returns a list of integers representing the offspring, generated through crossover.

It does this by choosing a random crossover point and combining the two parents by taking the first part of parent1 and the second part of parent2, or vice versa.

def crossover(parent1: List[int], parent2: List[int]) -> List[int]:
    """
    Generates offspring through crossover.
    
    Parameters:
    - parent1: a list of integers representing the first parent
    - parent2: a list of integers representing the second parent
    
    Returns:
    - A list of integers representing the offspring.
    """
    # Choose a crossover point
    crossover_point = random.randint(1, len(parent1) - 1)
    # Generate offspring
    offspring = parent1[:crossover_point] + parent2[crossover_point:]
    return offspring

offspring = crossover([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

# Check that the returned value is a list
assert isinstance(offspring, list)

# Check that the list has length 5
assert len(offspring) == 10
# Check that the offspring contains elements from both parents
assert 0 in offspring and 1 in offspring

assert offspring[0] == 0
assert offspring[-1] == 1

offspring

[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

mutate in code

mutate takes two parameters: populations, which is a list of lists of integers representing the population, and mutation_rate, which is a float representing the probability of a mutation occurring. It returns the modified population.

It does this by iterating over each gene in each individual in the population and randomly flipping its value with a probability of mutation_rate.

def mutate(populations: List[List[int]], mutation_rate: float) -> List[List[int]]:
    """
    Introduces mutations into the population.
    
    Parameters:
    - populations: a list of lists of integers representing the population
    - mutation_rate: a float representing the probability of a mutation occurring
    
    Returns:
    - The modified population.
    """
    populations = deepcopy(populations)
    for population in populations:
        for i in range(len(population)):
            if random.random() < mutation_rate:
                # Flip the value of a random gene
                population[i] = 1 - population[i]
    return populations

# Test with population = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1]] and mutation_rate = 0.5
populations = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1]]
mutation_rate = 0.5
result = mutate(populations, mutation_rate)

# Check that the returned value is a list of lists
assert isinstance(result, list)
assert all(isinstance(i, list) for i in result)

# Check that the length of each inner list is 5
assert all(len(i) == 5 for i in result)

# Check that the result contains at least one mutation
assert any(i != j for i, j in zip(result[0], populations[0])) or any(i != j for i, j in zip(result[1], populations[1]))

# Test with population = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1]] and mutation_rate = 0.0
populations = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1]]
mutation_rate = 0
result = mutate(populations, mutation_rate)

# Check that the returned value is a list of lists
assert isinstance(result, list)
assert all(isinstance(i, list) for i in result)

# Check that the length of each inner list is 5
assert all(len(i) == 5 for i in result)

# Check that the result is unchanged
assert result == populations

Replace the weakest individuals

The weakest individuals in the population are replaced with the offspring. This step is known as survival of the fittest. The size of the population remains constant and the fittest individuals are preserved while the weakest are replaced.

Repeat the process

The process is repeated until a satisfactory solution is found or a predetermined number of iterations has been reached. The genetic algorithm continues to evolve the population until a satisfactory solution is found or the maximum number of iterations is reached.

genetic_algorithm in code

# Run the genetic algorithm
def genetic_algorithm(solution_size, population_size, fitness_fn, mutation_rate):
    # Initialize the population
    population = initialize_population(solution_size, population_size)
    print(selection(population, fitness_fn)[0])
    # Iterate until a solution is found
    while True:
        # Select the fittest individuals
        fittest = selection(population, fitness_fn)
        # Generate offspring through crossover
        offspring = crossover(fittest[0], fittest[1])
        # Introduce mutations
        population = mutate(population, mutation_rate)
        # Add the offspring to the population
        population.append(offspring)
        # Check if a solution has been found
        if fitness_fn(offspring) == len(offspring):
            return offspring

# Test the genetic algorithm
result = genetic_algorithm(100, 100, fitness, 0.001)
print(result)

[0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Choosing an appropriate representation in a genetic algorithm

Representation is an important aspect of genetic algorithms because it determines how the solutions are encoded and how they can be manipulated. In order to design a genetic algorithm, it is necessary to choose an appropriate representation that is well-suited to the problem being solved.

There are many different ways to represent solutions in a genetic algorithm. For example, solutions can be represented as binary strings, floating-point numbers, permutations, or even images. The choice of representation depends on the problem being solved and the available resources.

One common representation in genetic algorithms is the binary string, which is a sequence of 0s and 1s. This representation is well-suited to problems that have a discrete set of solutions, such as finding the shortest path in a graph.

Another common representation is the floating-point number, which is a real number represented in binary format. This representation is well-suited to continuous optimization problems, such as finding the maximum value of a mathematical function.

Permutations are also a common representation in genetic algorithms. They are used to solve problems that involve rearranging a set of items, such as scheduling tasks or arranging objects in a warehouse.

Finally, images can also be used as a representation in genetic algorithms. This is particularly useful for tasks such as image generation or image recognition.

When choosing a representation, it is important to consider the size of the solution space, the complexity of the problem, and the available resources. The representation should be able to capture the key features of the problem and allow for efficient manipulation of the solutions.

Conclusion

In this article, we have explored a simple genetic algorithm that demonstrates the key components of this optimization technique. We have seen how the genetic algorithm generates a population of candidate solutions, evaluates the fitness of each individual, selects the fittest individuals for reproduction, generates offspring through crossover and mutation, and replaces the weakest individuals. This process is repeated until a satisfactory solution is found or a predetermined number of iterations has been reached.

We have also looked at some of the considerations when designing a genetic algorithm, such as the representation of the solutions, the fitness function, the selection function, the crossover function, and the mutation function. These components are important for guiding the search for solutions and improving the quality of the solutions found by the genetic algorithm.

Overall, genetic algorithms are a powerful optimization technique that can be used to solve a wide range of problems, particularly those with large solution spaces or unstructured problems. By understanding the key components and how they work together, you can design and implement a genetic algorithm that can find high-quality solutions to complex problems.