Neural Network Lab
Neural Network How-To: Code an Evolutionary Optimization Solution
Evolutionary optimization can be used to train a neural network. A virtual chromosome holds the neural network's weights and bias values, and the error term is the average of all errors between the network's computed outputs and the training data target outputs. Learn how to code the solution.
Neural networks are software systems that can be used to make predictions. For example, a neural network might predict whether the price of some company's stock will go up, go down, or stay the same, based on inputs such as bank interest rates, the number of mentions on social media, and so on. A neural network is essentially a complex mathematical function.
Training a neural network is the process of finding a set of numeric weight values so that, for a given set of training data with known input and output values, the network's computed output values closely match the known output values. After these best weight values have been found, they can be placed into the neural network and used to predict the output of new input data that has unknown outputs.
By far the most common technique for training a neural network is called the back-propagation algorithm. Two major alternative techniques are particle swarm optimization (PSO) and evolutionary optimization (EO). This article presents a complete demo of neural network training using EO.
Take a look at the demo program in Figure 1. The demo uses EO to train a neural network that predicts the species of an Iris flower ("setosa," "versicolor," "virginica") based on the flower's sepal (green covering) length, sepal width, petal length and petal width. There are 24 training items. After training completed, the best set of weights found was placed into the neural network, which then correctly predicted the species of five of the six test data items, an accuracy of 5/6 = 0.8333.
Figure 1. Neural Network Training Demo
This article assumes you have a solid grasp of neural network concepts, including the feed-forward mechanism, activation functions, and weights and biases, and that you have advanced programming skills. The demo is coded using C# but you should be able to refactor the code to other languages such as JavaScript or Visual Basic .NET without too much difficulty. Most normal error checking has been omitted to keep the size of the code small and the main ideas as clear as possible.
Evolutionary Optimization
Evolutionary optimization is a type of genetic algorithm. The process is illustrated in Figure 2. The population matrix is a collection of potential solutions, where each solution is a set of neural network weights and bias values. In the figure, there are six individuals, or chromosomes, in the population. Each chromosome has six genes that represent six neural network weights. The population here is artificially small: a realistic neural network would normally have 50 or more weights, and a realistic population would have 20 or more individuals.
Figure 2. Evolutionary Optimization Algorithm
Step one is to select two good, but not necessarily best, parent individuals from the population. In step two, crossover is used to generate two child individuals, which hopefully combine good characteristics of the parents to give even better solutions. In step three, the two children are randomly mutated slightly to introduce new information into the system. In step four, the children are placed into the population, replacing two weak individuals. Steps one through four are collectively called reproduction and are core evolutionary optimization mechanisms. Step five, immigration, is optional. A random individual is generated and placed into the population, replacing a weak individual. Evolutionary optimization repeats steps one through five until some stopping condition, typically a maximum number of generations, is reached.
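In code, steps one through five map onto a loop roughly like the following. This is only a structural sketch, not the demo's exact Train method, and the helper names (Select, Reproduce, Place, Immigrate) are assumptions for illustration:
// Structural sketch of the EO loop; helper names are illustrative
int gen = 0;
while (gen < maxGeneration)
{
  Individual[] parents = Select(2, population, tau);        // 1. pick two good parents
  Individual[] children = Reproduce(parents[0], parents[1],
    mutateRate, mutateChange);                              // 2. crossover, 3. mutate
  Place(children[0], children[1], population);              // 4. replace two weak individuals
  Immigrate(population);                                    // 5. optional immigration
  // update the best solution found so far; break early if its error <= exitError
  ++gen;
}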
Overall Program Structure
The overall program structure of the demo, with most WriteLine statements removed and some minor edits to save space, is presented in Listing 1. To create the demo, I launched Visual Studio and created a new C# console application project named EvolutionaryTraining. The demo has no .NET version dependencies, so any version of Visual Studio should work. After the template code loaded, in the Solution Explorer window I renamed file Program.cs to the more descriptive EvolutionaryTrainingProgram.cs, and Visual Studio automatically renamed class Program for me.
Listing 1: Overall Program Structure
using System;
namespace EvolutionaryTraining
{
  class EvolutionaryTrainingProgram
  {
    static void Main(string[] args)
    {
      double[][] trainData = new double[24][];
      trainData[0] = new double[] { 6.3, 2.9, 5.6, 1.8, 1, 0, 0 };
      trainData[1] = new double[] { 6.9, 3.1, 4.9, 1.5, 0, 1, 0 };
      trainData[2] = new double[] { 4.6, 3.4, 1.4, 0.3, 0, 0, 1 };
      trainData[3] = new double[] { 7.2, 3.6, 6.1, 2.5, 1, 0, 0 };
      trainData[4] = new double[] { 4.7, 3.2, 1.3, 0.2, 0, 0, 1 };
      trainData[5] = new double[] { 4.9, 3, 1.4, 0.2, 0, 0, 1 };
      trainData[6] = new double[] { 7.6, 3, 6.6, 2.1, 1, 0, 0 };
      trainData[7] = new double[] { 4.9, 2.4, 3.3, 1, 0, 1, 0 };
      trainData[8] = new double[] { 5.4, 3.9, 1.7, 0.4, 0, 0, 1 };
      trainData[9] = new double[] { 4.9, 3.1, 1.5, 0.1, 0, 0, 1 };
      trainData[10] = new double[] { 5, 3.6, 1.4, 0.2, 0, 0, 1 };
      trainData[11] = new double[] { 6.4, 3.2, 4.5, 1.5, 0, 1, 0 };
      trainData[12] = new double[] { 4.4, 2.9, 1.4, 0.2, 0, 0, 1 };
      trainData[13] = new double[] { 5.8, 2.7, 5.1, 1.9, 1, 0, 0 };
      trainData[14] = new double[] { 6.3, 3.3, 6, 2.5, 1, 0, 0 };
      trainData[15] = new double[] { 5.2, 2.7, 3.9, 1.4, 0, 1, 0 };
      trainData[16] = new double[] { 7, 3.2, 4.7, 1.4, 0, 1, 0 };
      trainData[17] = new double[] { 6.5, 2.8, 4.6, 1.5, 0, 1, 0 };
      trainData[18] = new double[] { 4.9, 2.5, 4.5, 1.7, 1, 0, 0 };
      trainData[19] = new double[] { 5.7, 2.8, 4.5, 1.3, 0, 1, 0 };
      trainData[20] = new double[] { 5, 3.4, 1.5, 0.2, 0, 0, 1 };
      trainData[21] = new double[] { 6.5, 3, 5.8, 2.2, 1, 0, 0 };
      trainData[22] = new double[] { 5.5, 2.3, 4, 1.3, 0, 1, 0 };
      trainData[23] = new double[] { 6.7, 2.5, 5.8, 1.8, 1, 0, 0 };

      double[][] testData = new double[6][];
      testData[0] = new double[] { 4.6, 3.1, 1.5, 0.2, 0, 0, 1 };
      testData[1] = new double[] { 7.1, 3, 5.9, 2.1, 1, 0, 0 };
      testData[2] = new double[] { 5.1, 3.5, 1.4, 0.2, 0, 0, 1 };
      testData[3] = new double[] { 6.3, 3.3, 4.7, 1.6, 0, 1, 0 };
      testData[4] = new double[] { 6.6, 2.9, 4.6, 1.3, 0, 1, 0 };
      testData[5] = new double[] { 7.3, 2.9, 6.3, 1.8, 1, 0, 0 };

      ShowMatrix(trainData, trainData.Length, 1, true);
      ShowMatrix(testData, testData.Length, 1, true);

      const int numInput = 4;
      const int numHidden = 6;
      const int numOutput = 3;
      NeuralNetwork nn =
        new NeuralNetwork(numInput, numHidden, numOutput);

      // Training parameters specific to EO
      int popSize = 8;
      int maxGeneration = 500;
      double exitError = 0.0;
      double mutateRate = 0.20;
      double mutateChange = 0.01;
      double tau = 0.40;

      double[] bestWeights = nn.Train(trainData, popSize, maxGeneration,
        exitError, mutateRate, mutateChange, tau);
      ShowVector(bestWeights, 10, 3, true);

      nn.SetWeights(bestWeights);
      double trainAcc = nn.Accuracy(trainData);
      Console.Write("\nAccuracy on training data = ");
      Console.WriteLine(trainAcc.ToString("F4"));

      double testAcc = nn.Accuracy(testData);
      Console.Write("\nAccuracy on test data = ");
      Console.WriteLine(testAcc.ToString("F4"));

      Console.WriteLine("\nEnd NN training demo");
      Console.ReadLine();
    } // Main

    static void ShowVector(double[] vector, int valsPerRow,
      int decimals, bool newLine)
    {
      for (int i = 0; i < vector.Length; ++i)
      {
        if (i % valsPerRow == 0) Console.WriteLine("");
        if (vector[i] >= 0.0) Console.Write(" ");
        Console.Write(vector[i].ToString("F" + decimals) + " ");
      }
      if (newLine == true)
        Console.WriteLine("");
    }

    static void ShowMatrix(double[][] matrix, int numRows,
      int decimals, bool newLine)
    {
      for (int i = 0; i < numRows; ++i)
      {
        Console.Write(i.ToString().PadLeft(3) + ": ");
        for (int j = 0; j < matrix[i].Length; ++j)
        {
          if (matrix[i][j] >= 0.0)
            Console.Write(" ");
          else
            Console.Write("-");
          Console.Write(Math.Abs(matrix[i][j]).ToString("F" +
            decimals) + " ");
        }
        Console.WriteLine("");
      }
      if (newLine == true)
        Console.WriteLine("");
    }
  } // Program

  public class NeuralNetwork
  {
    // Defined here
  }

  public class Individual : IComparable<Individual>
  {
    // Defined here
  }
} // ns
At the top of the source code I deleted all namespace references except the reference to System. If you scan Listing 1 you'll see that the program class has a single Main method and two display helpers, ShowVector and ShowMatrix. There are two program-defined classes. Class NeuralNetwork encapsulates most of the program logic, in particular the Train method, and is fairly complex. Class Individual is a simple container class that defines a single possible set of weights and bias values for the neural network.
The demo begins by setting up the 24 training items and six test items:
double[][] trainData = new double[24][];
trainData[0] = new double[] { 6.3, 2.9, 5.6, 1.8, 1, 0, 0 };
... etc.
These are a subset of the well-known 150-item Fisher's Iris data set. After setting up the data, the demo instantiates a 4-6-3 fully connected feed-forward neural network because there are four input values and three possible output classes. The choice of six for the number of hidden nodes is arbitrary. In most non-demo scenarios, you would read training and test data from a text file or SQL database rather than hard-code it, along the lines of the sketch below.
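A minimal loader sketch, assuming a comma-delimited text file where each line holds the four predictor values followed by the three 1-of-N encoded output values (LoadData is a hypothetical helper name, not part of the demo):
// Hypothetical helper: load comma-delimited data into an array-of-arrays matrix
static double[][] LoadData(string dataFile)
{
  string[] lines = System.IO.File.ReadAllLines(dataFile);
  double[][] result = new double[lines.Length][];
  for (int i = 0; i < lines.Length; ++i)
  {
    string[] tokens = lines[i].Split(',');
    result[i] = new double[tokens.Length];
    for (int j = 0; j < tokens.Length; ++j)
      result[i][j] = double.Parse(tokens[j]);
  }
  return result;
}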
Next, the demo assigns values for the EO training parameters:
int popSize = 8;
int maxGeneration = 500;
double exitError = 0.0;
double mutateRate = 0.20;
double mutateChange = 0.01;
double tau = 0.40;
Each type of neural network training algorithm has a different set of parameters. Back-propagation uses a learning rate, momentum and so on. PSO uses the number of particles, cognitive and social weights, and so on. Here, the EO population size is the number of individuals in the population. A larger population generally leads to a better solution, at the expense of performance. Variable maxGeneration is the maximum number of iterations the EO select-crossover-mutate process will execute.
Variable exitError sets a short-circuit threshold: training exits early if the best set of weights found produces an error less than the threshold. Here, exitError is set to zero, so an early exit will not occur. Variable mutateRate controls how many genes in a newly generated child's chromosome will be mutated. Variable mutateChange controls the magnitude of the change applied to mutated genes. Variable tau is the "selection pressure" and controls the likelihood that the two best individuals in the population will be selected as parents for reproduction. Larger values of tau increase the chance that the two best individuals will be chosen.
Choosing the best values for the EO parameters is more art than science, and typically boils down to trial and error. This factor is generally considered the primary weakness of EO and genetic algorithms.
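To make the roles of mutateRate and mutateChange concrete, here is a minimal sketch of the kind of mutation routine EO uses. The method name and exact update rule are assumptions for illustration, not necessarily the demo's exact code; the key idea is that each gene is changed with probability mutateRate, by a random amount whose maximum size is governed by mutateChange and the gene range:
// Illustrative mutation sketch: each gene changes with probability mutateRate
static void Mutate(double[] chromosome, double maxGene,
  double mutateRate, double mutateChange, Random rnd)
{
  double hi = mutateChange * maxGene;  // largest allowed change
  double lo = -hi;
  for (int i = 0; i < chromosome.Length; ++i)
  {
    if (rnd.NextDouble() < mutateRate)
      chromosome[i] += (hi - lo) * rnd.NextDouble() + lo;  // random delta in [lo, hi)
  }
}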
The demo trains and evaluates the neural network with these key lines of code:
double[] bestWeights = nn.Train(trainData, popSize, maxGeneration,
exitError, mutateRate, mutateChange, tau);
nn.SetWeights(bestWeights);
double trainAcc = nn.Accuracy(trainData);
double testAcc = nn.Accuracy(testData);
The Individual Class
The definition of the Individual class is given in Listing 2. The class implements the IComparable<Individual> interface so that a collection of Individual objects can be sorted automatically from smallest error to largest. The class contains a static Random object that the constructor uses to generate a chromosome with random gene values, each of which is between minGene and maxGene.
Notice the error value isn't passed in as a parameter to the constructor. This is a bit subtle. The error for an Individual object depends on the object's chromosome, which represents a neural network's weights and bias values, and the training data, which in most cases isn't part of the neural network. Therefore, an Individual object is first instantiated with a random chromosome by a call to the constructor, then the error is computed using the training data, and then the error value is assigned to the object.
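Put another way, the calling code follows a create-then-score pattern along these lines. Here, MeanSquaredError is a hypothetical name for whatever error routine the neural network exposes, numWeights stands for the total number of network weights and biases, and the minGene and maxGene values of -10.0 and +10.0 are illustrative:
// Illustrative pattern: instantiate first, then compute and assign the error
Individual ind = new Individual(numWeights, -10.0, 10.0,
  mutateRate, mutateChange);           // random chromosome in [-10.0, +10.0)
nn.SetWeights(ind.chromosome);         // load the candidate weights into the network
ind.error = nn.MeanSquaredError(trainData);  // hypothetical error method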
Listing 2: Individual Class
public class Individual : IComparable<Individual>
{
  public double[] chromosome;   // Potential solution (weights and biases)
  public double error;          // Smaller is better
  private int numGenes;         // Equal to the number of weights
  private double minGene;       // Smallest possible value for a gene
  private double maxGene;       // Largest possible value for a gene
  private double mutateRate;
  private double mutateChange;
  static Random rnd = new Random(0);

  public Individual(int numGenes, double minGene, double maxGene,
    double mutateRate, double mutateChange)
  {
    this.numGenes = numGenes;
    this.minGene = minGene;
    this.maxGene = maxGene;
    this.mutateRate = mutateRate;
    this.mutateChange = mutateChange;
    this.chromosome = new double[numGenes];
    for (int i = 0; i < this.chromosome.Length; ++i)
      this.chromosome[i] = (maxGene - minGene) *
        rnd.NextDouble() + minGene;
    // this.error is supplied after calling the ctor!
  }

  public int CompareTo(Individual other)
  {
    if (this.error < other.error) return -1;
    else if (this.error > other.error) return 1;
    else return 0;
  }
}
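Because CompareTo orders Individual objects by error, sorting a population puts the best (lowest-error) individual first. Whether the population is held in an array, as sketched here, or in a List, the built-in sort uses CompareTo:
// Sorting a population by error, smallest first (uses Individual.CompareTo)
Individual[] population = new Individual[popSize];
// ... fill population and assign each individual's error ...
Array.Sort(population);
Individual best = population[0];                      // smallest error
Individual worst = population[population.Length - 1]; // largest error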