Neural Network Lab

Multi-Swarm Optimization for Neural Networks Using C#

Multi-swarm optimization (MSO) is a powerful variation of particle swarm optimization. Understanding how MSO works and how to implement it can be a valuable addition to your developer toolkit.

Many machine learning (ML) systems require code that minimizes error. In the case of neural networks, training a network is the process of finding a set of values for the network's weights and biases so that the error between the known output values in some training data and the computed output values is minimized. Algorithms that minimize error are also called optimization algorithms.
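
To make the idea concrete, here's a minimal sketch of one common error measure, mean squared error. The method is a hypothetical helper for illustration only; it isn't part of the demo program:

// Hypothetical example (not part of the demo): mean squared error
// between target output values and a network's computed output values
static double MeanSquaredError(double[] targets, double[] computed)
{
  double sum = 0.0;
  for (int i = 0; i < targets.Length; ++i)
  {
    double diff = targets[i] - computed[i];
    sum += diff * diff; // squared difference for one output node
  }
  return sum / targets.Length;
}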

There are roughly a dozen different optimization algorithms commonly used in machine learning. For example, back-propagation is often used to train simple neural networks, and something called L-BFGS is often used to train logistic regression classifiers. Multi-swarm optimization (MSO) is one of my go-to algorithms for ML training, in particular with deep neural networks.

Take a look at the screenshot of a demo program in Figure 1. The demo uses MSO to solve a well-known benchmark problem called Rastrigin's function. Rastrigin's function can accept any number of input values; the number of inputs is called the dimension of the function. The demo uses Rastrigin's function with six dimensions. The minimum value of Rastrigin's function is 0.0 when all input values equal 0.
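
In math terms, Rastrigin's function for n input values x1 through xn is:

f(x) = 10 * n + (x1^2 - 10 * cos(2 * pi * x1)) + . . +
       (xn^2 - 10 * cos(2 * pi * xn))

Each squared term is an ordinary parabola, and each cosine term carves repeating dips and bumps into it, which creates the many local minima visible in Figure 2.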

Figure 1. Multi-Swarm Optimization in Action

The graph in Figure 2 shows Rastrigin's function with two dimensions. The function has many hills and valleys that can trap an optimization algorithm at an incorrect answer. The one global minimum value is at the bottom of the graph and isn't quite visible because of all the surrounding curves. Rastrigin's function is quite difficult for optimization algorithms to solve, which is why researchers often use it when investigating a new optimization algorithm.


Figure 2. Rastrigin's Function for Two Variables

MSO is a variation of particle swarm optimization (PSO). In PSO, there are multiple virtual particles, where each particle represents a potential solution to the problem being solved. The collection of particles is called a swarm. The particles move, that is, change values, in a way that models the behavior of groups such as schools of fish and flocks of birds. MSO extends PSO by using several swarms that interact with each other.

The demo program uses four swarms. Each swarm has five particles. Larger numbers of swarms and particles increase the chance of finding the true minimum value, at the expense of slower performance. MSO is an iterative process. The demo sets the maximum number of iterations of the algorithm, in variable maxEpochs, to 2000. Again, more iterations increase the chance of success but slow performance.

Figure 1 shows the progress of the MSO algorithm every 100 epochs (iterations). In this case, MSO succeeded in finding the true minimum value of 0.0 at (0, 0, 0, 0, 0, 0). In realistic ML training scenarios, you won't know the true minimum error and usually have to employ some trial and error with the MSO parameters.

In this article, I'll explain how MSO works and walk you through the demo program. The code is a bit too long to present in its entirety, but the complete demo source code is available in the download that accompanies this article. I coded the demo using C#, but you shouldn't have too much trouble refactoring the code to another language such as Visual Basic .NET or JavaScript.

Demo Program Structure
The overall structure of the demo program, with some minor edits to save space, is presented in Listing 1. All the control logic is contained in the Main method. Method Solve implements the MSO algorithm. Method Error accepts a particle's position (an array of type double), computes the value of Rastrigin's function at the position, and returns the absolute value of the difference between the computed value and the known minimum value of 0.0.

Class Particle encapsulates a particle's current position (a potential solution), current velocity (values used to update the particle's position), the error at the current position, the position of the best (smallest) error ever found by the particle, and the associated error at that best position.

Class Swarm is essentially an array of Particle objects, along with the best position found by any particle in the swarm, and the associated error of that best swarm position.

Listing 1: Overall Program Structure
using System;
namespace MultiSwarm
{
  class MultiSwarmProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("Begin multi-swarm optimization demo");

      int dim = 6;
      Console.WriteLine("Goal is to solve Rastrigin's function of " +
        dim + " variables");
      Console.Write("Function has known min value of 0.0 at (");
      for (int i = 0; i < dim - 1; ++i)
        Console.Write("0, ");
      Console.WriteLine("0)");

      int numSwarms = 4;
      int numParticles = 5;
      int maxEpochs = 2000;
      double minX = -10.0;
      double maxX = 10.0;

      Console.WriteLine("Start MSO solve");
      double[] bestPosition = Solve(dim, numSwarms,
        numParticles, maxEpochs, minX, maxX);
      double bestError = Error(bestPosition);
      Console.WriteLine("Done");

      Console.WriteLine("Best solution found: ");
      ShowVector(bestPosition, 4, true);
      Console.WriteLine("Error of best position = " +
        bestError.ToString("F8"));

      Console.WriteLine("End multi-swarm optimization demo");
      Console.ReadLine();
    } // Main

    static void ShowVector(double[] vector,
      int decimals, bool newLine) { . . }
    static double[] Solve(int dim, int numSwarms, int numParticles,
      int maxEpochs, double minX, double maxX) { . . }
    public static double Error(double[] position) { . . }
  } 

  public class Particle
  {
    static Random ran = new Random(0);
    public double[] position;
    public double[] velocity;
    public double error;
    public double[] bestPartPosition;
    public double bestPartError;

    public Particle(int dim, double minX, double maxX) // random position
    {
      position = new double[dim];
      velocity = new double[dim];
      bestPartPosition = new double[dim];
      for (int i = 0; i < dim; ++i)
      {
        position[i] = (maxX - minX) * ran.NextDouble() + minX;
        velocity[i] = (maxX - minX) * ran.NextDouble() + minX;
      }
      error = MultiSwarmProgram.Error(position);
      bestPartError = error;
      Array.Copy(position, bestPartPosition, dim);
    }
  } // Particle

  public class Swarm
  {
    public Particle[] particles;
    public double[] bestSwarmPosition;
    public double bestSwarmError;

    public Swarm(int numParticles, int dim, double minX, double maxX)
    {
      bestSwarmError = double.MaxValue;
      bestSwarmPosition = new double[dim];
      particles = new Particle[numParticles];
      for (int i = 0; i < particles.Length; ++i)
      {
        particles[i] = new Particle(dim, minX, maxX); // random position
        if (particles[i].error < bestSwarmError)
        {
          bestSwarmError = particles[i].error;
          Array.Copy(particles[i].position, bestSwarmPosition, dim);
        }
      }
    }
  } // Swarm  
} // ns
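
The body of method Error is elided in Listing 1. Here's a minimal sketch that's consistent with the description earlier; the download that accompanies this article contains the exact implementation:

public static double Error(double[] position)
{
  // Rastrigin's function: f(x) = 10n + sum(xi^2 - 10*cos(2*pi*xi))
  double result = 0.0;
  for (int i = 0; i < position.Length; ++i)
  {
    double xi = position[i];
    result += (xi * xi) - (10.0 * Math.Cos(2.0 * Math.PI * xi)) + 10.0;
  }
  // The true minimum value is 0.0, so the error is the
  // absolute difference from 0.0
  return Math.Abs(result - 0.0);
}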

To create the demo program, I launched Visual Studio and created a C# console application named MultiSwarm. The demo has no significant .NET dependencies, so any version of Visual Studio should work. After the template-generated code loaded into the Visual Studio editor, in the Solution Explorer window I renamed file Program.cs to the more descriptive MultiSwarmProgram.cs, and Visual Studio automatically renamed class Program for me. At the top of the source code, I removed all unnecessary references to .NET namespaces, leaving just the reference to the top-level System namespace.

The key statements in the Main method are:

int dim = 6;
int numSwarms = 4;
int numParticles = 5;
int maxEpochs = 2000;
double minX = -10.0;
double maxX = 10.0;
double[] bestPosition = Solve(dim, numSwarms,
  numParticles, maxEpochs, minX, maxX);
double bestError = Error(bestPosition);

The purposes of variables dim, numSwarms, numParticles, and maxEpochs should be clear to you based on my earlier explanation of MSO. Variables minX and maxX establish upper and lower limits on the values in each component of a particle's position vector. Put another way, for Rastrigin's function of six variables, a particle's position will be an array with six cells, each of which holds a value between -10.0 and +10.0. These values are typical for ML systems where input data has been normalized in some way.

The Solve method returns the best position (location of the smallest error) found by any particle, in any swarm, during the MSO search process. That best position is passed to the Error method to get the associated error. An alternative design is to implement method Solve so that it returns the associated best error as an out-parameter.
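
For example, using Listing 1's convention of eliding method bodies, the hypothetical alternative signature and its calling code might look like this:

// Hypothetical alternative: return the best error via an out-parameter
static double[] Solve(int dim, int numSwarms, int numParticles,
  int maxEpochs, double minX, double maxX, out double bestError) { . . }

// Calling code in Main would then be:
double bestError;
double[] bestPosition = Solve(dim, numSwarms,
  numParticles, maxEpochs, minX, maxX, out bestError);

This avoids recomputing the error in Main, at the cost of a slightly busier method signature.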

The MSO Algorithm
In high-level pseudo-code, the MSO algorithm is presented in Listing 2. The key to the algorithm is how a particle's new velocity is calculated. In math terms the new velocity is:

v(t+1) = w * v(t) +
         (c1 * r1) * (p(t) - x(t)) +
         (c2 * r2) * (s(t) - x(t)) +
         (c3 * r3) * (g(t) - x(t))

After a new velocity has been computed, a particle's new position is:

x(t+1) = x(t) + v(t+1)

The term v(t+1) means the velocity at time t+1, in other words, the new velocity. Term v(t) is the current velocity and term x(t) is the current position. Although written as single letters, x, v, p, s and g are all vectors such as (8.0, -6.0) rather than single values.

Term p(t) is the particle's best-known position. Term s(t) is the best position found by any particle in the particle's swarm. Term g(t) is the global best position found by any particle in any swarm.

Term w is a constant called the inertia factor. Terms c1, c2 and c3 are constants that establish a maximum change for each component of the new velocity. Terms r1, r2 and r3 are random values between 0 and 1 that provide a randomization effect to each velocity update.

The new velocity computation is probably best explained by example. Suppose a problem has dim set to 2 and a particle is currently at (20.0, 30.0) and its current velocity is (-1.0, -3.0). Also, the best-known position of the particle is (10.0, 12.0), the best-known position of any particle in the swarm is (8.0, 9.0), and the best-known position of any particle in any swarm is (5.0, 6.0). And suppose that constant w has value 0.7, constants c1 and c2 are both 1.4, and constant c3 is 0.4. Finally, suppose random values r1, r2 and r3 are all 0.2.

The new velocity of the particle (with rounding to one decimal) is:

v(t+1) = 0.7 * (-1.0, -3.0) +
         (1.4 * 0.2) * ((10.0, 12.0) - (20.0, 30.0)) +
         (1.4 * 0.2) * ((8.0, 9.0) - (20.0, 30.0)) +
         (0.4 * 0.2) * ((5.0, 6.0) - (20.0, 30.0))

       = 0.7 * (-1.0, -3.0) +
         0.3 * (-10.0, -18.0) +
         0.3 * (-12.0, -21.0) +
         0.1 * (-15.0, -24.0)

       = (-0.7, -2.1) +
         (-3.0, -5.4) +
         (-3.6, -6.3) +
         (-1.5, -2.4)

       = (-8.8, -16.2)

And so the particle's new position is:

x(t+1) = (20.0, 30.0) + (-8.8, -16.2)
       = (11.2, 13.8)

Assuming the optimal solution is at (0.0, 0.0), as is the case with Rastrigin's function, notice that the particle has moved from its original position (20.0, 30.0) to a new position (11.2, 13.8) that's much closer to the optimal solution.

The inertia term, w, in v(t+1) encourages a particle to continue moving in its current direction. The p(t) term encourages a particle to move toward its historical best-known position. The s(t) term encourages a particle to move toward the best-known position found by any of the particle's swarm-mates. The g(t) term encourages a particle to move toward the best-known position found by any particle in any swarm.

Constants c1, c2, and c3 are sometimes called the cognitive, social, and global weights. Those constants, along with random values r1, r2 and r3, and the inertia weight, determine how much each term influences the motion of a particle. Some research in regular PSO suggests reasonable values for w, c1 and c2 are 0.729, 1.49445 and 1.49445, respectively. There's little research on the c3 constant in MSO, but I typically use 0.3645 (half of the inertia weight), which has worked well for me in practice.
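
Putting the pieces together, here's a sketch of how the velocity and position update might be coded for a single particle. The method name UpdateParticle and the bestSwarmPos and bestGlobalPos parameters are illustrative stand-ins for variables tracked inside the demo's Solve method:

// Sketch: update one particle's velocity, then its position, using
// the w, c1, c2, c3 values discussed above. The velocity clamp is an
// optional safeguard, an assumption rather than code from the download.
static void UpdateParticle(Particle p, double[] bestSwarmPos,
  double[] bestGlobalPos, double minX, double maxX, Random rnd)
{
  double w = 0.729, c1 = 1.49445, c2 = 1.49445, c3 = 0.3645;
  for (int j = 0; j < p.position.Length; ++j)
  {
    double r1 = rnd.NextDouble();
    double r2 = rnd.NextDouble();
    double r3 = rnd.NextDouble();

    p.velocity[j] = (w * p.velocity[j]) +
      (c1 * r1 * (p.bestPartPosition[j] - p.position[j])) +
      (c2 * r2 * (bestSwarmPos[j] - p.position[j])) +
      (c3 * r3 * (bestGlobalPos[j] - p.position[j]));

    // Keep each velocity component within [minX, maxX] so a particle
    // can't shoot far outside the search space
    if (p.velocity[j] < minX) p.velocity[j] = minX;
    else if (p.velocity[j] > maxX) p.velocity[j] = maxX;

    p.position[j] += p.velocity[j]; // x(t+1) = x(t) + v(t+1)
  }
}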

Listing 2: Multi-Swarm Algorithm Pseudo-Code
for-each swarm loop
  create particles at random positions
end for

epoch = 0
while epoch < maxEpochs loop
  for-each swarm loop
    for-each particle in swarm loop

      does particle die?
      does particle immigrate?

      compute new velocity based on 
        curr velocity, best particle position,
        best swarm position, and
        best overall position
      use new velocity to update position
      check if new position is a new particle
        best, or a new swarm best, or a new 
        global best
    end each particle
  end each swarm
  epoch = epoch + 1
end while
return best global position found

In the pseudo-code in Listing 2, notice that before a particle moves, it may die (and be replaced by a new particle at a random position), or it may immigrate (swap swarms with a randomly selected particle). The death and immigration mechanisms add an element of randomness to MSO and help prevent the algorithm from getting stuck at a non-optimal solution.
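
In code, the death and immigration checks might look something like the following sketch, placed inside the nested swarm and particle loops of Solve. The probability values, the rnd object and the swarms[i].particles[j] indexing are illustrative assumptions rather than code lifted from the download:

// Illustrative death and immigration checks for the particle at
// swarms[i].particles[j]; probability values are assumptions
double probDeath = 0.005;     // chance the particle dies this update
double probImmigrate = 0.005; // chance the particle swaps swarms

if (rnd.NextDouble() < probDeath)
{
  // Replace the particle with a new one at a random position
  swarms[i].particles[j] = new Particle(dim, minX, maxX);
}

if (rnd.NextDouble() < probImmigrate)
{
  // Swap this particle with a randomly selected particle in a
  // randomly selected swarm (possibly the same swarm)
  int si = rnd.Next(0, numSwarms);
  int pi = rnd.Next(0, numParticles);
  Particle tmp = swarms[si].particles[pi];
  swarms[si].particles[pi] = swarms[i].particles[j];
  swarms[i].particles[j] = tmp;
}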

