The Data Science Lab

Neural Network Cross Entropy Using Python

The key method is NeuralNetwork.train, which implements back-propagation training with cross entropy (CE) error. I created a main function to hold all program control logic. I started by creating a 4-5-3 neural network, like so:

def main():
  numInput = 4
  numHidden = 5
  numOutput = 3
  nn = NeuralNetwork(numInput, numHidden, numOutput)
...

The class uses a hardcoded tanh hidden-layer activation, and the constructor sets a class-scope random number generator seed so that results are reproducible. All weights and biases are initialized to small random values between -0.01 and +0.01 (a rough sketch of this initialization appears after the data-loading statements below). Next, I loaded the training and test data into memory with these statements:

trainDataPath = "irisTrainData.txt"
trainDataMatrix = loadFile(trainDataPath)
testDataPath = "irisTestData.txt"
testDataMatrix = loadFile(testDataPath)
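
The NeuralNetwork constructor isn't listed in this excerpt. A minimal sketch of the initialization just described, assuming NumPy and field names such as self.rnd and self.ihWeights (my naming, not necessarily the demo's), is:

import numpy as np

class NeuralNetwork:
  def __init__(self, numInput, numHidden, numOutput):
    self.ni = numInput
    self.nh = numHidden
    self.no = numOutput
    # class-scope generator with a fixed seed (placeholder value) for reproducibility
    self.rnd = np.random.RandomState(1)
    # weights and biases initialized to small random values in [-0.01, +0.01)
    lo, hi = -0.01, 0.01
    self.ihWeights = self.rnd.uniform(lo, hi, (numInput, numHidden))
    self.hBiases = self.rnd.uniform(lo, hi, numHidden)
    self.hoWeights = self.rnd.uniform(lo, hi, (numHidden, numOutput))
    self.oBiases = self.rnd.uniform(lo, hi, numOutput)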

The back-propagation training is prepared and invoked:

maxEpochs = 80
learnRate = 0.01
nn.train(trainDataMatrix, maxEpochs, learnRate)

Method train uses the back-propagation algorithm and displays a progress message with the current CE error every 10 iterations (epochs). It's usually important to monitor progress during neural network training because it's not uncommon for training to stall out completely, and if that happens you don't want to wait for an entire training run to finish. (A skeleton of this monitoring loop is sketched after the listing below.) The demo program concludes with these statements:

...
  accTrain = nn.accuracy(trainDataMatrix)
  accTest = nn.accuracy(testDataMatrix)
  
  print("\nAccuracy on 120-item train data = \
    %0.4f " % accTrain)
  print("Accuracy on 30-item test data   = \ 
    %0.4f " % accTest)
  
  print("\nEnd demo \n")
   
if __name__ == "__main__":
  main()

# end script
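
The body of the train method isn't shown in the listing. A bare skeleton of the monitoring behavior described earlier, assuming a hypothetical meanCrossEntropyError helper, might look like this:

  def train(self, trainData, maxEpochs, learnRate):
    for epoch in range(maxEpochs):
      for item in trainData:
        # forward pass, then back-propagation weight and bias updates (omitted)
        pass
      if epoch % 10 == 0:
        ce = self.meanCrossEntropyError(trainData)  # assumed helper, not shown in the article
        print("epoch = %4d  CE error = %0.4f" % (epoch, ce))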

Notice that during training you’re primarily interested in error, but after training you’re primarily interested in classification accuracy.
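
The accuracy method also isn't shown. A plausible sketch, assuming each data row holds the input values followed by a 1-of-N encoded label, and that a computeOutputs method returns the output-node values, is:

import numpy as np

def accuracy(self, dataMatrix):
  # self.ni is the number of input nodes; remaining columns are the 1-of-N label
  numCorrect = 0
  for row in dataMatrix:
    xValues = row[0:self.ni]
    tValues = row[self.ni:]
    yValues = self.computeOutputs(xValues)  # assumed method returning output-node values
    if np.argmax(yValues) == np.argmax(tValues):
      numCorrect += 1
  return numCorrect / len(dataMatrix)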

Wrapping Up
To recap, when performing neural network classifier training, you can use squared error or cross entropy error. Cross entropy is a measure of error between a set of predicted probabilities (or computed neural network output nodes) and a set of actual probabilities (or a 1-of-N encoded training label). Cross entropy error is also known as log loss. Squared error is a more general form of error and is just the sum of the squared differences between a predicted set of values and an actual set of values. Often, when using back-propagation training, cross entropy tends to give better training results more quickly than squared error, but squared error is less volatile than cross entropy.
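
As a concrete example, suppose the computed output-node values are (0.1, 0.2, 0.7) and the 1-of-N encoded target is (0, 0, 1). The snippet below computes both error values; cross entropy comes out to about 0.3567 and squared error to 0.1400:

import math

computed = [0.1, 0.2, 0.7]  # predicted probabilities (output-node values)
target = [0.0, 0.0, 1.0]    # 1-of-N encoded training label

# cross entropy (log loss): negative sum of target times log of computed
ce = -sum(t * math.log(c) for t, c in zip(target, computed))

# squared error: sum of squared differences
se = sum((t - c) ** 2 for t, c in zip(target, computed))

print("cross entropy error = %0.4f" % ce)  # 0.3567
print("squared error       = %0.4f" % se)  # 0.1400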


In my opinion, research results about which error metric gives better results are inconclusive. In the early days of neural networks, squared error was the most common error metric, but currently cross entropy is used more often. There are other error metrics that can be used for neural network training, but there’s no solid research on this topic of which I'm aware.


About the Author

Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Azure and Bing. James can be reached at jamccaff@microsoft.com.
