The Data Science Lab
Neural Network Batch Training Using Python
The Batch Training Method
The trainBatch method of the program-defined NeuralNetwork class is quite complex. The method has local arrays and matrices to hold the accumulated gradients:
while epoch < maxEpochs:
  ihWtsAccGrads = np.zeros(shape=[self.ni, self.nh],
    dtype=np.float32)  # accumulated input-to-hidden
  hBiasesAccGrads = np.zeros(shape=[self.nh],
    dtype=np.float32)  # accumulated hidden biases
  hoWtsAccGrads = np.zeros(shape=[self.nh, self.no],
    dtype=np.float32)  # accumulated hidden-to-output
  oBiasesAccGrads = np.zeros(shape=[self.no],
    dtype=np.float32)  # accumulated output biases
When performing online training, you should visit the training items in a scrambled order on each epoch so the algorithm doesn't get stuck in an oscillating pattern. But when doing batch training, because the gradients are accumulated over all training items before any weights are updated, you don't need to scramble the order in which items are visited:
for ii in range(numTrainItems):  # visit each item
  idx = indices[ii]  # not scrambled
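For comparison, an online-training version of this loop would typically scramble the visit order at the start of every epoch. Here is a minimal sketch; the field name self.rnd (a NumPy RandomState object held by the network) is an assumption and doesn't appear in the code shown in this article:

self.rnd.shuffle(indices)  # scramble visit order each epoch (online training)
for ii in range(numTrainItems):
  idx = indices[ii]  # scrambled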
In the batch version, each gradient is computed and then accumulated. For example, the input-to-hidden weight gradients are handled like so:
for i in range(self.ni):
  for j in range(self.nh):
    ihGrads[i,j] = hSignals[j] * self.iNodes[i]
    ihWtsAccGrads[i,j] += ihGrads[i,j]
The code is a bit more complicated than it first appears because a lot of work is done computing the so-called hidden node signals, which isn't shown here (a rough sketch appears after the update code below). After all training items have been visited, the weights are updated. For example, the input-to-hidden weights are updated with this code:
for i in range(self.ni):
  for j in range(self.nh):
    delta = -1.0 * learnRate * ihWtsAccGrads[i,j]
    self.ihWeights[i,j] += delta
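For reference, the hidden node signals mentioned above could be computed along the following lines. This is only a sketch, assuming tanh hidden node activation and softmax output nodes, and assuming names such as oSignals, tValues (target values), self.hNodes, self.oNodes and self.hoWeights that don't appear in the code shown in this article:

# output node signals (softmax output nodes assumed)
for k in range(self.no):
  deriv = (1.0 - self.oNodes[k]) * self.oNodes[k]
  oSignals[k] = deriv * (self.oNodes[k] - tValues[k])

# hidden node signals (tanh hidden nodes assumed)
for j in range(self.nh):
  sumTerm = 0.0
  for k in range(self.no):
    sumTerm += oSignals[k] * self.hoWeights[j,k]
  deriv = (1.0 - self.hNodes[j]) * (1.0 + self.hNodes[j])
  hSignals[j] = deriv * sumTerm

The key idea is that each hSignals[j] value depends on all of the output node signals, which is why computing the signals and gradients is where most of the training effort goes.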
Notice that batch training and online training require about the same total amount of computation, but the number of weight updates is different. Loosely, for batch training, each weight is updated maxEpochs times (each update uses a full gradient accumulated over all numTrainItems items). For online training, each weight is updated maxEpochs * numTrainItems times (each update uses an estimated gradient computed from a single item).
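To make the difference in update schedules concrete, here is a small self-contained sketch. The variable names and the per-item gradient values are made up purely for illustration:

import numpy as np

maxEpochs = 3
numTrainItems = 5
learnRate = 0.05
grads = np.array([0.4, -0.2, 0.1, 0.3, -0.1],
  dtype=np.float32)  # dummy per-item gradients for one weight

# batch: one update per epoch, using the full accumulated gradient
wBatch = 0.0
for epoch in range(maxEpochs):
  accGrad = np.sum(grads)  # accumulate over all numTrainItems items
  wBatch += -1.0 * learnRate * accGrad  # maxEpochs updates in total

# online: one update per item, using a single-item estimated gradient
wOnline = 0.0
for epoch in range(maxEpochs):
  for g in grads:
    wOnline += -1.0 * learnRate * g  # maxEpochs * numTrainItems updates in total

print(wBatch, wOnline)

The two versions land in the same place here only because the dummy gradients are fixed numbers; in real training each gradient depends on the current weight values, so the two schedules generally produce different results.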
Wrapping Up
In the early days of neural networks, there was quite a bit of controversy about which training technique, batch or online, was preferable. As computer processing power increased, the difference became less important. For simple neural networks, most of my colleagues try online training first and then, if the results aren't satisfactory, try batch or mini-batch training. As with many machine learning activities, there are very few firm rules of thumb, and a bit of experimentation is usually needed.
Once you understand the differences between batch and online training, you're in a good position to understand the strengths and weaknesses of the compromise mini-batch training technique. For complex neural networks, mini-batch training is usually the default approach used by my colleagues.
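For readers who want a concrete picture of the mini-batch compromise: gradients are accumulated over a small group of items (often a power of two such as 32) and the weights are updated after each group, rather than after every item (online) or after the entire data set (batch). A rough, runnable sketch of the visit-and-update schedule, with made-up names and a simple stand-in for the gradient accumulators:

numTrainItems = 10
batchSize = 4  # a small mini-batch size for illustration
indices = list(range(numTrainItems))
itemsAccumulated = 0  # stands in for the accumulated-gradient arrays

for ii in range(numTrainItems):
  idx = indices[ii]
  itemsAccumulated += 1  # in real code: compute and accumulate item idx's gradients
  if (ii + 1) % batchSize == 0 or ii == numTrainItems - 1:
    print("update weights using gradients from", itemsAccumulated, "items")
    itemsAccumulated = 0  # in real code: zero out the gradient accumulators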
About the Author
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Azure and Bing. James can be reached at [email protected].