The number of neurons in the hidden layer can be set based on any of the following methods:
- The number of hidden neurons should be between the size of the input layer and the size of the output layer
- The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer
- The number of hidden neurons should be less than twice the size of the input layer
However, these are only rules of thumb; there is no standard rule for setting the number of neurons in the hidden layer.
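The heuristics above can be sketched as simple arithmetic. The layer sizes here (100 inputs, 10 outputs) are hypothetical, chosen just to illustrate the rules:

```python
input_size = 100   # hypothetical input layer size
output_size = 10   # hypothetical output layer size

# Heuristic 1: somewhere between the output and input layer sizes.
rule1_range = (output_size, input_size)

# Heuristic 2: 2/3 the input layer size, plus the output layer size.
rule2 = (2 * input_size) // 3 + output_size

# Heuristic 3: strictly less than twice the input layer size.
rule3_upper_bound = 2 * input_size

print(rule2)             # 76
print(rule3_upper_bound) # 200
```

For these sizes, heuristic 2 suggests 76 hidden neurons, which also satisfies heuristics 1 and 3.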
Data augmentation is widely used for increasing the amount of training data. Suppose we are training a network on an image classification task and have only a small number of images in our training set, with no way to obtain more. In that case, we can perform data augmentation by cropping, flipping, and padding the existing images, and add the resulting new images to the training set.
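A minimal sketch of these three augmentations using NumPy array operations on a toy 8x8 grayscale image (the image and crop/pad amounts are illustrative assumptions, not from the original text):

```python
import numpy as np

def augment(image):
    """Return three augmented variants of a single H x W image array."""
    h, w = image.shape
    flipped = np.fliplr(image)                  # horizontal flip
    cropped = image[2:h - 2, 2:w - 2]           # central crop, 2 px off each side
    padded = np.pad(image, 2, mode="constant")  # 2 px of zero padding on each side
    return flipped, cropped, padded

image = np.arange(64, dtype=float).reshape(8, 8)  # toy "image"
flipped, cropped, padded = augment(image)
print(flipped.shape, cropped.shape, padded.shape)  # (8, 8) (4, 4) (12, 12)
```

Each variant is a new training example derived from the same underlying image; in practice, libraries such as torchvision or Keras apply such transforms randomly at training time.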
Data normalization is usually performed as a preprocessing step. It means we normalize the data by subtracting the mean and dividing by the standard deviation, typically computed per feature. This helps the network attain better convergence during training.
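A short sketch of per-feature normalization in NumPy; the toy matrix (two features on very different scales) is an illustrative assumption:

```python
import numpy as np

# Toy dataset: 3 samples, 2 features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

mean = X.mean(axis=0)   # per-feature mean
std = X.std(axis=0)     # per-feature standard deviation
X_norm = (X - mean) / std

print(X_norm.mean(axis=0))  # ~[0. 0.]
print(X_norm.std(axis=0))   # [1. 1.]
```

After normalization, every feature has zero mean and unit variance, so no single large-scale feature dominates the gradient updates.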
It is not good practice to initialize all the weights to zero. During backpropagation, we train the network by computing the gradients of the loss function with respect to the weights. When all the weights are set to the same value, every neuron in a layer receives the same gradient, so all the neurons update identically and learn the same feature. Thus, if we initialize all the weights to zero, we end up with a network in which every neuron in a layer learns the same feature.
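This symmetry can be demonstrated with a minimal NumPy sketch of one backpropagation step through a tiny two-layer network. The network shape, constant initial value, and loss are illustrative assumptions; zero initialization is the extreme case of constant initialization shown here:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # toy batch: 4 samples, 3 features
y = rng.normal(size=(4, 1))   # toy targets

# Every weight starts with the same constant value (zero is the extreme case).
W1 = np.full((3, 5), 0.5)     # input -> hidden
W2 = np.full((5, 1), 0.5)     # hidden -> output

# Forward pass: every hidden neuron computes the identical activation.
h = np.tanh(X @ W1)
out = h @ W2

# Backward pass for a squared-error loss.
grad_out = out - y
grad_h = grad_out @ W2.T
grad_W1 = X.T @ (grad_h * (1 - h ** 2))  # tanh derivative

# Every column (one per hidden neuron) of grad_W1 is identical,
# so the neurons receive the same update and remain interchangeable forever.
print(np.allclose(grad_W1, grad_W1[:, :1]))  # True
```

Breaking this symmetry is exactly why weights are initialized randomly, for example with small Gaussian values or schemes such as Xavier/Glorot initialization.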