Consider a two-layer network of the form shown in Figure 1, with M hidden units having Φ(·) = tanh(·) activation functions and full connectivity in both layers.
Figure 1: Network diagram for the two-layer neural network corresponding to questions 1, 2, and 3. The input, hidden, and output variables are represented by nodes, and the weight parameters are represented by links between the nodes, with the bias parameters denoted by links coming from the additional input and hidden variables x0 and z0. Arrows denote the direction of information flow through the network during forward propagation.
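Forward propagation through this architecture can be sketched in a few lines. The sketch below is an assumption about the setup: bias parameters are folded into each weight matrix as a first column, matching the x0 = 1 and z0 = 1 bias units in the figure, and the output units are taken to be linear since the excerpt does not specify an output activation. The names `forward`, `W1`, and `W2` are illustrative, not from the source.

```python
import numpy as np

def forward(x, W1, W2):
    """Forward pass through a two-layer network with tanh hidden units.

    x  : input vector of length D
    W1 : (M, D+1) first-layer weights; column 0 holds the biases
         (corresponding to the bias unit x0 = 1)
    W2 : (K, M+1) second-layer weights; column 0 holds the biases
         (corresponding to the bias unit z0 = 1)
    """
    x_ext = np.concatenate(([1.0], x))   # prepend bias unit x0 = 1
    a = W1 @ x_ext                       # hidden-unit pre-activations
    z = np.tanh(a)                       # hidden-unit outputs, Phi = tanh
    z_ext = np.concatenate(([1.0], z))   # prepend bias unit z0 = 1
    y = W2 @ z_ext                       # linear output units (assumed)
    return y, z

# Example: D = 2 inputs, M = 3 hidden units, K = 1 output
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))   # M x (D+1)
W2 = rng.normal(size=(1, 4))   # K x (M+1)
y, z = forward(np.array([0.5, -1.2]), W1, W2)
```

Storing the biases as an extra weight column keeps the forward pass to two matrix-vector products, mirroring the extra x0 and z0 nodes in the diagram.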