In this question, we study the properties of SELU.
Construct a 100-layer MLP with 100 input neurons, where the number of neurons in each hidden layer and in the output layer is drawn uniformly from $50$ to $150$. Initialize the weights of the MLP so that the weights in a fully connected layer with $n_{in}$ inputs are drawn independently from a normal distribution with mean $0$ and standard deviation $\frac{1}{\sqrt{n_{in}}}$.
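A minimal sketch of this setup, assuming NumPy, treating "100-layer" as 100 fully connected layers (99 hidden plus the output layer); the names below (`build_mlp`, `N_LAYERS`, `N_INPUT`) are only illustrative, not a prescribed implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

N_LAYERS = 100   # total number of fully connected layers (hidden + output)
N_INPUT = 100    # input dimension

def build_mlp(rng):
    """Return a list of weight matrices for the 100-layer MLP.

    Hidden/output widths are drawn uniformly from 50..150, and each weight
    matrix is drawn from a normal with mean 0 and std 1/sqrt(n_in).
    """
    widths = [N_INPUT] + list(rng.integers(50, 151, size=N_LAYERS))
    weights = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        weights.append(rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)))
    return weights

weights = build_mlp(rng)
```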
We perform several experiments with this MLP below. For each experiment, describe what you observe.
(a) Use SELU activation for all neurons. Sample 1000 input vectors in $\mathbf{R}^{100}$, with each entry drawn from $N(0, 1)$. For each layer, compute the mean and standard deviation of the activation values over these input vectors, and plot the means and standard deviations against the layer number. Also plot the distribution of the activation values for the input layer and for the 1st, 10th, 50th, and 90th layers.
Answer. [Write your solution here. Add cells as needed.]
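A hedged starter sketch for (a), assuming NumPy and Matplotlib and re-using `rng`, `weights`, and `N_INPUT` from the sketch above; the helpers `selu` and `layer_stats` are illustrative names, not a prescribed solution:

```python
import numpy as np
import matplotlib.pyplot as plt

# SELU constants (alpha and scale) from Klambauer et al., 2017.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))

def layer_stats(weights, X, activation=selu, snapshot_layers=(1, 10, 50, 90)):
    """Forward-propagate X (n_samples x n_in); record per-layer mean and std."""
    means, stds = [], []
    snapshots = {0: X.ravel()}          # layer 0 = the input layer
    h = X
    for i, W in enumerate(weights, start=1):
        h = activation(h @ W)
        means.append(h.mean())
        stds.append(h.std())
        if i in snapshot_layers:
            snapshots[i] = h.ravel()
    return means, stds, snapshots

X = rng.normal(0.0, 1.0, size=(1000, N_INPUT))   # 1000 inputs with entries ~ N(0, 1)
means, stds, snapshots = layer_stats(weights, X)

# Means and standard deviations against the layer number.
layers = np.arange(1, len(weights) + 1)
plt.figure()
plt.plot(layers, means, label="mean")
plt.plot(layers, stds, label="std")
plt.xlabel("layer")
plt.legend()

# Activation distributions at the selected layers.
for layer, vals in snapshots.items():
    plt.figure()
    plt.hist(vals, bins=50)
    plt.title(f"activation distribution, layer {layer}")
plt.show()
```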
(b) Repeat (a), but draw each input value from $N(0, 10)$.
Answer. [Write your solution here. Add cells as needed.]
(c) Repeat (a), but draw each input value from $N(1, 1)$.
Answer. [Write your solution here. Add cells as needed.]
(d) Repeat (a), but for half of the input vectors draw each input value from $N(1, 2)$, and for the other half draw each input value from $U[0, 10]$.
Answer. [Write your solution here. Add cells as needed.]
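For (d), only the input sampling changes. A minimal sketch, assuming NumPy, interpreting $N(1, 2)$ as mean $1$ with standard deviation $2$, and re-using the names from the sketches above:

```python
n_samples, half = 1000, 500

# First half of the inputs: entries ~ N(1, 2) (mean 1, std 2 assumed here).
X_normal = rng.normal(1.0, 2.0, size=(half, N_INPUT))
# Second half of the inputs: entries ~ U[0, 10].
X_uniform = rng.uniform(0.0, 10.0, size=(n_samples - half, N_INPUT))

X = np.vstack([X_normal, X_uniform])
means, stds, snapshots = layer_stats(weights, X)
```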
(e) Repeat (a), using ReLU instead of SELU.
Answer. [Write your solution here. Add cells as needed.]
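For (e), only the activation changes; assuming the `layer_stats` helper sketched in (a), the ReLU variant could be run as:

```python
def relu(x):
    return np.maximum(x, 0.0)

# Same N(0, 1) inputs as in (a), forwarded through ReLU instead of SELU.
X = rng.normal(0.0, 1.0, size=(1000, N_INPUT))
means, stds, snapshots = layer_stats(weights, X, activation=relu)
```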