STAT3007/7007 Deep Learning, Prac 2

2022 Semester 1

Q1. Ridge regression

(a) Load the Boston house price dataset in sklearn, and construct an 80-20 train-test split.

Answer. [Write your solution here. Add cells as needed.]
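As a hint for the split, here is a minimal NumPy-only sketch on placeholder arrays shaped like the Boston data (506 rows, 13 features). The helper name `train_test_split_80_20` and the seed are assumptions made for illustration; `sklearn.model_selection.train_test_split` with `test_size=0.2` is the usual library route and may round the split sizes slightly differently.

```python
import numpy as np

def train_test_split_80_20(X, y, seed=0):
    """Shuffle the row indices and put the first 80% in train, the rest in test."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    n_train = int(round(0.8 * len(X)))
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder arrays (not the real dataset), only to show the resulting shapes.
X = np.arange(506 * 13, dtype=float).reshape(506, 13)
y = np.arange(506, dtype=float)

X_train, X_test, y_train, y_test = train_test_split_80_20(X, y)
print(X_train.shape, X_test.shape)  # (405, 13) (101, 13)
```

Fixing the seed keeps the split reproducible, which matters when comparing models fitted in later parts.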

(b) Use NumPy to fit a ridge regression model with $\lambda = 0.1$. Show the model parameters, and calculate its training and test mean absolute errors (MAEs).

Answer. [Write your solution here. Add cells as needed.]
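One possible way to set up the closed-form fit is sketched below on synthetic data. It assumes the objective $\|y - Xw - b\|_2^2 + \lambda \|w\|_2^2$ with an unpenalised intercept $b$; if the lecture notes scale the loss by $1/n$ or penalise the intercept, the formula changes accordingly. The names `fit_ridge` and `mae` are hypothetical helpers, not part of any library.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge fit with an unpenalised intercept (assumed objective)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring removes the intercept
    d = X.shape[1]
    # Solve (Xc^T Xc + lam * I) w = Xc^T yc instead of forming an explicit inverse.
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def mae(X, y, w, b):
    """Mean absolute error of the linear predictor X @ w + b."""
    return np.mean(np.abs(X @ w + b - y))

# Synthetic stand-in data: a noisy linear model with known parameters.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
X_all = rng.normal(size=(250, 3))
y_all = X_all @ true_w + 4.0 + 0.1 * rng.normal(size=250)
X_train, y_train = X_all[:200], y_all[:200]
X_test, y_test = X_all[200:], y_all[200:]

w, b = fit_ridge(X_train, y_train, lam=0.1)
print("w =", w, "b =", b)
print("train MAE:", mae(X_train, y_train, w, b))
print("test MAE:", mae(X_test, y_test, w, b))
```

Under the assumed objective, `sklearn.linear_model.Ridge(alpha=0.1)` with its default `fit_intercept=True` minimises the same quantity, so its `coef_` and `intercept_` should agree with `w` and `b` up to numerical precision.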

(c) Read the documentation of sklearn.linear_model.Ridge, and use it to fit the same ridge regression model. Do you obtain the same model parameters?

Answer. [Write your solution here. Add cells as needed.]

Q2. PCA

We will work with a faces dataset provided in the scikit-learn library, namely, the Olivetti dataset. The API for loading this dataset can be found at https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_olivetti_faces.html#sklearn.datasets.fetch_olivetti_faces. You may want to use the sklearn.decomposition.PCA class to answer these questions. See https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html.

(a) Load the dataset. How many face images are there in the dataset? What is the size of each image?

Answer. [Write your solution here. Add cells as needed.]

(b) Use matplotlib.pyplot.imshow to display the first five images in the dataset.

Answer. [Write your solution here. Add cells as needed.]

(c) Find the top 5 eigenfaces for this dataset, and display them.

Answer. [Write your solution here. Add cells as needed.]
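As a hint, eigenfaces are the principal axes of the mean-centred, flattened images. The sketch below computes them with a plain SVD on random stand-in images rather than the Olivetti data; the helper name `top_eigenfaces` and the 8x8 image size are assumptions for illustration. With scikit-learn, the rows of `PCA(n_components=5).fit(X).components_` play the same role, possibly with flipped signs.

```python
import numpy as np

def top_eigenfaces(images, k):
    """images: array of shape (n_samples, h, w).
    Returns the top-k principal axes reshaped to (k, h, w) and the mean image."""
    n, h, w = images.shape
    X = images.reshape(n, h * w)             # one flattened image per row
    mean_face = X.mean(axis=0)
    # Rows of Vt are unit-norm principal axes, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean_face, full_matrices=False)
    return Vt[:k].reshape(k, h, w), mean_face.reshape(h, w)

rng = np.random.default_rng(0)
fake_faces = rng.random((40, 8, 8))          # stand-in data, not real faces
eigenfaces, mean_face = top_eigenfaces(fake_faces, k=5)
print(eigenfaces.shape)                      # (5, 8, 8)
```

Each returned slice `eigenfaces[i]` is a 2-D array, so it can be passed directly to `matplotlib.pyplot.imshow` for display.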

(d) Compute the pairwise dot product between the eigenfaces, and show the results as a 5x5 matrix with the (i, j)-th entry being the dot product between the i-th and j-th eigenfaces.

Answer. [Write your solution here. Add cells as needed.]
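The pairwise dot products can be obtained in one matrix product of the stacked components with their transpose. The sketch below uses synthetic data and an SVD-based PCA as a stand-in; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 20))                # stand-in for flattened images
Xc = X - X.mean(axis=0)                      # centre before PCA
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt[:5]                          # top 5 principal axes, one per row
gram = components @ components.T             # entry (i, j) = <component_i, component_j>
print(np.round(gram, 6))
```

Because principal axes are orthonormal, the result should be the identity matrix up to floating-point error.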

(e) What are the variances of the projections of all the face images on each of the top 5 eigenfaces?

Answer. [Write your solution here. Add cells as needed.]
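The variance of the projections onto a principal axis equals the corresponding eigenvalue of the sample covariance matrix. A short check on synthetic data is sketched below, assuming the unbiased $n - 1$ denominator (the convention used by `explained_variance_` in `sklearn.decomposition.PCA`); the data and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
scales = np.array([5.0, 4.0, 3.0, 2.0, 1.5, 1.0, 0.5, 0.2])
X = rng.normal(size=(300, 8)) * scales       # synthetic data with unequal variances
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 5
proj = Xc @ Vt[:k].T                         # coordinates along the top-k axes
var_proj = proj.var(axis=0, ddof=1)          # sample variance of each projection
var_from_s = s[:k] ** 2 / (len(X) - 1)       # same quantity from singular values
print(var_proj)
print(var_from_s)
```

The two printed vectors should match, and the variances decrease from the first component to the fifth.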