STAT3007/7007 Deep Learning, Prac 1

2022 Semester 1

Getting Started with Programming for STAT3007/7007

We introduce several useful programming tools for this course.

Python: a general purpose programming language designed to be easy to write and read.
Google's Colaboratory (or Colab for short, https://colab.research.google.com/): an online platform that allows you to write, document, and execute your Python code in a document, called a Jupyter Notebook, in your browser.
Jupyter Notebook (https://jupyter.org/) on your computer: you can also create and run Jupyter Notebooks locally on your computer. This is more convenient if you need to manipulate files.
scikit-learn (https://scikit-learn.org/stable/, also known as sklearn): an excellent machine learning library written in Python.
Utilities: some utilities that you may find useful for this course.

Introduction to Python

While you are not required to know Python, we assume that you have sufficient programming experience and are able to quickly pick up the language on your own.

Python is designed with ease of reading and writing in mind, and you should be able to comprehend the code in this section with little effort.

Numbers

x = 2
y = 3.
print(x, type(x))
print(y, type(y))
print(x + y, x - y, x * y, x / y, x//y)

2 <class 'int'>
3.0 <class 'float'>
5.0 -1.0 6.0 0.6666666666666666 0.0
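The last value printed above is 0.0 rather than 0 because y is a float. The following is a minimal sketch, with numbers chosen purely for illustration, of how the two division operators behave.

```python
true_div = 7 / 2        # true division always returns a float
floor_int = 7 // 2      # floor division of two ints returns an int
floor_float = 7 // 2.   # the result is a float if either operand is a float
floor_neg = -7 // 2     # floor division rounds towards negative infinity
print(true_div, floor_int, floor_float, floor_neg)
```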

Booleans

is_wealthy = True
is_happy = True
is_wealthy & is_happy

True
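The & operator used above is Python's bitwise operator, which also works on booleans. The sketch below is not part of the original cell; it contrasts & with the logical operators and, or, and not, which are the more common choice for truth values. The value of is_happy is changed to False only to make the outputs more informative.

```python
is_wealthy = True
is_happy = False

# Logical operators: the usual way to combine truth values.
both = is_wealthy and is_happy
either = is_wealthy or is_happy
print(both, either, not is_happy)

# & gives the same answer on booleans, but it always evaluates both sides.
print(is_wealthy & is_happy)

# and short-circuits: the right-hand side is skipped when the left is False,
# so the division by zero below is never evaluated.
safe = is_happy and (1 / 0 > 0)
print(safe)
```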

Strings

s1 = "hello"
s2 = "world"
print(s1 + ' ' + s2)
print('number of letters in ' + s1 + ": " + str(len(s1)))
print('number of letters in %s: %d' % (s1, len(s1)))

hello world
number of letters in hello: 5
number of letters in hello: 5
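Since Python 3.6, formatted string literals (f-strings) offer a third way to build the same message. A minimal sketch, reusing the value of s1 from above; the name message is introduced here for illustration only.

```python
s1 = "hello"

# An f-string evaluates the expressions inside the braces and inserts the results.
message = f'number of letters in {s1}: {len(s1)}'
print(message)
```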

Lists

colors = ['red', 'green', 'blue']
print('We have', len(colors), 'colors:', colors)
colors.reverse()
print('Reversed list:', colors)
print('yellow appears ', colors.count('yellow'), 'times in the list')
print('green is at position', colors.index('green'))
del colors[1]
print('List after deleting item 1:', colors)

We have 3 colors: ['red', 'green', 'blue']
Reversed list: ['blue', 'green', 'red']
yellow appears  0 times in the list
green is at position 1
List after deleting item 1: ['blue', 'red']

Dictionaries

contacts = {'Taylor Swift': '0711112222', 'George Bush': '0711113333'}
print('Type of contacts:', type(contacts))
print('The complete contacts:', contacts)
print('Contact names:', contacts.keys())
print("Taylor Swift's phone number:", contacts['Taylor Swift'])
print("Alex Taylor is a contact:", ('Alex Taylor' in contacts))

Type of contacts: <class 'dict'>
The complete contacts: {'Taylor Swift': '0711112222', 'George Bush': '0711113333'}
Contact names: dict_keys(['Taylor Swift', 'George Bush'])
Taylor Swift's phone number: 0711112222
Alex Taylor is a contact: False
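Looking up a missing key with square brackets raises a KeyError. The sketch below, which is an addition to the original example, shows two common alternatives: the get method with a default value, and iteration over key-value pairs with items.

```python
contacts = {'Taylor Swift': '0711112222', 'George Bush': '0711113333'}

# contacts['Alex Taylor'] would raise a KeyError; get() returns a default instead.
number = contacts.get('Alex Taylor', 'unknown')
print("Alex Taylor's phone number:", number)

# items() yields (key, value) pairs in insertion order.
for name, phone in contacts.items():
    print(name, '->', phone)
```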

Resources

The following are some useful resources that we encourage you to study on your own if you need to brush up on your Python skills.

The official Python tutorial: https://docs.python.org/3/tutorial/
A DataCamp tutorial: https://www.datacamp.com/courses/intro-to-python-for-data-science

Introduction to Colab

To start learning and using Colab, open the URL https://colab.research.google.com/ in your browser. The webpage already contains an introduction to Colab, with links to many additional resources. Log in to your Google account to start writing and executing code using Colab.

Creating your own notebooks

Below is a quick-start guide to creating your own notebook from scratch.

Click File on the menu bar on the top left corner, then click New Notebook. You should see an empty notebook, which is simply a document that can contain code and its documentation.
A notebook consists of a sequence of code or text cells: as the name suggests, each code cell contains Python code, and each text cell contains text.
To add a code cell, simply click + Code at the top left corner in your browser. A code cell will be created below your current cursor position. Alternatively, you can move your cursor to be between two cells, then you will see a horizontal line with a + Code button and + Text button at the middle. Click the + Code button and you will have a new cell created between the two existing cells.
Now input print("Hello World!") in a code cell. If you hover your cursor over the cell, a run button appears on the left hand side of the cell, and you can click it to execute your code. You should see Hello World! printed below the cell. Voila! You can use CTRL+ENTER or SHIFT+ENTER to run the code as well - the cursor stays in the same cell for the former, and moves down to the next cell for the latter.
You can create text in the same way, except that you need to use the + Text button.
When you click on a cell, a menu will pop up on the right hand side. You can use the buttons there to move the cell, add a comment, or delete the cell.

Uploading a notebook

Click File, then click Upload notebook and then use the popup window to select a file to upload.

For example, you can upload a copy of this notebook to Colab and play with it.

Using markdown to format text

The text cells support rich text formatting, so you can create headers, tables, italicized text, etc. You do this by entering plain text that contains certain special symbols; when a text cell is rendered, these symbols are interpreted as instructions to format the text in a specific way. The markdown language specifies these special symbols and their effects. See https://colab.research.google.com/notebooks/markdown_guide.ipynb for details.

For those of you who are familiar with Jupyter Notebook, note that Colab's markdown syntax does not support HTML tags, which are supported in Jupyter Notebook.

Using GPUs/TPUs

Colab gives you free access to GPUs and TPUs. Compared with CPUs, GPUs and TPUs can run your deep learning programs in a massively parallel way, which can sometimes speed them up by several orders of magnitude.

To use a GPU or a TPU, click Runtime on the top left corner, then click Change runtime type, then change the hardware accelerator to GPU or TPU (default is None).

If you choose GPU and want to confirm that in your code, input the following code in a code cell and execute it.

import torch
torch.cuda.is_available()

True

The output should be True if a GPU is active.
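In practice, the result of this check is often used to choose where computations run. The following is a minimal sketch of that pattern; pick_device is a hypothetical helper name introduced here, and it falls back to the CPU when PyTorch is not installed or no GPU is visible.

```python
import importlib.util

def pick_device():
    """Return 'cuda' if PyTorch is installed and sees a GPU, otherwise 'cpu'."""
    if importlib.util.find_spec('torch') is None:
        # PyTorch is not installed, so only the CPU is usable.
        return 'cpu'
    import torch
    return 'cuda' if torch.cuda.is_available() else 'cpu'

device = pick_device()
print('Running on:', device)
```

In PyTorch code, the returned string can be passed to torch.device or to the to method of tensors and models.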

If you choose TPU and want to check whether that's successful, input the following code in a code cell and execute it.

import os
assert os.environ['COLAB_TPU_ADDR']

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_30248/1665051417.py in <module>
      1 import os
----> 2 assert os.environ['COLAB_TPU_ADDR']

~/miniconda3/lib/python3.9/os.py in __getitem__(self, key)
    677         except KeyError:
    678             # raise KeyError with the original key value
--> 679             raise KeyError(key) from None
    680         return self.decodevalue(value)
    681 

KeyError: 'COLAB_TPU_ADDR'

If a TPU is active, the assert statement should be successful, and nothing should be printed. Otherwise, you see a KeyError.
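If you prefer a check that does not stop the notebook with an error, you can look the variable up with a default instead. This is a sketch under one assumption: that the Colab TPU runtime advertises itself through an environment variable named COLAB_TPU_ADDR, which may not hold for every Colab runtime version. The helper name tpu_address is hypothetical.

```python
import os

def tpu_address(environ=None):
    """Return the TPU address from the environment, or None if it is not set."""
    if environ is None:
        environ = os.environ
    # COLAB_TPU_ADDR is assumed to be the variable set by Colab's TPU runtime.
    return environ.get('COLAB_TPU_ADDR')

address = tpu_address()
if address:
    print('TPU found at', address)
else:
    print('No TPU detected')
```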

Introduction to Jupyter Notebook

You have now tried out Jupyter Notebook on Colab. Here we cover a few tips that will make it more convenient for you to generate a nice-looking report containing both math and code.

Creating and using Jupyter Notebook locally

Follow these brief instructions to install Jupyter Notebook on your own computer: https://jupyter.org/install.html.

Typesetting math

Jupyter Notebook supports writing math using LaTeX. For example, typing $\beta$ in a text cell renders the Greek letter β.

You can type several aligned equations as shown below.

\begin{align}
    E &= m c^{2}, \\
    E - m c^{2} &= 0.
\end{align}

This gives you the output below.

\begin{align}
    E &= m c^{2}, \\
    E - m c^{2} &= 0.
\end{align}

If you haven't used LaTeX before, you are encouraged to use it when typesetting math. This cheat sheet will be very handy: http://tug.ctan.org/info/undergradmath/undergradmath.pdf.
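As one more illustration, the mean squared error, a metric used later in this prac, combines a fraction, a sum, subscripts, and a hat accent. The symbols n, y_i, and \hat{y}_i are notation chosen here for the example and do not come from the original text.

```latex
\begin{align}
    \mathrm{MSE} &= \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}.
\end{align}
```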

Displaying files

You can display the content of a PDF/image/Word document in Jupyter Notebook using the wand package. Unfortunately, the code below can't be run on Colab due to a permission restriction, so you will need to try it on your computer.

To use wand, follow the official instructions to install it first: https://docs.wand-py.org/en/0.6.5/guide/install.html. Basically, you will need to first install ImageMagick if you don't have it yet, and then install wand. If you already have ImageMagick installed, you don't need to read the official instructions, but can simply run the cell below to install wand.

import sys
!{sys.executable} -m pip install wand

Requirement already satisfied: wand in /home/nanye/miniconda3/lib/python3.9/site-packages (0.6.7)
WARNING: You are using pip version 21.3.1; however, version 22.0.3 is available.
You should consider upgrading via the '/home/nanye/miniconda3/bin/python -m pip install --upgrade pip' command.

With wand, we can define the following utility function for embedding a PDF/image/Word document in a Jupyter Notebook.

from IPython.display import display
from wand.image import Image as WImage

def show_file(filename, pages=(0,), scale=1):
    '''
    Display selected pages (0-indexed) from a file at a chosen scale.
    '''
    for i in pages:
        # "filename[i]" is ImageMagick's syntax for selecting page i of a file
        img = WImage(filename="%s[%d]" % (filename, i), resolution=100)
        img.resize(width=int(scale*img.width), height=int(scale*img.height))
        display(img)

Now we can display the first page of the file prac01.pdf located in the same directory as this notebook. Try increasing the scale to make the displayed content larger.

show_file('prac01.pdf', scale=0.1)

We can display the second and third pages using the pages argument.

show_file('prac01.pdf', pages=[1,2], scale=0.1)

Introduction to sklearn

We illustrate how to perform linear regression using sklearn in this section. The modules that we use include

sklearn.datasets: utilities to load various machine learning datasets.
sklearn.model_selection: a collection of utilities for model selection, including splitting datasets, cross-validation.
sklearn.linear_model: a collection of linear models, including OLS, logistic regression.
sklearn.metrics: a collection of performance metrics, including accuracy and MSE.

The following links are useful:

A full list of the sklearn modules: https://scikit-learn.org/stable/modules/classes.html.
A user guide: https://scikit-learn.org/stable/user_guide.html.
A gallery of examples: https://scikit-learn.org/stable/auto_examples/index.html.

Linear regression on a random dataset

We create a random linear regression dataset with 1000 examples and 20 features using the make_regression function in the sklearn.datasets module.

from sklearn.datasets import make_regression
import numpy as np

n_samples, n_features = 1000, 20
rng = np.random.RandomState(0)
X, y = make_regression(n_samples, n_features, noise=0.5, random_state=rng)
X.shape, y.shape

((1000, 20), (1000,))

We do a random 70-30 train-test split.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train.shape, X_test.shape

((700, 20), (300, 20))
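Conceptually, a random split shuffles the example indices and cuts them into two parts. The sketch below illustrates this idea using only the standard library; simple_split is a hypothetical helper, not sklearn's implementation, and it will not reproduce the exact split chosen by train_test_split.

```python
import random

def simple_split(n_samples, test_size=0.3, seed=42):
    """Shuffle the indices 0..n_samples-1 and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(test_size * n_samples))
    # The first n_test shuffled indices form the test set, the rest the training set.
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = simple_split(1000)
print(len(train_idx), len(test_idx))
```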

Now we train and evaluate an ordinary least squares model.

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

reg = LinearRegression().fit(X_train, y_train)
print("R2 (train) = ", reg.score(X_train, y_train))
print("MSE (train) = ", mean_squared_error(y_train, reg.predict(X_train)))
print("MSE (test) = ", mean_squared_error(y_test, reg.predict(X_test)))

R2 (train) =  0.9999948405581579
MSE (train) =  0.23212052513349427
MSE (test) =  0.28830797830973615
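To make the two metrics concrete, here is a sketch that computes them by hand in plain Python. The functions mse and r2 are illustrative helpers, and the toy values are made up; for a regressor, the score method reports the coefficient of determination, which is the quantity r2 computes.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy values, made up purely for illustration.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(mse(y_true, y_pred), r2(y_true, y_pred))
```

An R2 close to 1, as in the output above, means that the residuals are small relative to the spread of the targets around their mean.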

Utilities

Progress bar for your for loops

In the later part of this course, you will implement iterative algorithms. Some may take a long time to execute, and you may want to monitor the progress. The tqdm library is designed for this purpose, as illustrated below.

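from time import sleep
from tqdm import tqdm

for j in range(2):
    # wrap the iterable in tqdm to get a progress bar labelled with the epoch
    progress = tqdm(range(10), desc='Epoch %d' % j)
    for i in progress:
        sleep(0.1)  # stand-in for one step of real work
        # show extra information (here a dummy loss) at the end of the bar
        progress.set_postfix(loss=i)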

Epoch 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:01<00:00,  9.59it/s, loss=9]
Epoch 1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:01<00:00,  9.54it/s, loss=9]

Plotting a gallery of images

We will often work with images in this course. Sometimes we want to display a collection of images nicely on a panel, possibly with some labels. The following is a handy utility written using the matplotlib library for this purpose.

import sklearn.datasets
import scipy.misc
import matplotlib.pyplot as plt

def plot_gallery(images, titles=None, xscale=1, yscale=1, nrow=3, ncol=6, output=None):
    '''
    Plot up to nrow * ncol images on a grid, optionally with titles,
    and save the figure to the file output if it is given.
    '''
    plt.figure(figsize=(xscale * ncol, yscale * nrow))

    # stop early if there are fewer images than grid cells
    for i in range(min(len(images), nrow * ncol)):
        plt.subplot(nrow, ncol, i + 1)
        plt.imshow(images[i])
        if titles is not None:
            # use size and y to adjust font size and position of title
            plt.title(titles[i], size=12, y=-0.2)
        plt.xticks(())
        plt.yticks(())

    plt.tight_layout()

    if output is not None:
        plt.savefig(output)
    plt.show()

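# Note: scipy.misc was removed in SciPy 1.12; on newer versions, use scipy.datasets.face() instead.
racoon = scipy.misc.face()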
plot_gallery([racoon, racoon, racoon], titles=['1', '2', '3'], xscale=3, yscale=3, nrow=1, ncol=3)