STAT3007 Deep Learning, Prac 7

2022 Semester 1

Q1. LSTM

In this question, we build an LSTM model for numerical sequence prediction.

We assume the training data consists of subsequences taken from the following simple sinusoidal series.

import numpy as np
import matplotlib.pyplot as plt

period = 10
x = np.arange(200)
plt.plot(np.sin(x / period))

We sample 500 subsequences of length 200 from the sine curve: 497 are used for training and 3 for testing. The LSTM model is trained to predict the next value in the sequence; that is, we train it as a sequence-to-sequence model that maps an input sequence to its one-step-forward version. The trained model is evaluated by using it to predict 50 future values given the first 150 values of the test sequences.

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

def gen_sin_waves(period, seq_len, num_seq):
    # sample random integer starting positions in [-2*period, 2*period)
    start = np.random.randint(-2*period, 2*period, (num_seq, 1))
    # for each starting position, create a sequence of consecutive time steps
    x = start + np.arange(seq_len)
    # compute the value sequences
    data = np.sin(x / period).astype('float64')
    return data

# set random seeds for reproducibility
np.random.seed(0)
torch.manual_seed(0) 

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# generate value sequences
seq_len = 200
num_seq = 500
data = torch.from_numpy(gen_sin_waves(period, seq_len, num_seq)).double().to(device)

# sequences 4 to 500 are used as the training sequences
data_tr = data[3:,:]
x_train = data_tr[:, :-1]
y_train = data_tr[:, 1:]

# sequences 1 to 3 are used as the test sequences
data_ts = data[:3,:]
x_test = data_ts[:, :150]
y_test = data_ts[:, 150:]
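
Before building the model, it is worth confirming that the tensor shapes match the description above. A minimal sanity check, assuming the setup code above has been run:

print(x_train.shape)  # torch.Size([497, 199]): inputs drop the last value
print(y_train.shape)  # torch.Size([497, 199]): targets drop the first value
print(x_test.shape)   # torch.Size([3, 150]): the first 150 values of each test sequence
print(y_test.shape)   # torch.Size([3, 50]): the 50 future values to predict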

(a) Consider the LSTM network below. Complete the forward method with code that predicts future values beyond the given inputs.

Answer.

class SinLSTM(nn.Module):
    def __init__(self):
        super(SinLSTM, self).__init__()
        # LSTM cell with 1D input and 50D hidden state
        self.lstm = nn.LSTMCell(1, 50) 
        # output layer maps the 50D hidden state to a 1D output
        self.linear = nn.Linear(50, 1)

    def forward(self, x, future=0): 
        outputs = []
        # initial hidden state and cell state set to 0
        h_t = torch.zeros(x.size(0), 50, dtype=torch.double).to(x.device)
        c_t = torch.zeros(x.size(0), 50, dtype=torch.double).to(x.device)

        # predict outputs for inputs
        for i, input_t in enumerate(x.chunk(x.size(1), dim=1)):
            h_t, c_t = self.lstm(input_t, (h_t, c_t))
            output = self.linear(h_t)
            outputs += [output]

        # predict `future` additional steps by feeding each prediction
        # back in as the next input
        for i in range(future):
            h_t, c_t = self.lstm(output, (h_t, c_t))
            output = self.linear(h_t)
            outputs += [output]

        outputs = torch.stack(outputs, 1).squeeze(2)
        return outputs
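
As a quick check of the completed forward method, a pass over the test inputs with future=50 should return, for each sequence, 150 input-aligned predictions followed by 50 extrapolated values. A minimal sketch using an untrained throwaway instance (net_check is a name introduced here only for this check):

net_check = SinLSTM().double().to(device)
with torch.no_grad():
    pred = net_check(x_test, future=50)
print(pred.shape)  # torch.Size([3, 200]): 150 input steps + 50 future steps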

(b) Read the training code below.

(i) What is the loss function used? Does the loss converge? If not, what changes do you need to make?

(ii) Add code to compute the trained model’s mean squared error on the test set after every 20 iterations. Plot the model’s predictions on the first test sequence together with the true values.

(iii) How does the test error change? Are the predictions satisfactory? Can you obtain a better model?

from time import time
from tqdm import tqdm

net = SinLSTM().double().to(device)
criterion = nn.MSELoss()

# initialize optimizer 
optimizer = optim.SGD(net.parameters(), lr=2, momentum=0.9)

# train
loop = tqdm(range(200))
for i in loop:
    t0 = time()
    optimizer.zero_grad()
    out = net(x_train)
    loss = criterion(out, y_train)
    loss.backward()
    optimizer.step()
    
    loop.set_postfix(train_mse='|%7.5f|' % loss.item(), time='|%7.2f|' % (time()-t0))

Answer.
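
A possible solution sketch (one way to answer, not the official solution):

(i) The loss function is the mean squared error (nn.MSELoss) between the one-step-ahead predictions and the shifted targets. With lr=2 and momentum 0.9 the SGD updates are very large, so the loss should not be expected to converge; reducing the learning rate (for example to 0.05) or switching to an adaptive optimizer such as Adam typically restores convergence.

(ii) The sketch below retrains with a smaller learning rate, records the test MSE every 20 iterations, and plots the predictions on the first test sequence together with the true values. The names test_mse and pred are introduced here for illustration.

import matplotlib.pyplot as plt

net = SinLSTM().double().to(device)
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.05, momentum=0.9)

test_mse = []
for i in tqdm(range(200)):
    optimizer.zero_grad()
    out = net(x_train)
    loss = criterion(out, y_train)
    loss.backward()
    optimizer.step()

    # every 20 iterations, predict 50 future values from the first
    # 150 values of each test sequence and record the MSE
    if (i + 1) % 20 == 0:
        with torch.no_grad():
            pred = net(x_test, future=50)
            test_mse.append(criterion(pred[:, 150:], y_test).item())

# plot the predictions on the first test sequence against the truth
with torch.no_grad():
    pred = net(x_test, future=50)
plt.plot(np.arange(200), data_ts[0].cpu().numpy(), label='true')
plt.plot(np.arange(150, 200), pred[0, 150:].cpu().numpy(), '--', label='predicted')
plt.legend()
plt.show()

(iii) If training converges, the recorded test MSE should fall over the iterations and the extrapolated values should roughly track the sine wave; the exact numbers depend on the seed and the optimizer settings. A better model can usually be obtained by training for more iterations, tuning the learning rate, or using a stronger optimizer such as LBFGS.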