PyTorch Basics III


Building a Linear Model with Autograd

import torch
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


# Create a toy data set

x_train = np.array([[4.7], [2.4], [7.5], [7.1], [4.3], [7.816],
                    [8.9], [5.2], [8.59], [2.1], [8],
                    [10], [4.5],[6], [4]],
                   dtype = np.float32)

y_train = np.array([[2.6], [1.6], [3.09], [2.4], [2.4], [3.357],
                    [2.6], [1.96], [3.53], [1.76], [3.2],
                    [3.5], [1.6], [2.5], [2.2]],
                   dtype = np.float32)
# Visualize the relationship between x_train and y_train

plt.figure(figsize=(12, 8))

plt.scatter(x_train, y_train, label='Original data', s=250, c='b')

plt.legend()

(Figure: scatter plot of the original training data)

x_train
array([[ 4.7  ],
       [ 2.4  ],
       [ 7.5  ],
       [ 7.1  ],
       [ 4.3  ],
       [ 7.816],
       [ 8.9  ],
       [ 5.2  ],
       [ 8.59 ],
       [ 2.1  ],
       [ 8.   ],
       [10.   ],
       [ 4.5  ],
       [ 6.   ],
       [ 4.   ]], dtype=float32)
y_train
array([[2.6  ],
       [1.6  ],
       [3.09 ],
       [2.4  ],
       [2.4  ],
       [3.357],
       [2.6  ],
       [1.96 ],
       [3.53 ],
       [1.76 ],
       [3.2  ],
       [3.5  ],
       [1.6  ],
       [2.5  ],
       [2.2  ]], dtype=float32)
# Convert x and y to tensors

X_train = torch.from_numpy(x_train)
Y_train = torch.from_numpy(y_train)

print('requires_grad for X_train: ', X_train.requires_grad)
print('requires_grad for Y_train: ', Y_train.requires_grad)
requires_grad for X_train:  False
requires_grad for Y_train:  False
X_train
tensor([[ 4.7000],
        [ 2.4000],
        [ 7.5000],
        [ 7.1000],
        [ 4.3000],
        [ 7.8160],
        [ 8.9000],
        [ 5.2000],
        [ 8.5900],
        [ 2.1000],
        [ 8.0000],
        [10.0000],
        [ 4.5000],
        [ 6.0000],
        [ 4.0000]])
Y_train
tensor([[2.6000],
        [1.6000],
        [3.0900],
        [2.4000],
        [2.4000],
        [3.3570],
        [2.6000],
        [1.9600],
        [3.5300],
        [1.7600],
        [3.2000],
        [3.5000],
        [1.6000],
        [2.5000],
        [2.2000]])

By default, requires_grad is False on tensors created with torch.from_numpy.
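
If we did need gradients with respect to the inputs, tracking could be switched on in place. A minimal sketch (the X_example name is just for illustration; in this post we only optimize the weights):

X_example = torch.from_numpy(x_train)
X_example.requires_grad_(True)   # enable gradient tracking in place
print(X_example.requires_grad)   # True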

Declare the sizes that define the network's parameters.

input_size = 1
hidden_size = 1
output_size = 1

# Construct the neural network manually
# w1 contains the weights from the input to the hidden layer

w1 = torch.rand(input_size,
                hidden_size,
                requires_grad=True)

w1.shape

torch.Size([1, 1])

w1.type()

'torch.FloatTensor'

# w2 contains the weights from the hidden layer to the final output

w2 = torch.rand(hidden_size,
                output_size,
                requires_grad=True)

w2.shape

torch.Size([1, 1])

w2.type()

'torch.FloatTensor'
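
torch.rand returns float32 tensors by default, which is why type() reports torch.FloatTensor. If a different precision were needed, the dtype could be requested explicitly; a minimal sketch (w2_double is just an illustrative name):

w2_double = torch.rand(hidden_size,
                       output_size,
                       dtype=torch.float64,   # request double precision explicitly
                       requires_grad=True)
print(w2_double.type())   # 'torch.DoubleTensor'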

# define learning_rate hyperparameter

learning_rate = 1e-6

The learning_rate determines the size of the step the model parameters take towards the optimum. After the backward pass has computed the gradients, we multiply each gradient by the learning_rate to decide how much to adjust the corresponding weight.
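
In other words, each weight is updated as w = w - learning_rate * grad. A minimal numeric sketch with made-up values:

# illustrative values only
w = 0.9                         # current weight
grad = 2000.0                   # gradient of the loss w.r.t. w
w = w - learning_rate * grad    # 0.9 - 1e-6 * 2000.0 = 0.898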

Set up training for the NN.

for iter in range(1, 10):

  # forward pass: compute predictions from the current weights
  y_pred = X_train.mm(w1).mm(w2)
  # sum-of-squares loss
  loss = (y_pred - Y_train).pow(2).sum()

  # report the loss every 50 iterations (only relevant for the longer run below)
  if iter % 50 == 0:
    print(iter, loss.item())

  # backward pass: compute gradients of the loss w.r.t. w1 and w2
  loss.backward()

  # update the weights without recording these operations, then reset the gradients
  with torch.no_grad():
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad
    w1.grad.zero_()
    w2.grad.zero_()

print('w1: ', w1) 
print('w2: ', w2)
w1:  tensor([[0.8791]], requires_grad=True)
w2:  tensor([[0.5836]], requires_grad=True)
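
This manual update is exactly what a built-in optimizer automates. As a point of comparison (not used in this post), a minimal sketch of the same loop written with torch.optim.SGD:

optimizer = torch.optim.SGD([w1, w2], lr=learning_rate)

for iter in range(1, 10):
  y_pred = X_train.mm(w1).mm(w2)
  loss = (y_pred - Y_train).pow(2).sum()
  loss.backward()
  optimizer.step()        # applies w -= lr * w.grad to each parameter
  optimizer.zero_grad()   # resets the gradients for the next iteration
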
# Perform predictions
x_train_tensor = torch.from_numpy(x_train)
x_train_tensor
tensor([[ 4.7000],
        [ 2.4000],
        [ 7.5000],
        [ 7.1000],
        [ 4.3000],
        [ 7.8160],
        [ 8.9000],
        [ 5.2000],
        [ 8.5900],
        [ 2.1000],
        [ 8.0000],
        [10.0000],
        [ 4.5000],
        [ 6.0000],
        [ 4.0000]])
predicted_in_tensor = x_train_tensor.mm(w1).mm(w2)
predicted_in_tensor
tensor([[2.4114],
        [1.2313],
        [3.8479],
        [3.6427],
        [2.2061],
        [4.0100],
        [4.5662],
        [2.6679],
        [4.4072],
        [1.0774],
        [4.1045],
        [5.1306],
        [2.3088],
        [3.0783],
        [2.0522]], grad_fn=<MmBackward>)
# convert predictions to numpy for visualization
predicted = predicted_in_tensor.detach().numpy()
predicted
array([[2.4113653],
       [1.2313355],
       [3.8479235],
       [3.642701 ],
       [2.206143 ],
       [4.0100493],
       [4.566202 ],
       [2.6678934],
       [4.407155 ],
       [1.0774186],
       [4.1044517],
       [5.1305647],
       [2.308754 ],
       [3.0783389],
       [2.0522258]], dtype=float32)
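
detach() is needed here because predicted_in_tensor is still attached to the autograd graph (note the grad_fn above), and PyTorch refuses to convert such a tensor to NumPy directly. A minimal sketch of the behaviour on a toy tensor (t is just an illustrative name):

t = torch.ones(2, requires_grad=True)
# t.numpy()                 # raises a RuntimeError: can't call numpy() on a tensor that requires grad
print(t.detach().numpy())   # works: detach() returns a view that is no longer tracked
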
# visualize predictions

plt.figure(figsize=(12, 8))

plt.scatter(x_train, y_train, label = 'Original data', s=250, c='r')

plt.plot(x_train, predicted, label = 'Fitted line')

plt.legend()

(Figure: original data with the fitted line after a few training iterations)

The fitted line is not very accurate, most likely because we trained for only a handful of iterations. Let's increase the number of iterations to 10,000 and retrain.

for iter in range(1, 10000):

  y_pred = X_train.mm(w1).mm(w2)
  loss = (y_pred - Y_train).pow(2).sum()

  if iter % 50 == 0:
    print(iter, loss.item())

  loss.backward()

  with torch.no_grad():
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad
    w1.grad.zero_()
    w2.grad.zero_()
print('w1: ', w1) 
print('w2: ', w2)
w1:  tensor([[0.8172]], requires_grad=True)
w2:  tensor([[0.4854]], requires_grad=True)
# Perform predictions
x_train_tensor = torch.from_numpy(x_train)
x_train_tensor
tensor([[ 4.7000],
        [ 2.4000],
        [ 7.5000],
        [ 7.1000],
        [ 4.3000],
        [ 7.8160],
        [ 8.9000],
        [ 5.2000],
        [ 8.5900],
        [ 2.1000],
        [ 8.0000],
        [10.0000],
        [ 4.5000],
        [ 6.0000],
        [ 4.0000]])
predicted_in_tensor = x_train_tensor.mm(w1).mm(w2)
predicted_in_tensor
tensor([[1.8643],
        [0.9520],
        [2.9749],
        [2.8163],
        [1.7056],
        [3.1003],
        [3.5302],
        [2.0626],
        [3.4073],
        [0.8330],
        [3.1733],
        [3.9666],
        [1.7850],
        [2.3799],
        [1.5866]], grad_fn=<MmBackward>)
# convert predictions to numpy for visualization
predicted = predicted_in_tensor.detach().numpy()
predicted
array([[1.8642881],
       [0.951977 ],
       [2.974928 ],
       [2.816265 ],
       [1.7056254],
       [3.1002717],
       [3.530248 ],
       [2.0626168],
       [3.4072843],
       [0.8329798],
       [3.1732564],
       [3.9665709],
       [1.7849568],
       [2.3799422],
       [1.5866282]], dtype=float32)
# visualize predictions

plt.figure(figsize=(12, 8))

plt.scatter(x_train, y_train, label = 'Original data', s=250, c='r')

plt.plot(x_train, predicted, label = 'Fitted line')

plt.legend()

(Figure: original data with the fitted line after 10,000 training iterations)

Computational Graphs

PyTorch computation graphs are dynamic: the graph is defined as the code executes.
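
A minimal sketch of what "defined as it is executed" means: ordinary Python control flow can depend on tensor values, and autograd records only the operations that actually ran (x and y are illustrative names):

x = torch.tensor([3.0], requires_grad=True)
if x.item() > 0:      # ordinary Python branching on a runtime value
  y = x * 2
else:
  y = x ** 2
y.backward()
print(x.grad)         # tensor([2.]) -- only the branch that executed is in the graph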

Approaches in Computational Graphs

  • Static – TensorFlow, symbolic programming of NNs
  • Dynamic – PyTorch, imperative programming of NNs

Contrasting Symbolic and Imperative Programming

Symbolic

  • First define the operations, then execute them
  • Functions are defined abstractly; no actual computation takes place
  • The computation is explicitly compiled before evaluation
  • e.g. Java, C++

In a NN approach:

  • First define the computation, then run the code
  • The computation is first defined using placeholders, not real data
  • The computation is explicitly compiled before evaluation
  • Results in a static computation graph

TensorFlow: Define, then Run

  • Explicit compile step
  • Compilation converts the graph into executable format
  • Harder to program and debug
  • Less flexible – harder to experiment
  • More restricted, computation graph only shows final results
  • More efficient – easier to optimize

Imperative

  • Execution is performed as operations are defined
  • Code is actually executed as the function is being defined
  • No explicit compilation step before evaluation
  • e.g. Python

In a NN approach:

  • Computations are run as they are defined
  • Computations are directly performed on real tensors
  • No explicit compilation step before evaluation
  • Results in a dynamic computation graph

PyTorch: Define by Run

  • No explicit compile step
  • Graph already in executable format
  • Easier writing and debugging
  • More flexible, easier to experiment
  • Less restricted, intermediate results visible to users
  • Less efficient – harder to optimize
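
To illustrate the last two points, intermediate results in a PyTorch graph are ordinary tensors that can be printed or inspected while the graph is being built; a minimal sketch (x, y and z are illustrative names):

x = torch.tensor([2.0], requires_grad=True)
y = x * 3             # intermediate result, inspectable immediately
z = y ** 2
print(y)              # an ordinary tensor carrying its grad_fn
z.backward()
print(x.grad)         # tensor([36.]) since z = 9 * x**2 and dz/dx = 18 * x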
