PyTorch Basics III


Building a Linear Model with Autograd

import torch
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


# Create a toy data set

x_train = np.array([[4.7], [2.4], [7.5], [7.1], [4.3], [7.816],
                    [8.9], [5.2], [8.59], [2.1], [8],
                    [10], [4.5],[6], [4]],
                   dtype = np.float32)

y_train = np.array([[2.6], [1.6], [3.09], [2.4], [2.4], [3.357],
                    [2.6], [1.96], [3.53], [1.76], [3.2],
                    [3.5], [1.6], [2.5], [2.2]],
                   dtype = np.float32)
# Visualize the relationship between x_train and y_train

plt.figure(figsize=(12, 8))

plt.scatter(x_train, y_train, label='Original data', s=250, c='b')

plt.legend()

(Figure: scatter plot of the original training data)

x_train
array([[ 4.7  ],
       [ 2.4  ],
       [ 7.5  ],
       [ 7.1  ],
       [ 4.3  ],
       [ 7.816],
       [ 8.9  ],
       [ 5.2  ],
       [ 8.59 ],
       [ 2.1  ],
       [ 8.   ],
       [10.   ],
       [ 4.5  ],
       [ 6.   ],
       [ 4.   ]], dtype=float32)
y_train
array([[2.6  ],
       [1.6  ],
       [3.09 ],
       [2.4  ],
       [2.4  ],
       [3.357],
       [2.6  ],
       [1.96 ],
       [3.53 ],
       [1.76 ],
       [3.2  ],
       [3.5  ],
       [1.6  ],
       [2.5  ],
       [2.2  ]], dtype=float32)
# Convert x and y to tensors

X_train = torch.from_numpy(x_train)
Y_train = torch.from_numpy(y_train)

print('requires_grad for X_train: ', X_train.requires_grad)
print('requires_grad for Y_train: ', Y_train.requires_grad)
requires_grad for X_train:  False
requires_grad for Y_train:  False
X_train
tensor([[ 4.7000],
        [ 2.4000],
        [ 7.5000],
        [ 7.1000],
        [ 4.3000],
        [ 7.8160],
        [ 8.9000],
        [ 5.2000],
        [ 8.5900],
        [ 2.1000],
        [ 8.0000],
        [10.0000],
        [ 4.5000],
        [ 6.0000],
        [ 4.0000]])
Y_train
tensor([[2.6000],
        [1.6000],
        [3.0900],
        [2.4000],
        [2.4000],
        [3.3570],
        [2.6000],
        [1.9600],
        [3.5300],
        [1.7600],
        [3.2000],
        [3.5000],
        [1.6000],
        [2.5000],
        [2.2000]])

By default, requires_grad is False on tensors created with torch.from_numpy.
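
If we did need gradients with respect to the inputs, tracking could be switched on in place. A minimal sketch (the X_example name is just for illustration; in this post we only optimize the weights):

X_example = torch.from_numpy(x_train)
X_example.requires_grad_(True)   # enable gradient tracking in place
print(X_example.requires_grad)   # True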

Declare the sizes that define the network's parameters.

input_size = 1
hidden_size = 1
output_size = 1

# Construct the neural network manually
# w1 contains the weights from the input to the hidden layer

w1 = torch.rand(input_size,
                hidden_size,
                requires_grad=True)

w1.shape

torch.Size([1, 1])

w1.type()

'torch.FloatTensor'

# w2 contains the weights from the hidden layer to the final output

w2 = torch.rand(hidden_size,
                output_size,
                requires_grad=True)

w2.shape

torch.Size([1, 1])

w2.type()

'torch.FloatTensor'
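
torch.rand returns float32 tensors by default, which is why type() reports torch.FloatTensor. If a different precision were needed, the dtype could be requested explicitly; a minimal sketch (w2_double is just an illustrative name):

w2_double = torch.rand(hidden_size,
                       output_size,
                       dtype=torch.float64,   # request double precision explicitly
                       requires_grad=True)
print(w2_double.type())   # 'torch.DoubleTensor'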

# define learning_rate hyperparameter

learning_rate = 1e-6

The learning_rate determines the size of the step the model parameters take towards the optimum. After the backward pass has computed the gradients, we multiply each gradient by the learning_rate to decide how much to adjust the corresponding weight.
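
In other words, each weight is updated as w = w - learning_rate * grad. A minimal numeric sketch with made-up values:

# illustrative values only
w = 0.9                         # current weight
grad = 2000.0                   # gradient of the loss w.r.t. w
w = w - learning_rate * grad    # 0.9 - 1e-6 * 2000.0 = 0.898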

Set up training for the NN.

for iter in range(1, 10):

  # forward pass: compute predictions from the current weights
  y_pred = X_train.mm(w1).mm(w2)
  # sum-of-squares loss
  loss = (y_pred - Y_train).pow(2).sum()

  # report the loss every 50 iterations (only relevant for the longer run below)
  if iter % 50 == 0:
    print(iter, loss.item())

  # backward pass: compute gradients of the loss w.r.t. w1 and w2
  loss.backward()

  # update the weights without recording these operations, then reset the gradients
  with torch.no_grad():
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad
    w1.grad.zero_()
    w2.grad.zero_()

print('w1: ', w1) 
print('w2: ', w2)
w1:  tensor([[0.8791]], requires_grad=True)
w2:  tensor([[0.5836]], requires_grad=True)
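
This manual update is exactly what a built-in optimizer automates. As a point of comparison (not used in this post), a minimal sketch of the same loop written with torch.optim.SGD:

optimizer = torch.optim.SGD([w1, w2], lr=learning_rate)

for iter in range(1, 10):
  y_pred = X_train.mm(w1).mm(w2)
  loss = (y_pred - Y_train).pow(2).sum()
  loss.backward()
  optimizer.step()        # applies w -= lr * w.grad to each parameter
  optimizer.zero_grad()   # resets the gradients for the next iteration
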
# Perform predictions
x_train_tensor = torch.from_numpy(x_train)
x_train_tensor
tensor([[ 4.7000],
        [ 2.4000],
        [ 7.5000],
        [ 7.1000],
        [ 4.3000],
        [ 7.8160],
        [ 8.9000],
        [ 5.2000],
        [ 8.5900],
        [ 2.1000],
        [ 8.0000],
        [10.0000],
        [ 4.5000],
        [ 6.0000],
        [ 4.0000]])
predicted_in_tensor = x_train_tensor.mm(w1).mm(w2)
predicted_in_tensor
tensor([[2.4114],
        [1.2313],
        [3.8479],
        [3.6427],
        [2.2061],
        [4.0100],
        [4.5662],
        [2.6679],
        [4.4072],
        [1.0774],
        [4.1045],
        [5.1306],
        [2.3088],
        [3.0783],
        [2.0522]], grad_fn=<MmBackward>)
# convert predictions to numpy for visualization
predicted = predicted_in_tensor.detach().numpy()
predicted
array([[2.4113653],
       [1.2313355],
       [3.8479235],
       [3.642701 ],
       [2.206143 ],
       [4.0100493],
       [4.566202 ],
       [2.6678934],
       [4.407155 ],
       [1.0774186],
       [4.1044517],
       [5.1305647],
       [2.308754 ],
       [3.0783389],
       [2.0522258]], dtype=float32)
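
detach() is needed here because predicted_in_tensor is still attached to the autograd graph (note the grad_fn above), and PyTorch refuses to convert such a tensor to NumPy directly. A minimal sketch of the behaviour on a toy tensor (t is just an illustrative name):

t = torch.ones(2, requires_grad=True)
# t.numpy()                 # raises a RuntimeError: can't call numpy() on a tensor that requires grad
print(t.detach().numpy())   # works: detach() returns a view that is no longer tracked
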
# visualize predictions

plt.figure(figsize=(12, 8))

plt.scatter(x_train, y_train, label = 'Original data', s=250, c='r')

plt.plot(x_train, predicted, label = 'Fitted line')

plt.legend()

(Figure: original data with the fitted line after a few training iterations)

The fitted line is not very accurate, most likely because we trained for only a handful of iterations. Let's increase the number of iterations to 10,000 and retrain.

for iter in range(1, 10000):

  y_pred = X_train.mm(w1).mm(w2)
  loss = (y_pred - Y_train).pow(2).sum()

  if iter % 50 == 0:
    print(iter, loss.item())

  loss.backward()

  with torch.no_grad():
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad
    w1.grad.zero_()
    w2.grad.zero_()
print('w1: ', w1) 
print('w2: ', w2)
w1:  tensor([[0.8172]], requires_grad=True)
w2:  tensor([[0.4854]], requires_grad=True)
# Perform predictions
x_train_tensor = torch.from_numpy(x_train)
x_train_tensor
tensor([[ 4.7000],
        [ 2.4000],
        [ 7.5000],
        [ 7.1000],
        [ 4.3000],
        [ 7.8160],
        [ 8.9000],
        [ 5.2000],
        [ 8.5900],
        [ 2.1000],
        [ 8.0000],
        [10.0000],
        [ 4.5000],
        [ 6.0000],
        [ 4.0000]])
predicted_in_tensor = x_train_tensor.mm(w1).mm(w2)
predicted_in_tensor
tensor([[1.8643],
        [0.9520],
        [2.9749],
        [2.8163],
        [1.7056],
        [3.1003],
        [3.5302],
        [2.0626],
        [3.4073],
        [0.8330],
        [3.1733],
        [3.9666],
        [1.7850],
        [2.3799],
        [1.5866]], grad_fn=<MmBackward>)
# convert predictions to numpy for visualization
predicted = predicted_in_tensor.detach().numpy()
predicted
array([[1.8642881],
       [0.951977 ],
       [2.974928 ],
       [2.816265 ],
       [1.7056254],
       [3.1002717],
       [3.530248 ],
       [2.0626168],
       [3.4072843],
       [0.8329798],
       [3.1732564],
       [3.9665709],
       [1.7849568],
       [2.3799422],
       [1.5866282]], dtype=float32)
# visualize predictions

plt.figure(figsize=(12, 8))

plt.scatter(x_train, y_train, label = 'Original data', s=250, c='r')

plt.plot(x_train, predicted, label = 'Fitted line')

plt.legend()

(Figure: original data with the fitted line after 10,000 training iterations)

Computational Graphs

PyTorch computation graphs are dynamic: the graph is defined as the code executes.
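
A minimal sketch of what "defined as it is executed" means: ordinary Python control flow can depend on tensor values, and autograd records only the operations that actually ran (x and y are illustrative names):

x = torch.tensor([3.0], requires_grad=True)
if x.item() > 0:      # ordinary Python branching on a runtime value
  y = x * 2
else:
  y = x ** 2
y.backward()
print(x.grad)         # tensor([2.]) -- only the branch that executed is in the graph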

Approaches in Computational Graphs

  • Static – TensorFlow, symbolic programming of NNs
  • Dynamic – PyTorch, imperative programming of NNs

Contrasting Symbolic and Imperative Programming

Symbolic

  • First define the operations, then execute them
  • Functions are defined abstractly; no actual computation takes place
  • The computation is explicitly compiled before evaluation
  • e.g. Java, C++

In a NN approach:

  • First define the computation, then run the code
  • The computation is first defined using placeholders, not real data
  • The computation is explicitly compiled before evaluation
  • Results in a static computation graph

TensorFlow: Define, then Run

  • Explicit compile step
  • Compilation converts the graph into executable format
  • Harder to program and debug
  • Less flexible – harder to experiment
  • More restricted, computation graph only shows final results
  • More efficient – easier to optimize

Imperative

  • Execution is performed as operations are defined
  • Code is actually executed as the function is being defined
  • No explicit compilation step before evaluation
  • e.g. Python

In a NN approach:

  • Computations are run as they are defined
  • Computations are directly performed on real tensors
  • No explicit compilation step before evaluation
  • Results in a dynamic computation graph

PyTorch: Define by Run

  • No explicit compile step
  • Graph already in executable format
  • Easier writing and debugging
  • More flexible, easier to experiment
  • Less restricted, intermediate results visible to users
  • Less efficient – harder to optimize
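
To illustrate the last two points, intermediate results in a PyTorch graph are ordinary tensors that can be printed or inspected while the graph is being built; a minimal sketch (x, y and z are illustrative names):

x = torch.tensor([2.0], requires_grad=True)
y = x * 3             # intermediate result, inspectable immediately
z = y ** 2
print(y)              # an ordinary tensor carrying its grad_fn
z.backward()
print(x.grad)         # tensor([36.]) since z = 9 * x**2 and dz/dx = 18 * x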
