Pytorch Basics III
Building a Linear Model with Autograd
import torch
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Create a toy data set
x_train = np.array([[4.7], [2.4], [7.5], [7.1], [4.3], [7.816],
                    [8.9], [5.2], [8.59], [2.1], [8],
                    [10], [4.5], [6], [4]],
                   dtype=np.float32)
y_train = np.array([[2.6], [1.6], [3.09], [2.4], [2.4], [3.357],
                    [2.6], [1.96], [3.53], [1.76], [3.2],
                    [3.5], [1.6], [2.5], [2.2]],
                   dtype=np.float32)
# Visualize the relationship between x_train and y_train
plt.figure(figsize=(12, 8))
plt.scatter(x_train, y_train, label='Original data', s=250, c='b')
plt.legend()
x_train
array([[ 4.7 ],
[ 2.4 ],
[ 7.5 ],
[ 7.1 ],
[ 4.3 ],
[ 7.816],
[ 8.9 ],
[ 5.2 ],
[ 8.59 ],
[ 2.1 ],
[ 8. ],
[10. ],
[ 4.5 ],
[ 6. ],
[ 4. ]], dtype=float32)
y_train
array([[2.6 ],
[1.6 ],
[3.09 ],
[2.4 ],
[2.4 ],
[3.357],
[2.6 ],
[1.96 ],
[3.53 ],
[1.76 ],
[3.2 ],
[3.5 ],
[1.6 ],
[2.5 ],
[2.2 ]], dtype=float32)
# Convert x and y to tensors
X_train = torch.from_numpy(x_train)
Y_train = torch.from_numpy(y_train)
print('requires_grad for X_train: ', X_train.requires_grad)
print('requires_grad for Y_train: ', Y_train.requires_grad)
requires_grad for X_train: False
requires_grad for Y_train: False
X_train
tensor([[ 4.7000],
[ 2.4000],
[ 7.5000],
[ 7.1000],
[ 4.3000],
[ 7.8160],
[ 8.9000],
[ 5.2000],
[ 8.5900],
[ 2.1000],
[ 8.0000],
[10.0000],
[ 4.5000],
[ 6.0000],
[ 4.0000]])
Y_train
tensor([[2.6000],
[1.6000],
[3.0900],
[2.4000],
[2.4000],
[3.3570],
[2.6000],
[1.9600],
[3.5300],
[1.7600],
[3.2000],
[3.5000],
[1.6000],
[2.5000],
[2.2000]])
By default, requires_grad is not enabled on newly created tensors.
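If we wanted autograd to track operations on an existing tensor, it can be switched on in place. A minimal sketch, using a throwaway copy of x_train so the training tensors above are left untouched:
demo = torch.from_numpy(x_train)   # throwaway tensor, just for illustration
demo.requires_grad_(True)          # in-place toggle; works on float tensors
print(demo.requires_grad)          # True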
Declare the variables that define the shapes of the neural network's parameters.
input_size = 1
hidden_size = 1
output_size = 1
# Construct the neural network manually
# w1 contains the weights for the input to the nn
w1 = torch.rand(input_size, hidden_size, requires_grad=True)
w1.shape
torch.Size([1, 1])
w1.type()
'torch.FloatTensor'
# w2 contains the weights from the hidden layer to the final output
w2 = torch.rand(hidden_size, output_size, requires_grad=True)
w2.shape
torch.Size([1, 1])
w2.type()
'torch.FloatTensor'
# define learning_rate hyperparameter
learning_rate = 1e-6
The learning_rate determines the size of the step the model parameters take towards the optimum. When tweaking the model parameters using gradients in the backward pass, we multiply each gradient by the learning_rate to determine how much to adjust the corresponding parameter.
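As a minimal sketch of that update rule in isolation (the parameter w and the toy loss below are made up purely for illustration; the real training loop follows next), using the learning_rate defined above:
w = torch.tensor([[0.5]], requires_grad=True)   # toy parameter, for illustration only
loss = (w * 3.0 - 2.0).pow(2).sum()             # toy loss
loss.backward()                                  # populates w.grad
with torch.no_grad():                            # keep the update out of the graph
    w -= learning_rate * w.grad                  # step of size learning_rate * gradient
    w.grad.zero_()                               # clear the gradient for the next step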
Set up training for the NN.
for iter in range(1, 10):
    # Forward pass: predictions from the current weights
    y_pred = X_train.mm(w1).mm(w2)
    # Sum-of-squared-errors loss
    loss = (y_pred - Y_train).pow(2).sum()
    if iter % 50 == 0:
        print(iter, loss.item())
    # Backward pass: compute gradients of the loss w.r.t. w1 and w2
    loss.backward()
    # Update the weights without recording the update in the graph
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # Reset gradients so they do not accumulate across iterations
        w1.grad.zero_()
        w2.grad.zero_()
print('w1: ', w1)
print('w2: ', w2)
w1: tensor([[0.8791]], requires_grad=True)
w2: tensor([[0.5836]], requires_grad=True)
# Perform predictions
x_train_tensor = torch.from_numpy(x_train)
x_train_tensor
tensor([[ 4.7000],
[ 2.4000],
[ 7.5000],
[ 7.1000],
[ 4.3000],
[ 7.8160],
[ 8.9000],
[ 5.2000],
[ 8.5900],
[ 2.1000],
[ 8.0000],
[10.0000],
[ 4.5000],
[ 6.0000],
[ 4.0000]])
predicted_in_tensor = x_train_tensor.mm(w1).mm(w2)
predicted_in_tensor
tensor([[2.4114],
[1.2313],
[3.8479],
[3.6427],
[2.2061],
[4.0100],
[4.5662],
[2.6679],
[4.4072],
[1.0774],
[4.1045],
[5.1306],
[2.3088],
[3.0783],
[2.0522]], grad_fn=<MmBackward>)
# detach from the computation graph and convert predictions to numpy for visualization
predicted = predicted_in_tensor.detach().numpy()
predicted
array([[2.4113653],
[1.2313355],
[3.8479235],
[3.642701 ],
[2.206143 ],
[4.0100493],
[4.566202 ],
[2.6678934],
[4.407155 ],
[1.0774186],
[4.1044517],
[5.1305647],
[2.308754 ],
[3.0783389],
[2.0522258]], dtype=float32)
# visualize predictions
plt.figure(figsize=(12, 8))
plt.scatter(x_train, y_train, label = 'Original data', s=250, c='r')
plt.plot(x_train, predicted, label = 'Fitted line')
plt.legend()
The regression line is not very accurate, likely because we trained the model for too few epochs. We can increase the number of iterations to 10,000 and retrain.
for iter in range(1, 10000):
    y_pred = X_train.mm(w1).mm(w2)
    loss = (y_pred - Y_train).pow(2).sum()
    if iter % 50 == 0:
        print(iter, loss.item())
    loss.backward()
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
print('w1: ', w1)
print('w2: ', w2)
w1: tensor([[0.8172]], requires_grad=True)
w2: tensor([[0.4854]], requires_grad=True)
# Perform predictions
x_train_tensor = torch.from_numpy(x_train)
x_train_tensor
tensor([[ 4.7000],
[ 2.4000],
[ 7.5000],
[ 7.1000],
[ 4.3000],
[ 7.8160],
[ 8.9000],
[ 5.2000],
[ 8.5900],
[ 2.1000],
[ 8.0000],
[10.0000],
[ 4.5000],
[ 6.0000],
[ 4.0000]])
predicted_in_tensor = x_train_tensor.mm(w1).mm(w2)
predicted_in_tensor
tensor([[1.8643],
[0.9520],
[2.9749],
[2.8163],
[1.7056],
[3.1003],
[3.5302],
[2.0626],
[3.4073],
[0.8330],
[3.1733],
[3.9666],
[1.7850],
[2.3799],
[1.5866]], grad_fn=<MmBackward>)
# detach from the computation graph and convert predictions to numpy for visualization
predicted = predicted_in_tensor.detach().numpy()
predicted
array([[1.8642881],
[0.951977 ],
[2.974928 ],
[2.816265 ],
[1.7056254],
[3.1002717],
[3.530248 ],
[2.0626168],
[3.4072843],
[0.8329798],
[3.1732564],
[3.9665709],
[1.7849568],
[2.3799422],
[1.5866282]], dtype=float32)
# visualize predictions
plt.figure(figsize=(12, 8))
plt.scatter(x_train, y_train, label = 'Original data', s=250, c='r')
plt.plot(x_train, predicted, label = 'Fitted line')
plt.legend()
Computational Graphs
Pytorch computation graphs are dynamic. The graph is defined as it is executed.
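A small sketch of what "defined as it is executed" means (the values here are made up for illustration): ordinary Python control flow decides at run time which operations enter the graph.
x = torch.tensor(2.0, requires_grad=True)
y = x
for _ in range(3):          # the loop itself shapes the graph on this run
    if y < 100:             # data-dependent branch, evaluated eagerly
        y = y * y           # each executed op is recorded immediately
y.backward()                # gradients flow through whatever graph was built
print(x.grad)               # d(x**8)/dx = 8 * x**7 = 1024.0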
Approaches in Computational Graphs
- Static – Tensorflow, symbolic programming of NNs
- Dynamic – Pytorch, imperative programming of NNs
Contrasting Symbolic and Imperative Programming
Symbolic
- First define operations, then execute
- Define functions abstractly; no actual computation takes place
- Computation explicitly compiled before evaluation
- e.g. Java, C++
In a NN approach:
- First define the computation, then run the code
- Computation first defined using placeholders, not real data
- Computation explicitly compiled before evaluation
- Results in a static computation graph
Tensorflow: Define then Run (see the sketch after this list)
- Explicit compile step
- Compilation converts the graph into executable format
- Harder to program and debug
- Less flexible – harder to experiment
- More restricted, computation graph only shows final results
- More efficient – easier to optimize
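To make the define-then-run workflow concrete, here is a rough sketch in the TensorFlow 1.x-style API (assumed here purely for illustration; it is not part of these Pytorch notes): the graph is described symbolically first and only executed inside a session.
import tensorflow as tf                            # TensorFlow 1.x-style API assumed

x = tf.placeholder(tf.float32, shape=(None, 1))    # symbolic input, no data yet
w = tf.Variable(tf.random_normal([1, 1]))          # symbolic parameter
y = tf.matmul(x, w)                                # defines the op, computes nothing

with tf.Session() as sess:                         # explicit compile/run step
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[4.7]]}))     # real data supplied only here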
Imperative
- Execution performed as operations are being defined
- Code actually executed as the function is defined
- No explicit compilation step before evaluation
- e.g. Python
In a NN approach:
- Computations are run as they are defined
- Computations are directly performed on real tensors
- No explicit compilation step before evaluation
- Results in a dynamic computation graph
Pytorch: Define by Run
- No explicit compile step
- Graph already in executable format
- Easier writing and debugging
- More flexible, easier to experiment
- Less restricted, intermediate results visible to users (see the sketch after this list)
- Less efficient – harder to optimize
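A brief sketch of the "intermediate results visible to users" point (the tensors are made up for illustration): every intermediate value in the forward pass is an ordinary tensor that can be printed or inspected immediately.
a = torch.rand(2, 2, requires_grad=True)
b = a * 3                  # intermediate result, available right away
print(b)                   # inspect the value directly
print(b.grad_fn)           # the operation that produced it, recorded on the fly
c = b.sum()
c.backward()
print(a.grad)              # gradients w.r.t. the leaf tensor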