Neural Networks

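When building a neural network, we usually organize the computation into layers, some of which hold learnable parameters that are optimized during training. In PyTorch, the nn package provides a high-level abstraction over the raw computational graph. It defines a set of Modules, which are roughly equivalent to neural network layers: a Module receives input Tensors and computes output Tensors, and it can also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of commonly used loss functions.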
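To make the idea of a Module holding learnable parameters concrete, here is a minimal sketch. The layer sizes (3 inputs, 2 outputs), the batch size of 5, and the all-zero target are arbitrary values chosen for illustration only.

```python
import torch

# A single Module: a linear layer with a learnable weight and bias.
layer = torch.nn.Linear(3, 2)

x = torch.randn(5, 3)   # batch of 5 inputs with 3 features each
out = layer(x)          # the Module maps input Tensors to output Tensors

# The learnable parameters are stored inside the Module itself.
for name, param in layer.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

# Loss functions in nn are also Modules; this one returns a scalar Tensor.
loss_fn = torch.nn.MSELoss()
loss = loss_fn(out, torch.zeros(5, 2))
print(loss.item())
```

The weight has shape (2, 3) and the bias has shape (2,), and both have requires_grad set to True, which is what allows autograd to compute gradients for them.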

PyTorch: nn

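Continuing from the previous post on forward and backward propagation, the following uses the nn package to implement a two-layer neural network: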

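import torch

# N: batch size, D_in: input dimension, H: hidden dimension, D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Random input and target data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# nn.Sequential is a Module that contains other Modules and calls them
# in order to produce its output.
# Each Linear Module computes its output with a linear function and
# holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# Use Mean Squared Error (MSE) as the loss function.
# reduction='sum' replaces the deprecated size_average=False.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predictions and the loss
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Clear old gradients, then backpropagate
    model.zero_grad()
    loss.backward()

    # Update each parameter by gradient descent, outside of autograd tracking
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad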
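The training loop above calls model.zero_grad() before every backward pass because PyTorch accumulates gradients into .grad rather than overwriting them. The following is a small sketch of that behavior; the layer size, the random input, and the sum used as a stand-in loss are illustrative assumptions, not part of the original example.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)

# First backward pass: .grad holds the gradient of the (stand-in) loss.
layer(x).sum().backward()
first = layer.weight.grad.clone()

# Second backward pass without zeroing: the new gradient is added on top.
layer(x).sum().backward()
accumulated = layer.weight.grad.clone()

# After zero_grad(), a backward pass yields the plain gradient again.
layer.zero_grad()
layer(x).sum().backward()
fresh = layer.weight.grad.clone()

print(torch.allclose(accumulated, 2 * first))
print(torch.allclose(fresh, first))
```

Without the zero_grad() call, each update in the loop would use the sum of all gradients computed so far instead of the gradient of the current loss.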

PyTorch: optim

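So far we have updated the model's weights by manually modifying the Tensors that hold the learnable parameters. That is manageable for simple optimization algorithms such as stochastic gradient descent, but in practice neural networks are often trained with more sophisticated optimizers such as AdaGrad, RMSProp, and Adam. The optim package provides implementations of these optimization algorithms.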

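import torch

# N: batch size, D_in: input dimension, H: hidden dimension, D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Random input and target data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
# reduction='sum' replaces the deprecated size_average=False
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
# The optimizer is told which Tensors to update: the model's parameters
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predictions and the loss
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # optimizer.zero_grad() takes the place of model.zero_grad():
    # it clears the gradients of the parameters the optimizer manages
    optimizer.zero_grad()
    loss.backward()

    # Let the optimizer update the parameters
    optimizer.step()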
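As a sanity check on what optimizer.step() does, the sketch below compares one manual gradient-descent update with one step of torch.optim.SGD (plain SGD, no momentum) on two identical copies of a small model. The model size, the data shapes, and the learning rate are arbitrary choices for illustration.

```python
import copy
import torch

torch.manual_seed(0)
x = torch.randn(16, 5)
y = torch.randn(16, 2)
lr = 1e-2

model_manual = torch.nn.Linear(5, 2)
model_optim = copy.deepcopy(model_manual)  # identical starting weights
loss_fn = torch.nn.MSELoss(reduction='sum')

# One manual gradient-descent step on the first copy.
model_manual.zero_grad()
loss_fn(model_manual(x), y).backward()
with torch.no_grad():
    for param in model_manual.parameters():
        param -= lr * param.grad

# The same step through the optim package on the second copy.
optimizer = torch.optim.SGD(model_optim.parameters(), lr=lr)
optimizer.zero_grad()
loss_fn(model_optim(x), y).backward()
optimizer.step()

# Compare the two sets of updated parameters.
same = all(
    torch.allclose(p, q)
    for p, q in zip(model_manual.parameters(), model_optim.parameters())
)
print(same)
```

Switching to another algorithm such as Adam only changes the line that constructs the optimizer; the zero_grad(), backward(), step() sequence stays the same.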

PyTorch: Custom nn Modules

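Sometimes we want to define a model that is more complex than what the existing Modules offer. In that case, subclass nn.Module and define a forward method that receives input Tensors and produces output Tensors, using other Modules or autograd operations on Tensors.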

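import torch


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        # Create two Linear Modules and register them as attributes
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # clamp(min=0) applies ReLU to the hidden layer's output
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


# N: batch size, D_in: input dimension, H: hidden dimension, D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Random input and target data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = TwoLayerNet(D_in, H, D_out)

# reduction='sum' replaces the deprecated size_average=False
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

for t in range(500):
    # Forward pass: calling the model invokes forward()
    y_pred = model(x)
    loss = criterion(y_pred, y)
    print(t, loss.item())

    # Clear gradients, backpropagate, and update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()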
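Assigning Linear Modules as attributes in __init__ is what makes their weights visible to model.parameters(), and therefore to the optimizer. The sketch below shows this on a smaller copy of TwoLayerNet; the dimensions 6, 4, and 2 are illustrative assumptions chosen to keep the output short.

```python
import torch


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        return self.linear2(self.linear1(x).clamp(min=0))


model = TwoLayerNet(D_in=6, H=4, D_out=2)

# Submodules assigned as attributes are registered automatically,
# so their parameters show up under the attribute names.
names = [name for name, _ in model.named_parameters()]
num_params = sum(p.numel() for p in model.parameters())
print(names)
print(num_params)

# clamp(min=0) in forward() computes the same values as ReLU.
h = torch.randn(3, 4)
relu_matches = torch.equal(h.clamp(min=0), torch.relu(h))
print(relu_matches)
```

With these sizes the model has 6*4 + 4 + 4*2 + 2 = 38 parameters, listed as linear1.weight, linear1.bias, linear2.weight, and linear2.bias.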

Reference: PyTorch nn module