When building neural networks, we often want to organize the computation into layers, some of which hold learnable parameters that are optimized during training. In PyTorch, the nn package provides a higher-level abstraction over the raw computational graph. The nn package defines a set of Modules, which are roughly equivalent to neural network layers: a Module receives input Tensors and computes output Tensors, and it may also hold internal state, such as Tensors containing learnable parameters. The nn package also defines a set of useful loss functions.
PyTorch: nn

Continuing from the previous post on forward and backward propagation, here is the same two-layer network implemented with the nn package:
```python
import torch

# N: batch size; D_in: input dim; H: hidden dim; D_out: output dim
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and target data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Define the model as a sequence of layers; nn.Sequential applies them in order
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# Mean squared error, summed over all elements
# (size_average=False is deprecated; reduction='sum' is the current equivalent)
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predictions
    y_pred = model(x)

    # Compute and print the loss
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients, then backpropagate
    model.zero_grad()
    loss.backward()

    # Update each parameter with gradient descent; wrap in no_grad()
    # so the update itself is not tracked by autograd
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
```
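Since the paragraph above notes that a Module holds Tensors with learnable parameters, it can help to look at what the two nn.Linear layers actually register. A minimal sketch, assuming the `model` defined above:

```python
# Each nn.Linear registers a weight and a bias Tensor as learnable parameters;
# nn.Sequential names them by the index of the layer.
for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)
# Expected: 0.weight (100, 1000), 0.bias (100,), 2.weight (10, 100), 2.bias (10,)
```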
PyTorch: optim

So far we have updated the weights of our model by manually mutating the Tensors that hold the learnable parameters. This is not hard for simple optimization algorithms such as stochastic gradient descent, but in practice neural networks are often trained with more sophisticated optimizers such as AdaGrad, RMSProp, or Adam. The optim package provides implementations of these optimization algorithms.
```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Adam optimizer that will update
# the model's parameters for us
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients, backpropagate, then let the optimizer
    # update the parameters
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
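Switching to any of the other optimizers mentioned above only changes the constructor line; the training loop stays the same. A minimal sketch, assuming the `model` from above (the learning rates here are illustrative placeholders, not tuned values):

```python
# Pick one; each exposes the same zero_grad()/step() interface.
optimizer = torch.optim.Adagrad(model.parameters(), lr=1e-2)            # AdaGrad
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)            # RMSProp
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)  # SGD with momentum
```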
PyTorch: Custom nn Modules

Sometimes we want to define a model more complex than what the existing modules allow. In that case we subclass nn.Module and define a forward method that receives input Tensors and produces output Tensors, using other modules or autograd operations on Tensors.
```python
import torch


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        # Construct two nn.Linear modules and register them as member variables
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # Accept an input Tensor and return an output Tensor, using
        # modules and arbitrary Tensor operations (clamp acts as ReLU here)
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct the model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

for t in range(500):
    y_pred = model(x)
    loss = criterion(y_pred, y)
    print(t, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
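Two things make this pattern work: modules assigned as attributes in `__init__` are registered automatically, so `model.parameters()` sees their weights, and calling the model invokes `forward` through `nn.Module.__call__`. A minimal sketch, assuming the `TwoLayerNet` defined above:

```python
net = TwoLayerNet(1000, 100, 10)
print(net)                                        # lists the registered submodules
print(sum(p.numel() for p in net.parameters()))   # total number of learnable parameters
out = net(torch.randn(4, 1000))                   # equivalent to calling net.forward(...)
print(out.shape)                                  # torch.Size([4, 10])
```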
Reference: PyTorch nn module