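Introduction

The output of a recurrent neural network (RNN) depends on the current input and on the hidden state from the previous time step.
Applied to language modeling, an RNN predicts the word at the next time step from the current word.
Perplexity is the usual metric for judging the quality of a language model.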
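
As a rough illustration of the metric: perplexity is the exponential of the average cross-entropy (negative log-likelihood) per token. The sketch below uses made-up logits and targets; `vocab_size`, `logits`, and `targets` are assumptions chosen only for illustration, not values from this post.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size = 7                          # hypothetical vocabulary size
logits = torch.randn(4, vocab_size)     # made-up model scores for 4 tokens
targets = torch.tensor([1, 6, 3, 0])    # made-up indices of the correct next words

# Average negative log-likelihood per token (natural log)
nll = F.cross_entropy(logits, targets)
perplexity = torch.exp(nll)

# A model that spreads probability uniformly over the vocabulary
# has a perplexity equal to the vocabulary size.
uniform_ppl = torch.exp(F.cross_entropy(torch.zeros(4, vocab_size), targets))

print(perplexity.item(), uniform_ppl.item())
```

A lower perplexity means the model assigns higher probability to the words that actually occur; the best possible value is 1.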

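Implementation in torch

torch provides an API for RNNs.

\(x\) has shape [seq_len, batch, input_size]
\(x_t\) has shape [batch, input_size]
\(h_t\) has shape [batch, hidden_len]

Suppose x has shape [10, 3, 100]:

- 10 is the sequence length: 10 words
- 3 is the batch size: 3 sentences per training step
- 100 is the feature size: each word is represented by a 100-dimensional tensor

So \(x_t\) has shape [3, 100].

Suppose hidden_len (the dimension of the memory) is 20, so each \(h_t\) has shape [3, 20].

The code below follows this formula:

\[h_t=\tanh(x_tW_{ih}^T+b_{ih}+h_{t-1}W_{hh}^T+b_{hh})\]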

Reproduction

Here rnn = nn.RNN(100, 20) means that each word is represented by a vector of length 100 and that hidden_size is 20.

import torch  
import torch.nn as nn

rnn = nn.RNN(100, 20)     # define input_size and hidden_size
  
print(rnn._parameters.keys())  
  
print(rnn.weight_ih_l0.shape)  
print(rnn.weight_hh_l0.shape)  
print(rnn.bias_ih_l0.shape)  
print(rnn.bias_hh_l0.shape)

Output:

odict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'])
torch.Size([20, 100])
torch.Size([20, 20])
torch.Size([20])
torch.Size([20])
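
To connect these four parameters back to the formula, here is a minimal sketch that applies the update rule by hand, one time step at a time, and compares the result with what nn.RNN returns. The variable names (`manual_steps`, `manual_out`) are my own, and the input is random data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(100, 20)
x = torch.randn(10, 3, 100)     # [seq_len, batch, input_size]
h = torch.zeros(3, 20)          # h_0: [batch, hidden_size]

manual_steps = []
for x_t in x:                   # x_t: [batch, input_size]
    # h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)
    h = torch.tanh(
        x_t @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
        + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
    )
    manual_steps.append(h)
manual_out = torch.stack(manual_steps)   # [seq_len, batch, hidden_size]

out, h_n = rnn(x)               # h_0 defaults to zeros when omitted
print(torch.allclose(manual_out, out, atol=1e-4))
```

The two results should agree up to floating-point tolerance, which is why the comparison uses allclose rather than exact equality.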

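Parameter definitions:

Required parameter input_size: the number of features of each element of the input sequence. For example, if each word is represented by a vector of length 100, then input_size=100.
Required parameter hidden_size: the number of features in the hidden state, assumed to be 20 here.
Optional parameter num_layers: the number of vertically stacked recurrent layers, typically set between 1 and 10; default=1.

With num_layers = 1 there is a single recurrent layer. With num_layers = 2, the sequence of hidden states produced by the first layer is fed as the input sequence to a second layer. Each layer computes:

\[h_t=\tanh(x_tW_{ih}^T+b_{ih}+h_{t-1}W_{hh}^T+b_{hh})\]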

import torch  
import torch.nn as nn  
  
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=1)  
x = torch.randn(10, 3, 100)  
out, h_t = rnn(x, torch.zeros(1, 3, 20))  
print(out.shape) 
print(h_t.shape)

Output:

torch.Size([10, 3, 20])
torch.Size([1, 3, 20])

For a stacked multi-layer RNN:

import torch  
import torch.nn as nn  
  
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=4)  
x = torch.randn(10, 3, 100)  
out, h_t = rnn(x)  
print(out.shape)  
print(h_t.shape)

Output:

torch.Size([10, 3, 20])
torch.Size([4, 3, 20])
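
One way to see how stacking changes the layers is to list the parameter shapes. In this small sketch (two layers, chosen only to keep the output short) the first layer maps input_size to hidden_size, while the second layer takes the first layer's hidden state as its input, so its input weight is hidden_size by hidden_size.

```python
import torch.nn as nn

rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=2)

# Collect every parameter name with its shape
shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```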

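Hands-on example

Problem setup: there are n sentences, each made up of exactly 3 words. The first two words of each sentence are used as the input and the last word as the target, and a TextRNN is trained with PyTorch.

Imports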

import torch  
import numpy as np  
import torch.nn as nn  
import torch.optim as optim  
import torch.utils.data as Data  
  
dtype = torch.FloatTensor

Data processing (building the vocabulary and the indices)

sentences = ["i like dog", "i love coffee", "i hate milk"]  
word_list = " ".join(sentences).split()  
vocab = list(set(word_list))  
word2idx = {w:i for i, w in enumerate(vocab)}  
idx2word = {i:w for i, w in enumerate(vocab)}  
n_class = len(vocab)

vocab
> ['i', 'dog', 'like', 'milk', 'love', 'hate', 'coffee']
word2idx
> {'i': 0, 'dog': 1, 'like': 2, 'milk': 3, 'love': 4, 'hate': 5, 'coffee': 6}
idx2word
> {0: 'i', 1: 'dog', 2: 'like', 3: 'milk', 4: 'love', 5: 'hate', 6: 'coffee'}
n_class
> 7
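
Note that set has no guaranteed order, so the indices printed above can differ from run to run. If reproducible indices matter, one possible alternative (my suggestion, not what the original code does) is to deduplicate with dict.fromkeys, which keeps the first occurrence of each word in insertion order. The resulting indices will not match the ones shown above.

```python
sentences = ["i like dog", "i love coffee", "i hate milk"]
word_list = " ".join(sentences).split()

# dict.fromkeys drops duplicates while preserving first-seen order
vocab = list(dict.fromkeys(word_list))
word2idx = {w: i for i, w in enumerate(vocab)}
idx2word = {i: w for i, w in enumerate(vocab)}
n_class = len(vocab)

print(vocab)
```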

Dataset and DataLoader

batch_size = 2  
n_step = 2  
n_hidden = 5  
  
def make_data(sentences):  
    input_batch = []  
    target_batch = []  
  
    for sen in sentences:  
        word = sen.split()  
        input = [word2idx[n] for n in word[:-1]]    # every word except the last
        target = word2idx[word[-1]]                 # the last word is the prediction target
        input_batch.append(np.eye(n_class)[input])     
        target_batch.append(target)  
  
    return input_batch, target_batch  
  
input_batch, target_batch = make_data(sentences)  
  
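# float32 corresponds to torch.FloatTensor; torch.LongTensor holds 64-bit integers
# Stack the list of arrays into one ndarray first, which avoids a slow list-of-ndarrays conversion
input_batch = torch.tensor(np.array(input_batch), dtype=torch.float32)
target_batch = torch.LongTensor(target_batch)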

dataset = Data.TensorDataset(input_batch, target_batch)  
dataloader = Data.DataLoader(dataset, batch_size, True)

list(dataset)
> [(tensor([[1., 0., 0., 0., 0., 0., 0.], [0., 0., 1., 0., 0., 0., 0.]]), tensor(1)),
> (tensor([[1., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 1., 0., 0.]]), tensor(6)),
> (tensor([[1., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 1., 0.]]), tensor(3))]
list(dataloader)
> [[tensor([[[1., 0., 0., 0., 0., 0., 0.],[0., 0., 1., 0., 0., 0., 0.]],
> [[1., 0., 0., 0., 0., 0., 0.],[0., 0., 0., 0., 1., 0., 0.]]]),
> tensor([1, 6])],
> [tensor([[[1., 0., 0., 0., 0., 0., 0.],[0., 0., 0., 0., 0., 1., 0.]]]),
> tensor([3])]]

Breaking down the core of the function:

input_batch = []  
target_batch = []  
  
for sen in sentences:  
    word = sen.split()  
    input = [word2idx[n] for n in word[:-1]]    # every word except the last; returns ids
    target = word2idx[word[-1]]                 # the last word is the prediction target; returns an id
    input_batch.append(np.eye(n_class)[input])  
    # print(np.eye(n_class)[input])  
    target_batch.append(target)

input
> [0, 2]
> [0, 4]
> [0, 5]
np.eye(n_class)[input]
> [[1. 0. 0. 0. 0. 0. 0.]
> [0. 0. 1. 0. 0. 0. 0.]]
> [[1. 0. 0. 0. 0. 0. 0.]
> [0. 0. 0. 0. 1. 0. 0.]]
> [[1. 0. 0. 0. 0. 0. 0.]
> [0. 0. 0. 0. 0. 1. 0.]]
input_batch
> [array([[1., 0., 0., 0., 0., 0., 0.],[0., 0., 1., 0., 0., 0., 0.]]),
> array([[1., 0., 0., 0., 0., 0., 0.],[0., 0., 0., 0., 1., 0., 0.]]),
> array([[1., 0., 0., 0., 0., 0., 0.],[0., 0., 0., 0., 0., 1., 0.]])]
target_batch
> [1, 6, 3]
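
The np.eye(n_class)[input] trick works because indexing the identity matrix with a list of ids picks out the matching one-hot rows. As a cross-check, the sketch below compares it with torch.nn.functional.one_hot, which is an alternative I am introducing here and not something the original code uses. The ids [0, 2] are the ones shown above for "i like".

```python
import numpy as np
import torch
import torch.nn.functional as F

n_class = 7
indices = [0, 2]    # ids of "i" and "like" in the vocabulary printed above

# Row lookup in the identity matrix, as in make_data
via_numpy = torch.tensor(np.eye(n_class)[indices], dtype=torch.float32)

# Built-in one-hot encoding; it returns integers, so cast to float
via_torch = F.one_hot(torch.tensor(indices), num_classes=n_class).float()

print(torch.equal(via_numpy, via_torch))
```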

Defining the network architecture

class TextRNN(nn.Module):  
    def __init__(self):  
        super(TextRNN, self).__init__()  
        self.rnn = nn.RNN(input_size=n_class, hidden_size=n_hidden)  
        self.fc = nn.Linear(n_hidden, n_class)  
    def forward(self, hidden, X):  
        # Initially, the input X is [batch_size, n_step, n_class]
        X = X.transpose(0, 1)
        # transpose swaps the two axes, so X becomes [n_step, batch_size, n_class]
        out, hidden = self.rnn(X, hidden)  
        out = out[-1]  
        model = self.fc(out)  
        return model

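First, the two arguments of nn.RNN(input_size, hidden_size):

- input_size is the encoding dimension of each word. Because one-hot encoding is used here (not a word embedding), input_size equals the vocabulary size len(vocab), i.e. n_class.
- hidden_size has no fixed requirement; set it to whatever dimension you want the input to be mapped to.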

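For most neural networks the first dimension of the input X is batch_size, but PyTorch's nn.RNN() by default expects batch_size in the second dimension, so the code uses X.transpose(0, 1) to swap the first and second dimensions.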
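
As an alternative to transposing, nn.RNN accepts batch_first=True, in which case the input and out use [batch_size, n_step, feature] while the hidden state keeps its usual layout. This is only a sketch of that option with random data; the TextRNN in this post uses the transpose approach, and the name `rnn_bf` is my own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_class, n_hidden = 7, 5

rnn_bf = nn.RNN(input_size=n_class, hidden_size=n_hidden, batch_first=True)
x = torch.randn(3, 2, n_class)    # [batch_size, n_step, n_class]
out, h_n = rnn_bf(x)

print(out.shape)    # [batch_size, n_step, n_hidden]
print(h_n.shape)    # [num_layers, batch_size, n_hidden]; not affected by batch_first

last = out[:, -1]   # last time step of every sample: [batch_size, n_hidden]
```

With this layout the last time step is selected with out[:, -1] instead of out[-1].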

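Output of the rnn: it returns two results, out and hidden in the code. Put simply, out is everything inside the red box in the figure (the last layer's hidden state at every time step), and hidden is everything inside the blue box (every layer's hidden state at the last time step). What we need is the last layer's output at the last time step, i.e. the value of \(Y_3\), so out = out[-1] is used to pick it out.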
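
The relationship between the two return values can be checked numerically. The following sketch uses a two-layer RNN on random input (sizes chosen only for illustration) and shows that, for a unidirectional RNN, the last time step of out coincides with the last layer of hidden.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=7, hidden_size=5, num_layers=2)
x = torch.randn(4, 3, 7)      # [n_step, batch_size, n_class]
out, hidden = rnn(x)

print(out.shape)      # [n_step, batch_size, hidden_size]: every time step, last layer only
print(hidden.shape)   # [num_layers, batch_size, hidden_size]: last time step, every layer

# Last time step of the last layer, reached from either return value
same = torch.allclose(out[-1], hidden[-1])
print(same)
```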

With the model defined, the next step is to instantiate it and define the loss function and the optimizer.

model = TextRNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Training

epoch = 5000  
  
for e in range(epoch):  
    for x, y in dataloader:  
        # hidden : [num_layers * num_directions, batch, hidden_size]
        hidden = torch.zeros(1, x.shape[0], n_hidden)
        pred = model(hidden, x)      # x : [batch_size, n_step, n_class]
        loss = criterion(pred, y)

        if (e + 1) % 1000 == 0:
            print('Epoch:', '%04d' % (e + 1), 'cost =', '{:.6f}'.format(loss.item()))
        
        optimizer.zero_grad()  
        loss.backward()  
        optimizer.step()

Epoch: 1000 cost = 0.000000
Epoch: 1000 cost = 0.000000
Epoch: 2000 cost = 0.000000
Epoch: 2000 cost = 0.000000
Epoch: 3000 cost = 0.000000
Epoch: 3000 cost = 0.000000
Epoch: 4000 cost = 0.000000
Epoch: 4000 cost = 0.000000
Epoch: 5000 cost = 0.000000
Epoch: 5000 cost = 0.000000

Prediction

input = [sen.split()[:2] for sen in sentences]  
hidden = torch.zeros(1, len(input), n_hidden)  
predict = model(hidden, input_batch).data.max(1, keepdim=True)[1]  
print([sen.split()[:2] for sen in sentences], '->', [idx2word[n.item()] for n in predict.squeeze()])

input
> [['i', 'like'], ['i', 'love'], ['i', 'hate']]
hidden
> tensor([[[0., 0., 0., 0., 0.],
> [0., 0., 0., 0., 0.],
> [0., 0., 0., 0., 0.]]])
predict
> tensor([[1], [6], [3]])
Output:
> [['i', 'like'], ['i', 'love'], ['i', 'hate']] -> ['dog', 'coffee', 'milk']
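
The expression .max(1, keepdim=True)[1] in the prediction code takes the index of the largest score in each row, that is, the predicted class id. A more direct spelling is argmax. The sketch below uses made-up logits to show that the two agree; it is a stylistic alternative, not a change to the code above.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(3, 7)    # made-up scores: [batch_size, n_class]

# max along dim 1 returns (values, indices); [1] selects the indices
via_max = logits.max(1, keepdim=True)[1]

# argmax returns the indices directly
via_argmax = logits.argmax(dim=1, keepdim=True)

print(torch.equal(via_max, via_argmax))
```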

Other

nn.RNNCell
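
nn.RNNCell computes a single time step, so the loop over the sequence has to be written by hand. Here is a minimal sketch with the same sizes as the earlier nn.RNN examples (random input, variable names my own):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

cell = nn.RNNCell(input_size=100, hidden_size=20)
x = torch.randn(10, 3, 100)    # [seq_len, batch, input_size]
h = torch.zeros(3, 20)         # initial hidden state: [batch, hidden_size]

outputs = []
for x_t in x:                  # one time step at a time, x_t: [batch, input_size]
    h = cell(x_t, h)           # returns the new hidden state: [batch, hidden_size]
    outputs.append(h)

out = torch.stack(outputs)     # [seq_len, batch, hidden_size]
print(out.shape, h.shape)
```

Writing the loop explicitly costs a few more lines, but it makes it easy to do something different at each step, for example feeding a prediction back in as the next input.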

PyTorch Basics Series (4)
