堆叠多个 LSTM 层 · TensorFlow 机器学习秘籍中文第二版

# 堆叠多个 LSTM 层正如我们可以增加神经网络或 CNN 的深度，我们可以增加 RNN 网络的深度。在这个秘籍中，我们应用了一个三层深度的 LSTM 来改进我们的莎士比亚语言生成。 ## 做好准备我们可以通过将它们叠加在一起来增加循环神经网络的深度。从本质上讲，我们将获取目标输出并将其输入另一个网络。要了解这对于两层的工作原理，请参见下图： ![](https://img.kancloud.cn/8f/f1/8ff1b4d13ad49da169ff3d0a1646a759_971x395.png) 图 5：在上图中，我们扩展了单层 RNN，使它们具有两层。对于原始的单层版本，请参阅上一章简介中的绘图。左侧架构说明了使用多层 RNN 预测输出序列中的一个输出的方法。正确的架构显示了使用多层 RNN 预测输出序列的方法，该输出序列使用输出作为输入 TensorFlow 允许使用`MultiRNNCell()`函数轻松实现多个层，该函数接受 RNN 单元列表。有了这种行为，很容易用`MultiRNNCell([rnn_cell(num_units) for n in num_layers])`单元格从 Python 中的一个单元格创建多层 RNN。对于这个秘籍，我们将执行我们在之前的秘籍中执行的相同的莎士比亚预测。将有两个变化：第一个变化将是具有三个堆叠的 LSTM 模型而不是仅一个层，第二个变化将是进行字符级预测而不是单词。进行字符级预测会将我们潜在的词汇量大大减少到只有 40 个字符（26 个字母，10 个数字，1 个空格和 3 个特殊字符）。 ## 操作步骤我们将说明本节中的代码与上一节的不同之处，而不是重新使用所有相同的代码。有关完整代码，请参阅 [https://github.com/nfmcclure/tensorflow_cookbook](https://github.com/nfmcclure/tensorflow_cookbook) 上的 GitHub 仓库或 [https://github.com/PacktPublishing/TensorFlow-Machine-的 Packt 仓库 Learning-Cookbook-Second-Edition](https://github.com/PacktPublishing/TensorFlow-Machine-Learning-Cookbook-Second-Edition) 。 1. 我们首先需要设置模型的层数。我们将此作为参数放在脚本的开头，并使用其他模型参数： ```py num_layers = 3 min_word_freq = 5 ``` ```py rnn_size = 128 epochs = 10 ``` 1. 第一个主要变化是我们将按字符加载，处理和提供文本，而不是按字词加载。为了实现这一点，在清理文本之后，我们可以使用 Python 的`list()`命令逐个字符地分隔整个文本： ```py s_text = re.sub(r'[{}]'.format(punctuation), ' ', s_text) s_text = re.sub('s+', ' ', s_text ).strip().lower() # Split up by characters char_list = list(s_text) ``` 1. 我们现在需要更改 LSTM 模型，使其具有多个层。我们接受`num_layers`变量并使用 TensorFlow 的`MultiRNNCell()`函数创建一个多层 RNN 模型，如下所示： ```py class LSTM_Model(): def __init__(self, rnn_size, num_layers, batch_size, learning_rate, training_seq_len, vocab_size, infer_sample=False): self.rnn_size = rnn_size self.num_layers = num_layers self.vocab_size = vocab_size self.infer_sample = infer_sample self.learning_rate = learning_rate ... self.lstm_cell = tf.contrib.rnn.BasicLSTMCell(rnn_size) self.lstm_cell = tf.contrib.rnn.MultiRNNCell([self.lstm_cell for _ in range(self.num_layers)]) self.initial_state = self.lstm_cell.zero_state(self.batch_size, tf.float32) self.x_data = tf.placeholder(tf.int32, [self.batch_size, self.training_seq_len]) self.y_output = tf.placeholder(tf.int32, [self.batch_size, self.training_seq_len]) ``` > 请注意，TensorFlow 的`MultiRNNCell()`函数接受 RNN 单元列表。在这个项目中，RNN 层都是相同的，但您可以列出您希望堆叠在一起的任何 RNN 层。 1. 其他一切基本相同。在这里，我们可以看到一些训练输出： ```py Building Shakespeare Vocab by Characters Vocabulary Length = 40 Starting Epoch #1 of 10 Iteration: 9430, Epoch: 10, Batch: 889 out of 950, Loss: 1.54 Iteration: 9440, Epoch: 10, Batch: 899 out of 950, Loss: 1.46 Iteration: 9450, Epoch: 10, Batch: 909 out of 950, Loss: 1.49 thou art more than the to be or not to the serva wherefore art thou dost thou Iteration: 9460, Epoch: 10, Batch: 919 out of 950, Loss: 1.41 Iteration: 9470, Epoch: 10, Batch: 929 out of 950, Loss: 1.45 Iteration: 9480, Epoch: 10, Batch: 939 out of 950, Loss: 1.59 Iteration: 9490, Epoch: 10, Batch: 949 out of 950, Loss: 1.42 ``` 1. 以下是最终文本输出的示例： ```py thou art more fancy with to be or not to be for be wherefore art thou art thou ``` 1. 最后，以下是我们如何绘制几代的训练损失： ```py plt.plot(train_loss, 'k-') plt.title('Sequence to Sequence Loss') plt.xlabel('Generation') plt.ylabel('Loss') plt.show() ``` ![](https://img.kancloud.cn/25/db/25db8b416ffe9e71ef166d684211fa55_399x281.png) 图 6：多层 LSTM 莎士比亚模型的训练损失与世代的关系图 ## 工作原理 TensorFlow 只需一个 RNN 单元列表即可轻松将 RNN 层扩展到多个层。对于这个秘籍，我们使用与上一个秘籍相同的莎士比亚数据，但是用字符而不是单词处理它。我们通过三层 LSTM 模型来生成莎士比亚文本。我们可以看到，在仅仅 10 个周期之后，我们就能够以文字的形式产生古老的英语。