用于 MNIST 分类的基于 TensorFlow 的 MLP · 精通 TensorFlow 1.x

# 用于 MNIST 分类的基于 TensorFlow 的 MLP 首先，加载 MNIST 数据集，并使用以下代码定义训练和测试特征以及目标： ```py from tensorflow.examples.tutorials.mnist import input_data mnist_home = os.path.join(datasetslib.datasets_root, 'mnist') mnist = input_data.read_data_sets(mnist_home, one_hot=True) X_train = mnist.train.images X_test = mnist.test.images Y_train = mnist.train.labels Y_test = mnist.test.labels num_outputs = 10 # 0-9 digits num_inputs = 784 # total pixels ``` 我们创建了三个辅助函数，它们将帮助我们创建一个只有一个隐藏层的简单 MLP，然后是一个更大的 MLP，每层有多个层和多个神经元。 `mlp()`函数使用以下逻辑构建网络层： 1. `mlp()`函数需要五个输入： * `x`是输入特征张量 * `num_inputs`是输入特征的数量 * `num_outputs`是输出目标的数量 * `num_layers`是所需隐藏层数 * `num_neurons`是包含每层神经元数量的列表 2. 将权重和偏差列表设置为空： ```py w=[] b=[] ``` 1. 为隐藏层的数量运行循环以创建权重和偏移张量并将它们附加到各自的列表： * 张量分别名为`w_<layer_num>`和`b_<layer_num>`。命名张量有助于调试和查找代码问题。 * 使用`tf.random_normal()`以正态分布初始化张量。 * 权重张量的第一个维度是来自前一层的输入数量。对于第一个隐藏层，第一个维度是`num_inputs`。权重张量的第二维是当前层中的神经元的数量。 * 偏差都是一维张量，其中维度等于当前层中的神经元数量。 ```py for i in range(num_layers): # weights w.append(tf.Variable(tf.random_normal( [num_inputs if i == 0 else num_neurons[i - 1], num_neurons[i]]), name="w_{0:04d}".format(i) )) # biases b.append(tf.Variable(tf.random_normal( [num_neurons[i]]), name="b_{0:04d}".format(i) )) ``` 1. 为最后一个隐藏层创建权重和偏差。在这种情况下，权重张量的维度等于最后隐藏层中的神经元数量和输出目标的数量。偏差是一个张量，具有输出特征数量大小的单一维度： ```py w.append(tf.Variable(tf.random_normal( [num_neurons[num_layers - 1] if num_layers > 0 else num_inputs, num_outputs]), name="w_out")) b.append(tf.Variable(tf.random_normal([num_outputs]), name="b_out")) ``` 1. 现在开始定义层。首先，将`x`视为第一个最明显的输入层： ```py # x is input layer layer = x ``` 1. 在循环中添加隐藏的层。每个隐藏层表示通过激活函数`tf.nn.relu()`使线性函数`tf.matmul(layer, w[i]) + b[i]`非线性化： ```py # add hidden layers for i in range(num_layers): layer = tf.nn.relu(tf.matmul(layer, w[i]) + b[i]) ``` 1. 添加输出层。输出层和隐藏层之间的一个区别是输出层中没有激活函数： ```py layer = tf.matmul(layer, w[num_layers]) + b[num_layers] ``` 1. 返回包含 MLP 网络的`layer`对象： ```py return layer ``` 整个 MLP 函数的完整代码如下： ```py def mlp(x, num_inputs, num_outputs, num_layers, num_neurons): w = [] b = [] for i in range(num_layers): # weights w.append(tf.Variable(tf.random_normal( [num_inputs if i == 0 else num_neurons[i - 1], num_neurons[i]]), name="w_{0:04d}".format(i) )) # biases b.append(tf.Variable(tf.random_normal( [num_neurons[i]]), name="b_{0:04d}".format(i) )) w.append(tf.Variable(tf.random_normal( [num_neurons[num_layers - 1] if num_layers > 0 else num_inputs, num_outputs]), name="w_out")) b.append(tf.Variable(tf.random_normal([num_outputs]), name="b_out")) # x is input layer layer = x # add hidden layers for i in range(num_layers): layer = tf.nn.relu(tf.matmul(layer, w[i]) + b[i]) # add output layer layer = tf.matmul(layer, w[num_layers]) + b[num_layers] return layer ``` 辅助函数`mnist_batch_func()`为 MNIST 数据集包装 TensorFlow 的批量函数，以提供下一批图像： ```py def mnist_batch_func(batch_size=100): X_batch, Y_batch = mnist.train.next_batch(batch_size) return [X_batch, Y_batch] ``` 此函数不言自明。 TensorFlow 为 MNIST 数据集提供此函数;但是，对于其他数据集，我们可能必须编写自己的批量函数。辅助函数`tensorflow_classification()`训练并评估模型。 1. `tensorflow_classification()` 函数有几个输入： * * `n_epochs`是要运行的训练循环的数量 * `n_batches`是应该运行每个循环中的训练的随机抽样批次的数量 * `batch_size`是每批中的样品数 * `batch_func`是`batch_size`并返回`X`和`Y`样本批次的函数 * `model`是具有神经元的实际神经网络或层 * `optimizer`是使用 TensorFlow 定义的优化函数 * `loss`是优化器优化参数的成本函数损失 * `accuracy_function`是计算准确率分数的函数 * `X_test`和`Y_test`是测试的数据集 1. 启动 TensorFlow 会话以运行训练循环： ```py with tf.Session() as tfs: tf.global_variables_initializer().run() ``` 1. 运行`n_epoch`循环的训练： ```py for epoch in range(n_epochs): ``` 1. 在每个循环中，取样本集的`n_batches`数量并训练模型，计算每批的损失，计算每个周期的平均损失： ```py epoch_loss = 0.0 for batch in range(n_batches): X_batch, Y_batch = batch_func(batch_size) feed_dict = {x: X_batch, y: Y_batch} _, batch_loss = tfs.run([optimizer, loss], feed_dict) epoch_loss += batch_loss average_loss = epoch_loss / n_batches print("epoch: {0:04d} loss = {1:0.6f}".format( epoch, average_loss)) ``` 1. 完成所有周期循环后，计算并打印用`accuracy_function`计算的精度分数： ```py feed_dict = {x: X_test, y: Y_test} accuracy_score = tfs.run(accuracy_function, feed_dict=feed_dict) print("accuracy={0:.8f}".format(accuracy_score)) ``` `tensorflow_classification()`函数的完整代码如下： ```py def tensorflow_classification(n_epochs, n_batches, batch_size, batch_func, model, optimizer, loss, accuracy_function, X_test, Y_test): with tf.Session() as tfs: tfs.run(tf.global_variables_initializer()) for epoch in range(n_epochs): epoch_loss = 0.0 for batch in range(n_batches): X_batch, Y_batch = batch_func(batch_size) feed_dict = {x: X_batch, y: Y_batch} _, batch_loss = tfs.run([optimizer, loss], feed_dict) epoch_loss += batch_loss average_loss = epoch_loss / n_batches print("epoch: {0:04d} loss = {1:0.6f}".format( epoch, average_loss)) feed_dict = {x: X_test, y: Y_test} accuracy_score = tfs.run(accuracy_function, feed_dict=feed_dict) print("accuracy={0:.8f}".format(accuracy_score)) ``` 现在让我们定义输入和输出占位符，`x`和`y`以及其他超参数： ```py # input images x = tf.placeholder(dtype=tf.float32, name="x", shape=[None, num_inputs]) # target output y = tf.placeholder(dtype=tf.float32, name="y", shape=[None, num_outputs]) num_layers = 0 num_neurons = [] learning_rate = 0.01 n_epochs = 50 batch_size = 100 n_batches = int(mnist.train.num_examples/batch_size) ``` 参数如下所述： * `num_layers`是隐藏层数。我们首先练习没有隐藏层，只有输入和输出层。 * `num_neurons`是空列表，因为没有隐藏层。 * `learning_rate`是 0.01，随机选择的小数。 * `num_epochs`代表 50 次迭代，以学习将输入连接到输出的唯一神经元的参数。 * `batch_size`保持在 100，这也是一个选择问题。较大的批量大小不一定提供更高的好处。您可能需要探索不同的批量大小，以找到神经网络的最佳批量大小。 * `n_batches`：批次数大致计算为示例数除以批次中的样本数。现在让我们将所有内容放在一起，使用到目前为止定义的变量定义网络，`loss`函数，`optimizer`函数和`accuracy`函数。 ```py model = mlp(x=x, num_inputs=num_inputs, num_outputs=num_outputs, num_layers=num_layers, num_neurons=num_neurons) loss = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=model, labels=y)) optimizer = tf.train.GradientDescentOptimizer( learning_rate=learning_rate).minimize(loss) predictions_check = tf.equal(tf.argmax(model, 1), tf.argmax(y, 1)) accuracy_function = tf.reduce_mean(tf.cast(predictions_check, tf.float32)) ``` 在这段代码中，我们使用一个新的 tensorflow 函数来定义损失函数： ```py tf.nn.softmax_cross_entropy_with_logits(logits=model, labels=y) ``` 当使用`softmax_cross_entropy_with_logits()`函数时，请确保输出未缩放且尚未通过`softmax`激活函数。此函数在内部使用`softmax`来缩放输出。该函数计算模型之间的 softmax 熵（估计值`y`）和`y`的实际值。当输出属于一个类而不是一个类时，使用熵函数。在我们的示例中，图像只能属于其中一个数字。有关此熵函数的更多信息，请参阅 [https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits). 一旦定义了所有内容，运行 `tensorflow_classification` 函数来训练和评估模型： ```py tensorflow_classification(n_epochs=n_epochs, n_batches=n_batches, batch_size=batch_size, batch_func=mnist_batch_func, model = model, optimizer = optimizer, loss = loss, accuracy_function = accuracy_function, X_test = mnist.test.images, Y_test = mnist.test.labels ) ``` 我们从运行分类得到以下输出： ```py epoch: 0000 loss = 8.364567 epoch: 0001 loss = 4.347608 epoch: 0002 loss = 3.085622 epoch: 0003 loss = 2.468341 epoch: 0004 loss = 2.099220 epoch: 0005 loss = 1.853206 --- Epoch 06 to 45 output removed for brevity --- epoch: 0046 loss = 0.684285 epoch: 0047 loss = 0.678972 epoch: 0048 loss = 0.673685 epoch: 0049 loss = 0.668717 accuracy=0.85720009 ``` 我们看到单个神经元网络在 50 次迭代中缓慢地将损失从 8.3 降低到 0.66，最终得到几乎 85%的准确率。对于这个具体的例子，这是非常糟糕的准确性，因为这只是使用 TensorFlow 进行分类使用 MLP 的演示。我们使用更多层和神经元运行相同的代码，并获得以下准确性： | 层数 | 每个隐藏层中的神经元数量 | 准确性 | | --- | --- | --- | | 0 | 0 | 0.857 | | 1 | 8 | 0.616 | | 2 | 256 | 0.936 | 因此，通过在每层添加两行和 256 个神经元，我们将精度提高到 0.936。我们鼓励您尝试使用不同变量值的代码来观察它如何影响损失和准确性。