使用最近邻进行图像识别 · TensorFlow 机器学习秘籍中文第二版

# 使用最近邻进行图像识别最近邻也可用于图像识别。图像识别数据集的问题世界是 MNIST 手写数字数据集。由于我们将在后面的章节中将此数据集用于各种神经网络图像识别算法，因此将结果与非神经网络算法进行比较将会很棒。 ## 做好准备 MNIST 数字数据集由数千个尺寸为 28×28 像素的标记图像组成。虽然这被认为是一个小图像，但它对于最近邻算法总共有 784 个像素（或特征）。我们将通过考虑最近的`k`邻居（`k=4`，在该示例中）的模式预测来计算该分类问题的最近邻预测。 ## 操作步骤我们将按如下方式处理秘籍： 1. 我们将从加载必要的库开始。请注意，我们还将导入 Python 图像库（PIL），以便能够绘制预测输出的样本。 TensorFlow 有一个内置方法来加载我们将使用的 MNIST 数据集，如下所示： ```py import random import numpy as np import tensorflow as tf import matplotlib.pyplot as plt from PIL import Image from tensorflow.examples.tutorials.mnist import input_data ``` 1. 现在，我们将启动图会话并以单热编码形式加载 MNIST 数据： ```py sess = tf.Session() mnist = input_data.read_data_sets("MNIST_data/", one_hot=True) ``` > 单热编码是更适合数值计算的分类值的数值表示。这里，我们有 10 个类别（数字 0-9），并将它们表示为长度为 10 的 0-1 向量。例如，0 类别由向量 1,0,0,0,0,0 表示， 0,0,0,0,1 向量用 0,1,0,0,0,0,0,0,0,0 表示，依此类推。 1. 因为 MNIST 数据集很大并且计算数万个输入上的 784 个特征之间的距离在计算上是困难的，所以我们将采样一组较小的图像来训练。此外，我们将选择一个可被 6 整除的测试集编号，仅用于绘图目的，因为我们将绘制最后一批六个图像以查看结果的示例： ```py train_size = 1000 test_size = 102 rand_train_indices = np.random.choice(len(mnist.train.images), train_size, replace=False) rand_test_indices = np.random.choice(len(mnist.test.images), test_size, replace=False) x_vals_train = mnist.train.images[rand_train_indices] x_vals_test = mnist.test.images[rand_test_indices] y_vals_train = mnist.train.labels[rand_train_indices] y_vals_test = mnist.test.labels[rand_test_indices] ``` 1. 我们将声明我们的`k`值和批量大小： ```py k = 4 batch_size=6 ``` 1. 现在，我们将初始化将添加到图中的占位符： ```py x_data_train = tf.placeholder(shape=[None, 784], dtype=tf.float32) x_data_test = tf.placeholder(shape=[None, 784], dtype=tf.float32) y_target_train = tf.placeholder(shape=[None, 10], dtype=tf.float32) y_target_test = tf.placeholder(shape=[None, 10], dtype=tf.float32) ``` 1. 然后我们将声明我们的距离度量。在这里，我们将使用 L1 度量（绝对值）： ```py distance = tf.reduce_sum(tf.abs(tf.subtract(x_data_train, tf.expand_dims(x_data_test,1))), reduction_indices=2) ``` > 请注意，我们也可以使用以下代码来改变距离函数：`distance = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(x_data_train, tf.expand_dims(x_data_test,1))), reduction_indices=1))`。 1. 现在，我们将找到最接近的顶级`k`图像并预测模式。该模式将在单热编码索引上执行，计数最多： ```py top_k_xvals, top_k_indices = tf.nn.top_k(tf.negative(distance), k=k) prediction_indices = tf.gather(y_target_train, top_k_indices) count_of_predictions = tf.reduce_sum(prediction_indices, reduction_indices=1) prediction = tf.argmax(count_of_predictions) ``` 1. 我们现在可以遍历我们的测试集，计算预测并存储它们，如下所示： ```py num_loops = int(np.ceil(len(x_vals_test)/batch_size)) test_output = [] actual_vals = [] for i in range(num_loops): min_index = i*batch_size max_index = min((i+1)*batch_size,len(x_vals_train)) x_batch = x_vals_test[min_index:max_index] y_batch = y_vals_test[min_index:max_index] predictions = sess.run(prediction, feed_dict={x_data_train: x_vals_train, x_data_test: x_batch, y_target_train: y_vals_train, y_target_test: y_batch}) test_output.extend(predictions) actual_vals.extend(np.argmax(y_batch, axis=1)) ``` 1. 现在我们已经保存了实际和预测的输出，我们可以计算出准确率。由于我们对测试/训练数据集进行随机抽样，这会发生变化，但最终我们的准确率值应该在 80％-90％左右： ```py accuracy = sum([1./test_size for i in range(test_size) if test_output[i]==actual_vals[i]]) print('Accuracy on test set: ' + str(accuracy)) Accuracy on test set: 0.8333333333333325 ``` 1. 以下是绘制前面批量结果的代码： ```py actuals = np.argmax(y_batch, axis=1) Nrows = 2 Ncols = 3 for i in range(len(actuals)): plt.subplot(Nrows, Ncols, i+1) plt.imshow(np.reshape(x_batch[i], [28,28]), cmap='Greys_r') plt.title('Actual: ' + str(actuals[i]) + ' Pred: ' + str(predictions[i]), fontsize=10) frame = plt.gca() frame.axes.get_xaxis().set_visible(False) frame.axes.get_yaxis().set_visible(False) ``` 结果如下： ![](https://img.kancloud.cn/cf/c9/cfc9aae42578fa401dfd0287c86791bb_349x247.png) 图 4：我们运行最近邻预测的最后一批六个图像。我们可以看到，我们并没有完全正确地获得所有图像。 ## 工作原理给定足够的计算时间和计算资源，我们可以使测试和训练集更大。这可能会提高我们的准确率，也是防止过拟合的常用方法。另外，请注意，此算法需要进一步探索理想的`k`值进行选择。可以在数据集上进行一组交叉验证实验后选择`k`值。 ## 更多我们还可以使用最近邻居算法来评估用户看不见的数字。有关使用此模型评估用户输入数字的方法，请参阅在线仓库，地址为 [https://github.com/nfmcclure/tensorflow_cookbook](https://github.com/nfmcclure/tensorflow_cookbook) 。在本章中，我们探讨了如何使用 k-NN 算法进行回归和分类。我们讨论了距离函数的不同用法，以及如何将它们混合在一起。我们鼓励读者探索不同的距离度量，权重和`k`值，以优化这些方法的准确率。