5.2 DeepLab · 手把手教你机器学习 (Hands-On Machine Learning)

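## Adapting a classification network for semantic segmentation raises two problems

1. Reduced resolution (the reduction of signal resolution incurred by the repeated combination of max-pooling and downsampling (‘striding’) performed at every layer of standard DCNNs).
2. Spatial ‘insensitivity’ (invariance) (relates to the fact that obtaining object-centric decisions from a classifier requires invariance to spatial transformations, inherently limiting the spatial accuracy of the DCNN model). This is due to the very invariance properties that make DCNNs good for high level tasks.

*****

How does DeepLab address these two problems?

### Solution to the first problem

DeepLab uses a structure called atrous (dilated) convolution and removes the downsampling of the last pooling layers, which is where the resolution loss comes from.

![](https://pic4.zhimg.com/80/v2-d1b8ab65498ecd1d8b193583a2321027_hd.png)

With a rate of 1, atrous convolution is the classic convolution. Pooling enlarges the receptive field, which helps a classification network reach good accuracy, but it also lowers the resolution. Atrous convolution enlarges the receptive field without that loss. The atrous convolution layer proposed in the paper works as follows:

![](https://box.kancloud.cn/29db8216a3439eaadb2cc58c4db7b84b_395x381.png)

### Solution to the second problem

**Conditional Random Fields (CRFs) are usually applied as a post-processing step to improve the segmentation.** A CRF is a graphical model that "smooths" the segmentation based on the underlying pixel intensities: at run time, pixels with similar intensity tend to receive the same label. Adding a CRF can raise the final score by 1~2%.

![](https://pic3.zhimg.com/80/v2-3fb54c709948f3ca06c47f3171965166_hd.png)

CRF illustration. In (b), the output of the unary classifiers is the segmentation input to the CRF; (c), (d), and (e) are three CRF variants, with (e) being the widely used one.

### DeepLab overview

**DeepLab (v1 and v2)**

Paper 1: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

Submitted to arXiv on 22 December 2014

[https://arxiv.org/abs/1412.7062](http://link.zhihu.com/?target=https%3A//arxiv.org/abs/1412.7062)

Paper 2: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

Submitted to arXiv on 2 June 2016

[https://arxiv.org/abs/1606.00915](http://link.zhihu.com/?target=https%3A//arxiv.org/abs/1606.00915)

Main contributions:

* Uses atrous convolution.
* Proposes atrous spatial pyramid pooling (ASPP), a pyramid of atrous convolutions over the spatial dimensions.
* Uses a fully connected CRF.

In more detail: atrous convolution enlarges the receptive field without increasing the number of parameters, and, following the dilated convolution paper mentioned above, this improves the segmentation network. Multi-scale processing can be achieved in two ways: by feeding several rescaled versions of the original image into parallel CNN branches (an image pyramid), or by using several parallel atrous convolution layers with different sampling rates (ASPP). Structured prediction is handled by the fully connected CRF, which has to be trained and tuned separately as a post-processing step.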
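
The atrous convolution and ASPP ideas described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the DeepLab implementation: the names `effective_kernel_size`, `atrous_conv2d`, and `aspp` are hypothetical, the code handles a single channel and odd-sized kernels only, all branches share one kernel for brevity (a real network learns separate weights per branch), the branch outputs are simply summed, and the rates are illustrative.

```python
import numpy as np


def effective_kernel_size(kernel_size, rate):
    """Spatial extent covered by a kernel once (rate - 1) gaps separate its taps."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv2d(image, kernel, rate=1):
    """'Same'-padded 2-D atrous cross-correlation of a single-channel image.

    Assumes an odd-sized kernel. rate=1 reduces to the classic convolution layer.
    """
    kh, kw = kernel.shape
    h, w = image.shape
    # Zero padding that keeps the output the same size as the input.
    pad_h = rate * (kh - 1) // 2
    pad_w = rate * (kw - 1) // 2
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)))
    out = np.zeros((h, w), dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Each kernel tap reads the input shifted by a multiple of `rate`.
            out += kernel[i, j] * padded[i * rate:i * rate + h,
                                         j * rate:j * rate + w]
    return out


def aspp(image, kernel, rates=(2, 4, 8)):
    """Toy ASPP: parallel atrous branches with different rates, fused by summation."""
    return sum(atrous_conv2d(image, kernel, r) for r in rates)


# An impulse input makes the sampling pattern visible.
impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0
kernel = np.ones((3, 3))

response = atrous_conv2d(impulse, kernel, rate=2)
print(effective_kernel_size(3, 2))   # 5: a 3x3 kernel covers a 5x5 window
print(np.count_nonzero(response))    # 9: still only nine weights are used
print(aspp(impulse, kernel).shape)   # (9, 9): resolution is preserved
```

The impulse response shows the point made in the text: raising the rate widens the window each output sees, while the number of weights and the output resolution stay the same.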