The paper *Fully Convolutional Networks for Semantic Segmentation* is a milestone in image segmentation.
Original paper: [https://people.eecs.berkeley.edu/~jonlong/long\_shelhamer\_fcn.pdf](https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf)
The paper's open-source Caffe code: [https://github.com/shelhamer/fcn.berkeleyvision.org](https://github.com/shelhamer/fcn.berkeleyvision.org)
This tutorial's TensorFlow implementation of FCN16S: [https://github.com/tangzhenjie/FCN16S](https://github.com/tangzhenjie/FCN16S)
## Preface
We won't go over the content of the FCN paper here; read the original paper or one of the many blog posts about it. Everything below assumes you already understand the paper. The goal of this section is to walk you, step by step, through reproducing the paper's FCN16S experiment. The authors' code is in Caffe; we reimplement the experiment in TensorFlow.
## The FCN16S Experiment
* [Part 1: Preparing the Data](#%E7%AC%AC%E4%B8%80%E8%8A%82)
* [Part 2: Defining the Network](#%E7%AC%AC%E4%BA%8C%E8%8A%82)
* [Part 3: Defining the Loss Function](#%E7%AC%AC%E4%B8%89%E8%8A%82)
* [Part 4: Optimization](#%E7%AC%AC%E5%9B%9B%E8%8A%82)
* [Part 5: Results](#%E7%AC%AC%E4%BA%94%E8%8A%82)
<h3 id="第一节">Part 1: Preparing the Data</h3>
We use the Scene Parsing Challenge dataset provided by MIT: [http://sceneparsing.csail.mit.edu/](http://sceneparsing.csail.mit.edu/)
### **Creating the project**
First, create a project named **FCN16S** on GitHub, as shown below: ![](https://box.kancloud.cn/f25a541abda6946789b983aeda426a9d_1920x866.png)
Then clone it from PyCharm:
![](https://box.kancloud.cn/a57bf0d0d5e1e33e760ab0a7c680c256_783x488.png)
![](https://box.kancloud.cn/8897b75877cef21ecb05401bde3b2363_783x488.png)
Set the project's Python interpreter:
![](https://box.kancloud.cn/453530285fb5e07c893b39b0746bd15d_521x543.png)
![](https://box.kancloud.cn/7e38f98df4d892e004d3a21fd4c4c72e_1046x721.png)
### **We now have an empty project with a configured environment. Let's write the project code step by step.**
#### First, create the project's main file, FCN16S.py, and add it to version control, as shown below: ![](https://box.kancloud.cn/2299f6b55d3190dd1591c0492a525693_736x258.png)
You can run the following snippet to check whether TensorFlow is installed correctly:
```
import tensorflow as tf
hello = tf.constant('hello,tensorf')
sess = tf.Session()
print(sess.run(hello))
# If this runs and prints b'hello,tensorf', TensorFlow is installed correctly.
```
Next, create the data-preparation file and add it to version control: read\_MITSceneParsingData.py, as shown below:
![](https://box.kancloud.cn/81af6551ebe17ff70470e4b1fd86ede8_386x269.png)
> The dataset is the Scene Parsing Challenge dataset: a training set of 20,210 images and a validation set of 2,000 images.

We start by defining a function in read\_MITSceneParsingData.py:
```
__author__ = 'tangzhenjie'
import os

# Dataset download URL
DATA_URL = 'http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip'

"""
Read the training and validation filename lists from a .pickle file.
param:
    data_dir: directory where the files live
return:
    tuple of (training records, validation records)
"""
def read_dataset(data_dir):
    pickle_filename = "MITSceneParsing.pickle"
    pickle_filepath = os.path.join(data_dir, pickle_filename)
    # If the pickle file does not exist yet, download the dataset first
    if not os.path.exists(pickle_filepath):
        pass  # the download call is filled in below
```
Next we need to download the file. To keep the code readable we put the download handling in a new file: TensorflowUtils.py.
Add the following code to TensorflowUtils.py:
```
__author__ = 'tangzhenjie'
import os, sys
from six.moves import urllib
import tarfile
import zipfile
import scipy.io
import tensorflow as tf
import scipy.misc as misc

"""
Download the file at the given URL (and optionally extract it).
param:
    dir_path: directory to download and extract into
    url_name: URL of the file to download
    is_tarfile: whether it is a tar file
    is_zipfile: whether it is a zip file
"""
def maybe_download_and_extract(dir_path, url_name, is_tarfile=False, is_zipfile=False):
    # Make sure the target directory exists
    if not os.path.exists(dir_path):
        os.makedirs(dir_path)
    # Only download if the file is not already there
    file_name = url_name.split('/')[-1]
    file_path = os.path.join(dir_path, file_name)
    if not os.path.exists(file_path):
        # Progress callback shown while downloading
        def _progress(count, block_size, total_size):
            sys.stdout.write(
                '\r>> Downloading %s %.1f%%' % (file_name, float(count * block_size) / float(total_size) * 100.0)
            )
            # Flush so the progress line updates in place
            sys.stdout.flush()
        file_path, _ = urllib.request.urlretrieve(url_name, file_path, reporthook=_progress)
        # Report the size of the downloaded file
        statinfo = os.stat(file_path)
        print('Successfully downloaded', file_name, statinfo.st_size, 'bytes.')
        if is_tarfile:
            tarfile.open(file_path, 'r:gz').extractall(dir_path)
        if is_zipfile:
            with zipfile.ZipFile(file_path) as zf:
                zf.extractall(dir_path)
```
Now call this method from read\_MITSceneParsingData.py and test it. At this point read\_MITSceneParsingData.py contains:
```
__author__ = 'tangzhenjie'
import os
import TensorflowUtils as utils

# Dataset download URL
DATA_URL = 'http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip'

"""
Read the training and validation filename lists from a .pickle file.
param:
    data_dir: directory where the files live
return:
    tuple of (training records, validation records)
"""
def read_dataset(data_dir):
    pickle_filename = "MITSceneParsing.pickle"
    pickle_filepath = os.path.join(data_dir, pickle_filename)
    # If the pickle file does not exist yet, download the dataset first
    if not os.path.exists(pickle_filepath):
        utils.maybe_download_and_extract(data_dir, DATA_URL, is_zipfile=True)

read_dataset("\\")
```
Output like the following means the code works:
![](https://box.kancloud.cn/b26f81f914dcd7c97194259e21f962f4_1345x328.png)
Now extend read\_MITSceneParsingData.py with the code that builds the training and validation filename lists:
```
__author__ = 'tangzhenjie'
import os
from tensorflow.python.platform import gfile
from six.moves import cPickle as pickle
import glob
import random
import TensorflowUtils as utils

# Dataset download URL
DATA_URL = 'http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip'

"""
Read the training and validation filename lists from a .pickle file.
param:
    data_dir: directory where the files live
return:
    tuple of (training records, validation records)
"""
def read_dataset(data_dir):
    pickle_filename = "MITSceneParsing.pickle"
    pickle_filepath = os.path.join(data_dir, pickle_filename)
    # If the pickle file does not exist yet, download the dataset first
    if not os.path.exists(pickle_filepath):
        utils.maybe_download_and_extract(data_dir, DATA_URL, is_zipfile=True)
        # Once downloaded and extracted, build the training and validation filename lists
        SceneParsing_folder = os.path.splitext(DATA_URL.split("/")[-1])[0]
        result = create_image_lists(os.path.join(data_dir, SceneParsing_folder))
        print("Pickling ...")
        with open(pickle_filepath, 'wb') as f:
            pickle.dump(result, f, pickle.HIGHEST_PROTOCOL)
    else:
        print("Found pickle file!")

    with open(pickle_filepath, 'rb') as f:
        result = pickle.load(f)
        training_records = result['training']
        validation_records = result['validation']
        del result
    return training_records, validation_records

def create_image_lists(image_dir):
    if not gfile.Exists(image_dir):
        print("Image directory '" + image_dir + "' not found.")
        return None
    directories = ['training', 'validation']
    image_list = {}
    for directory in directories:
        file_list = []
        image_list[directory] = []
        file_glob = os.path.join(image_dir, "images", directory, '*.' + 'jpg')
        file_list.extend(glob.glob(file_glob))
        if not file_list:
            print('No files found')
        else:
            for f in file_list:
                filename = os.path.splitext(f.split("\\")[-1])[0]
                annotation_file = os.path.join(image_dir, "annotations", directory, filename + '.png')
                if os.path.exists(annotation_file):
                    record = {'image': f, 'annotation': annotation_file, 'filename': filename}
                    image_list[directory].append(record)
                else:
                    print("Annotation file not found for %s - Skipping" % filename)
        random.shuffle(image_list[directory])
        no_of_images = len(image_list[directory])
        print('No. of %s files: %d' % (directory, no_of_images))
    return image_list

# My downloaded and extracted files live in D:\dataSet\MIT
test, val = read_dataset("D:\dataSet\MIT")
```
Set a breakpoint and run. The results:
1. On the first run, check that the MITSceneParsing.pickle file is generated:
![](https://box.kancloud.cn/139d8f93cc98b04d145497b498d64ac7_791x276.png)
2. Check that the result is what you expect:
![](https://box.kancloud.cn/6df46078cabc94b273393d68142f6a9f_991x519.png)
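Concretely, each entry in the returned lists is a small dict; for illustration, one training record looks roughly like this (the exact paths are hypothetical):
```
{'image': 'D:\\dataSet\\MIT\\ADEChallengeData2016\\images\\training\\ADE_train_00000001.jpg',
 'annotation': 'D:\\dataSet\\MIT\\ADEChallengeData2016\\annotations\\training\\ADE_train_00000001.png',
 'filename': 'ADE_train_00000001'}
```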
Then delete the test statements:
```
# My downloaded and extracted files live in D:\dataSet\MIT
test, val = read_dataset("D:\dataSet\MIT")
end = 2  # dummy trailing statement (handy as a breakpoint target)
```
**We now have the training and validation filename lists.**
**Next we prepare the image data that will actually be fed into the network.**
Create a new file, BatchDatsetReader.py, with the following code:
```
"""
Code ideas from https://github.com/Newmu/dcgan and tensorflow mnist dataset reader
"""
import numpy as np
import scipy.misc as misc

# test code
import read_MITSceneParsingData as Reader
# test code

class BatchDatset:
    files = []            # image file paths
    images = []           # image data arrays
    annotations = []      # annotation (label) image data
    image_options = {}    # options for transforming the images
    batch_offset = 0      # offset where the next batch starts
    epochs_completed = 0  # number of completed epochs

    # Constructor
    def __init__(self, record_list, image_options={}):
        print("Initializing Batch Dataset Reader...")
        print(image_options)
        self.files = record_list
        self.image_options = image_options
        self._read_images()

    def _read_images(self):
        self._channels = True
        self.images = np.array([self._transform(filename['image']) for filename in self.files])
        self._channels = False
        # Add a channel dimension to each annotation: (H, W) -> (H, W, 1)
        self.annotations = np.array([np.expand_dims(self._transform(filename['annotation']), axis=2) for filename in self.files])
        print(self.images.shape)
        print(self.annotations.shape)

    def _transform(self, filename):
        # Read the image data into an ndarray
        image = misc.imread(filename)
        # Make sure the image has 3 channels (stack grayscale images into shape [H, W, 3])
        if self._channels and len(image.shape) < 3:
            image = np.stack([image] * 3, axis=-1)
        if self.image_options.get("resize", False) and self.image_options["resize"]:
            resize_size = int(self.image_options["resize_size"])
            resize_image = misc.imresize(image, [resize_size, resize_size], interp='nearest')
        else:
            resize_image = image
        return np.array(resize_image)

    # Return all images and annotations
    def get_records(self):
        return self.images, self.annotations

    # Reset the batch offset
    def reset_batch_offset(self, offset=0):
        self.batch_offset = offset

    # Return the next batch
    def next_batch(self, batch_size):
        # Start position
        start = self.batch_offset
        # Start of the next batch (and end of this one)
        self.batch_offset += batch_size
        # Check whether we ran past the end of the data
        if self.batch_offset > self.images.shape[0]:
            # Running past the end means an epoch has finished
            self.epochs_completed += 1
            print("****************** Epochs completed: " + str(self.epochs_completed) + "******************")
            # Prepare the next epoch: first shuffle the data
            perm = np.arange(self.images.shape[0])
            np.random.shuffle(perm)
            self.images = self.images[perm]
            self.annotations = self.annotations[perm]
            # then restart from the beginning
            start = 0
            self.batch_offset = batch_size
        # Produce the batch
        end = self.batch_offset
        return self.images[start:end], self.annotations[start:end]

    # Return a random batch
    def get_random_batch(self, batch_size):
        indexs = np.random.randint(0, self.images.shape[0], size=batch_size).tolist()
        return self.images[indexs], self.annotations[indexs]

# test code
record_lists = Reader.read_dataset("D:\dataSet\MIT")
BatchDatsetObject = BatchDatset(record_lists[0][0:1000], {})
BatchData = BatchDatsetObject.next_batch(10)
i = 0
# test code
```
The test output is below (since the dataset is large we test on a subset). Be aware that this way of reading data is poor: it loads everything into memory at once. Later we will switch to TensorFlow's own input pipeline to fix this. Remember to delete the test code:
![](https://box.kancloud.cn/071c243d3480d2868f9cd2541e5a3179_1841x911.png)
**Good. Data preparation is now complete.**
<h3 id="第二节">Part 2: Defining the Network</h3>
Here is a handy visualization tool that shows a network's structure clearly: [https://dgschwend.github.io/netscope/](https://dgschwend.github.io/netscope/)
Let's look at the concrete structure first:
1. Open [https://dgschwend.github.io/netscope/](https://dgschwend.github.io/netscope/) and click the button shown below.
2. ![](https://box.kancloud.cn/a3cb8be385ad43fba9d5d6e1a55722df_1069x361.png)
3. ![](https://box.kancloud.cn/b54315b5899b65898539881877c032ff_730x210.png)
4. Paste in the contents of this file: [https://github.com/tangzhenjie/FCN16S/blob/master/ppt/FCN16S.txt](https://github.com/tangzhenjie/FCN16S/blob/master/ppt/FCN16S.txt)

You will see the official FCN16S structure diagram; that is what we implement.
Now let's write the network. Back in FCN16S.py, we start with the imports and the parameters the network needs:
```
from __future__ import print_function
import tensorflow as tf
import numpy as np
import TensorflowUtils as utils
import read_MITSceneParsingData as scene_parsing
import datetime
import BatchDatsetReader as dataset
from six.moves import xrange  # python2/python3-compatible range

# Network parameters (each can be overridden as a command-line option)
FLAGS = tf.flags.FLAGS
# batch size
tf.flags.DEFINE_integer("batch_size", "2", "batch size for training")
# logs directory
tf.flags.DEFINE_string("logs_dir", "D:\pycharm_program\FCN16S\Logs\\", "path to logs directory")
# dataset directory
tf.flags.DEFINE_string("data_dir", "D:\pycharm_program\FCN16S\Data_zoo\MIT_SceneParsing\\", "path to the dataset")
# learning rate
tf.flags.DEFINE_float("learning_rate", "1e-4", "learning rate for Adam Optimizer")
# where the VGG16 .mat model is stored (we use VGG16 weights pre-trained in MATLAB)
tf.flags.DEFINE_string("model_dir", "D:\pycharm_program\FCN16S\Model_zoo\\", "Path to vgg model mat")
# debug mode (in debug mode extra information is saved)
tf.flags.DEFINE_bool("debug", "False", "Model Debug:True/ False")
# run mode (train / test / visualize)
tf.flags.DEFINE_string("mode", "train", "Mode: train/ test/ visualize")
# checkpoint directory
tf.flags.DEFINE_string("checkpoint_dir", "D:\pycharm_program\FCN16S\Checkpoint\\", "path to the checkpoint")
# directory where visualization results are saved
tf.flags.DEFINE_string("image_dir", "D:\pycharm_program\FCN16S\Image\\", "path to the output images")
# model URL
MODEL_URL = "http://www.vlfeat.org/matconvnet/models/beta16/imagenet-vgg-verydeep-16.mat"

# Constants used later in this file: images are resized to 224x224, and the MIT
# SceneParsing data has 150 classes plus background
IMAGE_SIZE = 224
NUM_OF_CLASSES = 151
MAX_ITERATION = int(1e5 + 1)  # number of training iterations
```
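Each `tf.flags.DEFINE_*` call above also registers a command-line option, so any of these defaults can be overridden at launch; a quick illustration (the flag values here are arbitrary examples):
```
# Run, for example, as:
#   python FCN16S.py --mode=visualize --batch_size=4
# Inside the script the parsed values are plain attributes on FLAGS:
print(FLAGS.batch_size)  # 4 when overridden as above, otherwise the default 2
print(FLAGS.mode)        # "visualize"
```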
The next step is to look at the structure of the downloaded, pre-trained VGG16 weights.
First we download the model, so add the following method to TensorflowUtils.py:
```
import scipy.io  # already imported at the top of TensorflowUtils.py

"""
Fetch the model data
:param dir_path: download location
       model_url: URL of the model
"""
def get_model_data(dir_path, model_url):
    maybe_download_and_extract(dir_path, model_url)
    # Make sure the file is actually there
    filename = model_url.split("/")[-1]
    file_path = os.path.join(dir_path, filename)
    if not os.path.exists(file_path):
        raise IOError("VGG16 model not found")
    data = scipy.io.loadmat(file_path)
    return data
```
Write the following test code in FCN16S.py:
```
# test code
model_data = utils.get_model_data("D:\pycharm_program\FCN16S\VGG16MODEL", MODEL_URL)
# test code
```
The first run looks like this:
![](https://box.kancloud.cn/5285fb086fd62a26efa3c2897415efc5_1087x378.png)
Now let's look at how the data inside the .mat file is laid out:
![](https://box.kancloud.cn/f3d348c49c5837fd59efb6f2ae79beee_974x370.png)
We only care about the information in `layers`, so let's probe what `layers` contains. Add the following test code to FCN16S.py:
> Reference: [https://zhuanlan.zhihu.com/p/40492866](https://zhuanlan.zhihu.com/p/40492866)
```
# test code
model_data = utils.get_model_data("D:\pycharm_program\FCN16S\VGG16MODEL", MODEL_URL)
layers = model_data["layers"]
vgg_layers = model_data["layers"][0]  # shape 1*37 (37 layers)
for element in xrange(0, 37):
    layer = vgg_layers[element]
    struct = layer[0][0]
    number = len(struct)
    if number == 5:
        # weights pad type name stride
        print(struct[3])
    if number == 2:
        # relu layer info
        print(struct[1])
    if number == 6:
        # pool layer info, or the last layer's info
        print(struct[0])
# test code
```
The output looks like this (too long for one screenshot; run it yourself to see everything):
![](https://box.kancloud.cn/8aa3211def28fcc161fdbb9347b158ac_765x318.png)
> Interpretation: it prints the name of every layer.

To build the network we only need the convolutional layers' weights, so all we have to know is how to extract W and b. Add the following test code to get them:
```
# Layer 0 is a convolution layer; we point straight at the locations of its w and b
layer0 = vgg_layers[0]
# w
w_shape = layer0[0][0][0][0][0].shape
# b
b_shape = layer0[0][0][0][0][1].shape
print(w_shape)
print(b_shape)
```
The output:
![](https://box.kancloud.cn/adc2e04a5dbf83598fa02c56692ebae5_421x146.png)
> Interpretation: as the network structure shows, the first convolution uses 3×3 kernels with 3 input channels and 64 output channels.

**We now know exactly what lives where inside the .mat file**, so we can start assembling the network. FCN16S leaves the convolutional layers of VGG16 unchanged, so we build those layers first.
Back to FCN16S.py. Before writing the network itself, we add a few helper functions to TensorflowUtils.py:
```
# Create a network variable initialized from pre-trained weight values
def get_variable(weights, name):
    # Constant initializer holding the given weights
    init = tf.constant_initializer(weights, dtype=tf.float32)
    # Create the variable
    var = tf.get_variable(name=name, initializer=init, shape=weights.shape)
    return var

# Create a variable of the given shape, filled from a truncated normal
# distribution with mean 0 and the given standard deviation (default 0.02)
def weight_variable(shape, stddev=0.02, name=None):
    initial = tf.truncated_normal(shape, stddev=stddev)
    if name is None:
        return tf.Variable(initial)
    else:
        return tf.get_variable(name, initializer=initial)

# Create a bias variable, initialized to zero
def bias_variable(shape, name=None):
    initial = tf.constant(0.0, shape=shape)
    if name is None:
        return tf.Variable(initial)
    else:
        return tf.get_variable(name, initializer=initial)

#################### operations #########################
# Convolution that keeps the spatial size unchanged (channels may change)
def conv2d_basic(x, W, bias):
    # stride 1 with SAME padding keeps input and output the same size
    conv = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="SAME")
    return tf.nn.bias_add(conv, bias)

# Convolution whose output is half the input size
def conv2d_strided(x, W, bias):
    conv = tf.nn.conv2d(x, W, strides=[1, 2, 2, 1], padding="SAME")
    return tf.nn.bias_add(conv, bias)

# Max pooling that halves the image size
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

# Average pooling that halves the image size
def avg_pool_2x2(x):
    return tf.nn.avg_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

###################### image processing #######################
def process_image(image, mean_pixel):
    return image - mean_pixel

def unprocess_image(image, mean_pixel):
    return image + mean_pixel

####################### padding ####################
# The official Caffe code pads the input by 100 pixels first
def pading(image, paddingdata):
    if len(image.shape) == 3:
        # tensor shape is [height, width, channels]
        target_height = image.shape[0] + paddingdata * 2
        target_width = image.shape[1] + paddingdata * 2
        return tf.image.pad_to_bounding_box(image, offset_height=paddingdata, offset_width=paddingdata, target_height=target_height, target_width=target_width)
    elif len(image.shape) == 4:
        # [batch, height, width, channels]
        target_height = image.shape[1] + paddingdata * 2
        target_width = image.shape[2] + paddingdata * 2
        return tf.image.pad_to_bounding_box(image, offset_height=paddingdata, offset_width=paddingdata, target_height=target_height, target_width=target_width)
    else:
        raise ValueError("image tensor shape error")

# Save an image
def save_image(image, save_dir, name, mean=None):
    """
    Save image by unprocessing if mean given else just save
    :param image:
    :param save_dir:
    :param name:
    :param mean:
    :return:
    """
    if mean:
        image = unprocess_image(image, mean)
    misc.imsave(os.path.join(save_dir, name + ".png"), image)

# Summary helpers needed by the FLAGS.debug branches used later in FCN16S.py
def add_activation_summary(var):
    if var is not None:
        tf.summary.histogram(var.op.name + "/activation", var)
        tf.summary.scalar(var.op.name + "/sparsity", tf.nn.zero_fraction(var))

def add_gradient_summary(grad, var):
    if grad is not None:
        tf.summary.histogram(var.op.name + "/gradient", grad)

def add_to_regularization_and_summary(var):
    if var is not None:
        tf.summary.histogram(var.op.name, var)
```
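As a quick sanity check on `pading`, here is a minimal standalone sketch (assuming TensorflowUtils.py is importable):
```
import tensorflow as tf
import TensorflowUtils as utils

# Padding 100 pixels on every side grows a 224x224 input to 424x424
img = tf.zeros([1, 224, 224, 3])
padded = utils.pading(img, 100)
print(padded.get_shape())  # (1, 424, 424, 3)
```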
**With these utility functions in place, we continue building the network.**
Add the following code to FCN16S.py to complete the vgg\_net function:
```
def vgg_net(weights, image):
    # Names of the VGG16 layers that FCN16S reuses, so we can rebuild the same network
    layers = (
        "conv1_1", "relu1_1", "conv1_2", "relu1_2", "pool1",
        "conv2_1", "relu2_1", "conv2_2", "relu2_2", "pool2",
        "conv3_1", "relu3_1", "conv3_2", "relu3_2", "conv3_3", "relu3_3", "pool3",
        "conv4_1", "relu4_1", "conv4_2", "relu4_2", "conv4_3", "relu4_3", "pool4",
        "conv5_1", "relu5_1", "conv5_2", "relu5_2", "conv5_3", "relu5_3", "pool5"
    )
    # All the shared layers' outputs, keyed by name
    net = {}
    # Current input
    current = image
    for i, name in enumerate(layers):
        # The first four characters of the layer name give its kind
        kind = name[:4]
        if kind == "conv":
            kernels = weights[i][0][0][0][0][0]
            bias = weights[i][0][0][0][0][1]
            # matconvnet: weights are [width, height, in_channels, out_channels]
            # tensorflow: weights are [height, width, in_channels, out_channels]
            # Create the variables
            kernels = utils.get_variable(np.transpose(kernels, (1, 0, 2, 3)), name=name + "_w")
            bias = utils.get_variable(bias.reshape(-1), name=name + "_b")
            current = utils.conv2d_basic(current, kernels, bias)
        elif kind == "relu":
            current = tf.nn.relu(current, name=name)
            if FLAGS.debug:
                utils.add_activation_summary(current)
        elif kind == "pool":
            current = utils.avg_pool_2x2(current)
        net[name] = current
    return net
```
We now have the first five stages of VGG16. Check that they are correct with the following test code:
```
####################### test code ################################
# Build the graph
model_data = utils.get_model_data("D:\pycharm_program\FCN16S\VGG16MODEL", MODEL_URL)
weights = model_data["layers"][0]
image = tf.placeholder(tf.float32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 3], name="input_image")
net = vgg_net(weights, image)
# Fetch the data
training_records, validation_records = scene_parsing.read_dataset("D:\dataSet\MIT")
datsetObject = dataset.BatchDatset(validation_records, {"resize": True, "resize_size": 224})
batchdataset = datsetObject.get_random_batch(2)
imagedata = batchdataset[0]
feed_dict = {image: imagedata}
# Run the graph
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(net["pool5"], feed_dict=feed_dict).shape)
########################## test code ###########################
```
Result:
![](https://box.kancloud.cn/37717f6b300c37b9679f1fe82a9289b5_1001x919.png)
> Interpretation: the convolutions keep the spatial size and every pool halves it, so a 224×224 input becomes 224 / 2⁵ = 7×7 after the 5 pools.

**We have now reproduced the part of FCN16S that matches VGG16. Next we build the rest of the FCN16S network.**
Add the following code to FCN16S.py:
```
"""
构建FCN16S
:param image 网络输入的图像 [batch, height, width, channels]
:return 输出与image大小相同的tensor
"""
def fcn16s_net(image, keep_prob):
# 首先我们padding图片
image = utils.pading(image, 100)
# 转换数据类型
# 首先我们获取相同部分构造的模型权重
model_data = utils.get_model_data(FLAGS.model_dir, MODEL_URL)
weights = model_data["layers"][0]
with tf.variable_scope("VGG16"):
vgg16net_dict = vgg_net(weights, image)
with tf.variable_scope("FCN16S"):
pool5 = vgg16net_dict["pool5"]
# 创建fc6层
w6 = utils.weight_variable([7, 7, 512, 4096], name="w6")
b6 = utils.bias_variable([4096], name="b6")
conv6 = tf.nn.conv2d(pool5, w6, [1, 1, 1, 1], padding="VALID")
conv_bias6 = tf.nn.bias_add(conv6, b6)
relu6 = tf.nn.relu(conv_bias6, name="relu6")
if FLAGS.debug:
utils.add_activation_summary(relu6)
relu_dropout6 = tf.nn.dropout(relu6, keep_prob=keep_prob)
# 创建fc7层
w7 = utils.weight_variable([1, 1, 4096, 4096], name="w7")
b7 = utils.bias_variable([4096], name="b7")
conv7 = utils.conv2d_basic(relu_dropout6, w7, b7)
relu7 = tf.nn.relu(conv7, name="relu7")
if FLAGS.debug:
utils.add_activation_summary(relu7)
conv_dropout7 = tf.nn.dropout(relu7, keep_prob=keep_prob)
# 定义score_fr层
w8 = utils.weight_variable([1, 1, 4096, NUM_OF_CLASSES], name="w8")
b8 = utils.bias_variable([NUM_OF_CLASSES], name="b8")
score_fr = utils.conv2d_basic(conv_dropout7, w8, b8)
# 定义upscore2层
```
We need a deconvolution (transposed convolution) layer, so first add the following helper to TensorflowUtils.py:
```
# Transposed convolution (deconvolution)
def conv2d_transpose_strided(x, w, b, output_shape=None, stride=2):
    if output_shape is None:
        # By default double the spatial size; the output channel count comes from the
        # kernel's third dimension (transposed-conv kernels are [height, width, out_channels, in_channels])
        tmp_shape = x.get_shape().as_list()
        tmp_shape[1] *= 2
        tmp_shape[2] *= 2
        x_shape = tf.shape(x)
        output_shape = tf.stack([x_shape[0], tmp_shape[1], tmp_shape[2], w.get_shape().as_list()[2]])
    conv = tf.nn.conv2d_transpose(x, w, output_shape, strides=[1, stride, stride, 1], padding="SAME")
    return tf.nn.bias_add(conv, b)
```
> A reference explaining TensorFlow's transposed convolution: [https://blog.csdn.net/mao\_xiao\_feng/article/details/71713358](https://blog.csdn.net/mao_xiao_feng/article/details/71713358)

Add the following test code to TensorflowUtils.py to test the operation:
```
########### test code ############
# Convolution
conv_image = tf.zeros([1, 12, 12, 3], dtype=tf.float32)
conv_kernel = tf.Variable(initial_value=tf.ones([2, 2, 3, 2], dtype=tf.float32))
out_image = tf.nn.conv2d(conv_image, conv_kernel, [1, 2, 2, 1], padding="SAME")
# Transposed convolution
transpose_kernel = tf.Variable(initial_value=tf.ones([2, 2, 3, 2], dtype=tf.float32))
transpose_b = tf.Variable(initial_value=tf.zeros([3], dtype=tf.float32))
image = conv2d_transpose_strided(out_image, transpose_kernel, transpose_b)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(image).shape)
########### test code ############
```
The expected result: ![](https://box.kancloud.cn/0ca8bed347b39be967c70906c66a9052_1423x822.png)
> A transposed convolution inverts the convolution's shape transformation (the kernel, stride, and padding passed in stay the same; the image and the bias change).

Delete the test code, then return to FCN16S.py and continue building the network:
```
        # upscore2 layer
        w9 = utils.weight_variable([4, 4, NUM_OF_CLASSES, NUM_OF_CLASSES], name="w9")
        b9 = utils.bias_variable([NUM_OF_CLASSES], name="b9")
        upscore2 = utils.conv2d_transpose_strided(score_fr, w9, b9)

        # score_pool4 layer
        pool4_shape = vgg16net_dict["pool4"].get_shape()
        w10 = utils.weight_variable([1, 1, pool4_shape[3].value, NUM_OF_CLASSES], name="w10")
        b10 = utils.bias_variable([NUM_OF_CLASSES], name="b10")
        score_pool4 = utils.conv2d_basic(vgg16net_dict["pool4"], w10, b10)

        # score_pool4c layer: crop score_pool4 to match upscore2
        upscore2_shape = upscore2.get_shape()
        upscore2_target_height = upscore2_shape[1].value
        upscore2_target_width = upscore2_shape[2].value
        score_pool4c = tf.image.crop_to_bounding_box(score_pool4, 5, 5, upscore2_target_height, upscore2_target_width)

        # fuse_pool4 layer
        fuse_pool4 = tf.add(upscore2, score_pool4c, name="fuse_pool4")

        # upscore16 layer
        fuse_pool4_shape = fuse_pool4.get_shape()
        w11 = utils.weight_variable([32, 32, NUM_OF_CLASSES, NUM_OF_CLASSES], name="w11")
        b11 = utils.bias_variable([NUM_OF_CLASSES], name="b11")
        output_shape = tf.stack([tf.shape(fuse_pool4)[0], fuse_pool4_shape[1].value * 16, fuse_pool4_shape[2].value * 16, NUM_OF_CLASSES])
        upscore16 = utils.conv2d_transpose_strided(fuse_pool4, w11, b11, output_shape=output_shape, stride=16)

        # score layer: crop back to the original image size
        image_shape = image.get_shape()
        score_target_height = image_shape[1].value - 200  # subtract 200 because the input was padded by 100 on each side
        score_target_width = image_shape[2].value - 200   # subtract 200 because the input was padded by 100 on each side
        score = tf.image.crop_to_bounding_box(upscore16, 27, 27, score_target_height, score_target_width)

        annotation_pred = tf.argmax(score, dimension=3, name="prediction")
        return tf.expand_dims(annotation_pred, dim=3), score
```
> Note: TensorFlow's transposed convolution differs from Caffe's, so the intermediate deconvolution output sizes here may differ from the original network. This should not affect the final performance, as we will see at the end; the shape trace below makes the actual sizes concrete.

With that, fcn16s\_net is complete. The network takes an image through convolution and pooling, then upsamples and crops to produce a score map the same size as the input.
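Before testing, it helps to trace the spatial sizes by hand for a 224×224 input (a sketch following the code above; pooling uses SAME padding, i.e. ceiling division):
```
# 224x224 input, padded by 100 on each side         -> 424x424
# convolutions keep the size; pool1..pool5 halve it:
#   424 -> 212 -> 106 -> 53 -> 27 -> 14             (pool4 = 27x27, pool5 = 14x14)
# fc6 is a 7x7 VALID convolution: 14 - 7 + 1 = 8    -> score_fr is 8x8
# upscore2 doubles it                                -> 16x16
# score_pool4 (1x1 conv on pool4) is 27x27, cropped at offset (5, 5) to 16x16
# upscore16 multiplies by 16: 16 * 16 = 256          -> 256x256
# the final crop at offset (27, 27) to 224x224 recovers the input size
```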
Let's test it. Add the following code to FCN16S.py:
```
####################### test code ################################
# Build the graph
image = tf.placeholder(tf.float32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 3], name="input_image")
predict, score = fcn16s_net(image, 0.5)
# Fetch the data
training_records, validation_records = scene_parsing.read_dataset("D:\dataSet\MIT")
datsetObject = dataset.BatchDatset(validation_records, {"resize": True, "resize_size": 224})
batchdataset = datsetObject.get_random_batch(2)
imagedata = batchdataset[0]
feed_dict = {image: imagedata}
# Run the graph
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(score, feed_dict=feed_dict).shape)
########################## test code ###########################
```
> Note: remember to change model\_dir to where you already downloaded the model, or you will download the (rather large) model data again.

The test result:
![](https://box.kancloud.cn/04511fb12c3de566aaa672ac6e4c8ef1_671x291.png)
**This completes the network-definition part.**
<h3 id="第三节">Part 3: Defining the Loss Function</h3>
In this part we implement the training side of the network, starting with the main function:
```
def main(argv=None):
    # ---- build the network ----
    # Define the network inputs
    keep_probability = tf.placeholder(tf.float32, name="keep_probability")
    image = tf.placeholder(tf.float32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 3], name="input_image")
    annotation = tf.placeholder(tf.int32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 1], name="annotation")
    pred_annotation, logits = fcn16s_net(image, keep_probability)
    # Save the input images, ground truth and predictions for inspection
    tf.summary.image("input_image", image, max_outputs=2)
    tf.summary.image("ground_truth", tf.cast(annotation, tf.uint8), max_outputs=2)
    tf.summary.image("pred_annotation", tf.cast(pred_annotation, tf.uint8), max_outputs=2)
    # Define the loss function
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=tf.squeeze(annotation, squeeze_dims=[3])), name="entropy")
    # Save the loss
    loss_summary = tf.summary.scalar("entropy", loss)
    # Collect the trainable variables
    trainable_var = tf.trainable_variables()
    # In debug mode, also summarize the variables
    if FLAGS.debug:
        for var in trainable_var:
            utils.add_to_regularization_and_summary(var)
```
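The loss above is a per-pixel softmax cross-entropy: the logits carry one score per class at every pixel, while the labels are plain integer class ids, which is why the annotation's channel dimension is squeezed away. A standalone shape sketch (sizes follow the constants defined earlier):
```
import tensorflow as tf

logits = tf.zeros([2, 224, 224, 151])             # one score per class per pixel
labels = tf.zeros([2, 224, 224], dtype=tf.int32)  # one class id per pixel
xent = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
print(xent.get_shape())  # (2, 224, 224): one loss per pixel; reduce_mean averages them
```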
<h3 id="第四节">Part 4: Optimization</h3>
With the loss defined, we use an optimizer to reduce it. Add the following function to FCN16S.py:
```
def train(loss_val, var_list):
    optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
    grads = optimizer.compute_gradients(loss_val, var_list=var_list)
    if FLAGS.debug:
        for grad, var in grads:
            utils.add_gradient_summary(grad, var)
    return optimizer.apply_gradients(grads)
```
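Splitting compute\_gradients and apply\_gradients like this lets us attach gradient summaries in debug mode; when no summaries are needed, it is equivalent to the usual one-liner (a sketch, not part of the tutorial code):
```
def train_simple(loss_val, var_list):
    # Same effect as train() above when FLAGS.debug is off
    return tf.train.AdamOptimizer(FLAGS.learning_rate).minimize(loss_val, var_list=var_list)
```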
With the optimizer in place, we continue building main:
> A reference for learning TensorBoard: [https://jhui.github.io/2017/03/12/TensorBoard-visualize-your-learning/](https://jhui.github.io/2017/03/12/TensorBoard-visualize-your-learning/)
```
    # In debug mode, also summarize the variables (already added above; repeated
    # here to mark where main continues)
    if FLAGS.debug:
        for var in trainable_var:
            utils.add_to_regularization_and_summary(var)
    train_op = train(loss, trainable_var)

    # One op that collects all summaries (to be written to file)
    print("Setting up summary op....")
    summary_op = tf.summary.merge_all()
    # ---- the graph is now complete ----

    # ---- set up the data ----
    print("Setting up image reader...")
    train_records, valid_records = scene_parsing.read_dataset(FLAGS.data_dir)
    # Print the record counts to check them
    print(len(train_records))
    print(len(valid_records))

    print("Setting up dataset reader...")
    image_options = {'resize': True, 'resize_size': IMAGE_SIZE}
    if FLAGS.mode == "train":
        train_dataset_reader = dataset.BatchDatset(train_records, image_options)
    validation_dataset_reader = dataset.BatchDatset(valid_records, image_options)
    # ---- data set up ----

    # ---- set up the session ----
    sess = tf.Session()
    print("Setting up Saver.....")
    saver = tf.train.Saver()

    # create two summary writers to show training loss and validation loss in the same graph
    # need to create two folders 'train' and 'validation' inside FLAGS.logs_dir
    train_writer = tf.summary.FileWriter(FLAGS.logs_dir + "/train", sess.graph)
    validation_writer = tf.summary.FileWriter(FLAGS.logs_dir + "validation")

    # Initialize the variables before training/validation
    sess.run(tf.global_variables_initializer())
    # Restore from a checkpoint if one exists
    ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
        print("Model restored .....")

    # Train or visualize
    if FLAGS.mode == "train":
        for itr in xrange(MAX_ITERATION):
            # Get the next batch
            train_images, train_annotation = train_dataset_reader.next_batch(FLAGS.batch_size)
            feed_dict = {image: train_images, annotation: train_annotation, keep_probability: 0.85}
            # Run one training step
            sess.run(train_op, feed_dict=feed_dict)
            # Periodically record information about the training process
            if itr % 10 == 0:
                train_loss, summary_str = sess.run([loss, loss_summary], feed_dict=feed_dict)
                print("Step: %d, Train_loss: %g" % (itr, train_loss))
                train_writer.add_summary(summary_str, itr)
                train_writer.flush()
            if itr % 500 == 0:
                valid_images, valid_annotations = validation_dataset_reader.next_batch(FLAGS.batch_size)
                valid_loss, summary_sva = sess.run([loss, loss_summary], feed_dict={image: valid_images, annotation: valid_annotations,
                                                                                    keep_probability: 1.0})
                print("%s------> Validation_loss: %g" % (datetime.datetime.now(), valid_loss))
                saver.save(sess, FLAGS.checkpoint_dir + "model.ckpt", itr)
                # add validation loss to TensorBoard
                validation_writer.add_summary(summary_sva, itr)
                validation_writer.flush()
    elif FLAGS.mode == "visualize":
        valid_images, valid_annotations = validation_dataset_reader.get_random_batch(FLAGS.batch_size)
        pred = sess.run(pred_annotation, feed_dict={image: valid_images, annotation: valid_annotations,
                                                    keep_probability: 1.0})
        valid_annotations = np.squeeze(valid_annotations, axis=3)
        pred = np.squeeze(pred, axis=3)
        # Save the results
        for itr in range(FLAGS.batch_size):
            utils.save_image(valid_images[itr].astype(np.uint8), FLAGS.image_dir, name="inp_" + str(5 + itr))
            utils.save_image(valid_annotations[itr].astype(np.uint8), FLAGS.image_dir, name="gt_" + str(5 + itr))
            utils.save_image(pred[itr].astype(np.uint8), FLAGS.image_dir, name="pred_" + str(5 + itr))
            print("Saved image: %d" % itr)
```
That completes main. Now we can run the network; add the entry point:
```
if __name__ == "__main__":
    tf.app.run()
```
Now for the moment of truth. Run FCN16S.py; the result looks like this:
![](https://box.kancloud.cn/87d7a09b0f2c4424033aa20c75d0bd3f_965x712.png)
> Note: we now have a complete FCN16S implementation. The code is very memory-hungry when it runs, because it reads the whole dataset into memory up front; later we will switch to TensorFlow's own input pipeline to fix this (a sketch follows below).
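For reference, here is a minimal sketch of what that streaming pipeline could look like with tf.data, consuming the record dicts produced by read\_MITSceneParsingData (illustrative only; the tutorial's code does not use this yet):
```
import tensorflow as tf

def _parse(image_path, annotation_path):
    # Decode and resize one image/annotation pair on the fly instead of preloading everything
    image = tf.image.resize_images(
        tf.image.decode_jpeg(tf.read_file(image_path), channels=3), [224, 224])
    annotation = tf.image.resize_images(
        tf.image.decode_png(tf.read_file(annotation_path), channels=1), [224, 224],
        method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    return image, annotation

# train_records comes from scene_parsing.read_dataset(FLAGS.data_dir)
image_paths = [r['image'] for r in train_records]
annotation_paths = [r['annotation'] for r in train_records]
ds = tf.data.Dataset.from_tensor_slices((image_paths, annotation_paths))
ds = ds.shuffle(1000).map(_parse).batch(2).repeat()
next_images, next_annotations = ds.make_one_shot_iterator().get_next()
```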
<h3 id="第五节">Part 5: Evaluating the Results</h3>
We add a node that computes the mean IoU to the graph and then test:
```
# Compute the mean IoU
re_shape = tf.stack([tf.shape(pred_annotation)[0], IMAGE_SIZE * IMAGE_SIZE, 1])
annotation_new = tf.reshape(annotation, re_shape)
pred_annotation_new = tf.reshape(pred_annotation, re_shape)
mean_iou, endarray = tf.metrics.mean_iou(annotation_new, pred_annotation_new, NUM_OF_CLASSES)
```
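As a reminder of what mean IoU measures, here is a tiny hand-worked example, independent of the graph:
```
import numpy as np

labels = np.array([0, 0, 1, 1])  # ground-truth class per pixel
preds  = np.array([0, 1, 1, 1])  # predicted class per pixel
# class 0: intersection = 1, union = 2 -> IoU = 0.5
# class 1: intersection = 2, union = 3 -> IoU = 2/3
# mean IoU over the two classes = (0.5 + 2/3) / 2 ≈ 0.583
```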
Then add the following to the training code (note the placement comments):
```
# Right after the global variable initialization, also initialize local variables
# (tf.metrics.mean_iou keeps its confusion matrix in local variables):
sess.run(tf.local_variables_initializer())

# Inside the training loop: evaluate the mean IoU on the current batch
m_iou, array_end = sess.run([mean_iou, endarray], feed_dict={image: train_images, annotation: train_annotation, keep_probability: 1.0})
print(m_iou)
print(array_end)
```
The results are not good yet. In the next section we will change the data-input method and tune the network until it matches the paper's results.
Finally, here is the code implemented so far: [https://github.com/tangzhenjie/FCN16S](https://github.com/tangzhenjie/FCN16S)