In part_6 we added DropBlock support to our model. In this tutorial we will build a simple ResNet on top of it and see the results.
ResNets
There are tons of sources that explain the concept of ResNets and skip connections very well, but very briefly: when we build a deeper neural net we expect accuracy to keep increasing, yet in practice it often gets worse instead. The reason is that as you go deeper, problems like vanishing gradients really start to affect your model’s performance. To solve this, skip connections were added to the network to make it easy to ignore layers that would hurt performance. Adding more layers will then at least not hurt performance, and in the better case improve it. The following is an image from Andrew Ng’s lecture on ResNets:
So all we need to do is add these skip connections by summing the activations of layer L with the logits (pre-activation outputs) of layer L+n (where n is the skip length), and feed this sum to the activation function to get the activations of layer L+n.
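Written out, using the notation from that lecture (g is the activation function, a[L] the activations of layer L, and z[L+n] the logits of layer L+n):

a[L+n] = g(z[L+n] + a[L])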
Implementation
First let’s see the parameters of our model:
batch_size = 250 # the number of training samples in a single iteration
test_batch_size = 250 # used to calculate test predictions over many iterations to avoid memory issues
drop_block_size = 5 # dropblock size
patch_size_1 = 3 # convolution filter size 1
patch_size_2 = 3 # convolution filter size 2
patch_size_3 = 3 # convolution filter size 3
patch_size_4 = 3 # convolution filter size 4
depth1 = 128 # number of filters in conv block
depth2 = 256 # number of filters in conv block
depth3 = 512 # number of filters in conv block
depth4 = 1024 # number of filters in conv block
num_hidden1 = 4096 # the size of the hidden neurons in fully connected layer
num_hidden2 = 4096 # the size of the hidden neurons in fully connected layer
num_hidden3 = 4096 # the size of the hidden neurons in fully connected layer
Then we update our convolution block function to use two convolution operations instead of one:
# method that runs one convolution block with batch normalization
def run_conv_block(x, layer_name, filter_size, input_depth, output_depth):
    with tf.variable_scope(layer_name):
        conv = run_batch_norm(x)
        conv = tf.nn.relu(conv)
        conv = tf.nn.conv2d(conv,
                            get_conv_weight("weights1", [filter_size, filter_size, input_depth, output_depth]),
                            [1, 1, 1, 1], padding='SAME')
        conv = run_batch_norm(conv)
        conv = tf.nn.relu(conv)
        conv = tf.nn.conv2d(conv,
                            get_conv_weight("weights2", [filter_size, filter_size, output_depth, output_depth]),
                            [1, 1, 1, 1], padding='SAME')
    return conv
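The get_conv_weight and run_batch_norm helpers come from the earlier parts of the series. Their exact implementations live there, so the following is only a rough sketch of what they do, not the code used in this project:

# rough sketch only -- the real helpers are defined in the earlier parts of this series
def get_conv_weight(name, shape):
    # create (or reuse) a convolution weight variable inside the current variable scope
    return tf.get_variable(name, shape,
                           initializer=tf.contrib.layers.variance_scaling_initializer())

def run_batch_norm(x):
    # batch normalization that switches between training and inference mode
    return tf.layers.batch_normalization(x, training=is_training_ph)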
Then we will add another block called a residual block:
# method that runs one residual block with dropblock and max pool
def run_residual_block(x, layer_name, patch_size, input_depth, output_depth):
    with tf.variable_scope(layer_name):
        hidden1 = run_conv_block(x, "conv_block_1", patch_size, input_depth, output_depth)
        hidden1 = tf.nn.relu(hidden1)
        hidden2 = run_conv_block(hidden1, "conv_block_2", patch_size, output_depth, output_depth)
        hidden = tf.nn.relu(hidden1 + hidden2)
        drop_block = DropBlock(keep_prob=conv_keep_prob, block_size=drop_block_size)
        hidden = drop_block(hidden, training=is_training_ph)
        hidden = tf.nn.max_pool(value=hidden, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
    return hidden
In the above function, all we do is surround the convolution blocks with a new variable scope, run two convolution blocks, and sum their outputs like we described above. Then we run DropBlock and max pooling.
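To get a feel for the shapes, here is what the first block would do to a 32x32 RGB input (the input size here is just an assumption for illustration, not necessarily the dataset used in this series):

# illustration only -- the 32x32 RGB input shape is assumed for this example
x = tf.placeholder(tf.float32, shape=[batch_size, 32, 32, 3])
out = run_residual_block(x, "residual_block_1", patch_size_1, 3, depth1)
# the SAME-padded, stride-1 convolutions keep the 32x32 spatial size,
# and the stride-2 max pool halves it, so out has shape (250, 16, 16, 128)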
Now let’s have a look at the model function:
# Model.
def model(data):
    hidden = normalize_inputs(data)
    hidden = run_residual_block(hidden, "residual_block_1", patch_size_1, num_channels, depth1)
    hidden = run_residual_block(hidden, "residual_block_2", patch_size_2, depth1, depth2)
    hidden = run_residual_block(hidden, "residual_block_3", patch_size_3, depth2, depth3)
    hidden = run_residual_block(hidden, "residual_block_4", patch_size_4, depth3, depth4)
    # flatten
    hidden = tf.contrib.layers.flatten(hidden)
    # fully connected layers
    hidden = run_hidden_layer(hidden, "fully_connected_1", hidden.shape[1], num_hidden1, fully_connected_keep_prob,
                              use_activation=True)
    hidden = run_hidden_layer(hidden, "fully_connected_2", num_hidden2, num_hidden3, fully_connected_keep_prob,
                              use_activation=True)
    hidden = run_hidden_layer(hidden, "fully_connected_3", num_hidden3, images_labels, 1, use_activation=False)
    return hidden
We will be using 4 residual blocks followed by our three hidden layers.
Results
Training set accuracy:
Test set accuracy:
Final training set accuracy: 99.5%
Final training set loss: 0.0300
Final test set accuracy: 70.7%
Final test set loss: 1.2815
So we increased our accuracy while using fewer training epochs.
If you want to check the full state of the project so far, click here to go to the repository.
See you in part 8.