Fine tuning models

For classification problems using pretrained models like ResNet, I read that some people only fine tune the classification layers for their specific task. My question is: is there any harm in training the entire pretrained model on your dataset instead? Could I train the last layers first and then train the whole model?

Hi @tdougherty84:

Generally people do not fine tune the last layer (i.e. the one that gives the output, which is usually a dense layer). What I usually do is apply hyperparameter tuning only to the middle/input layers. What you can do is unfreeze some of the ResNet layers to make them trainable with the following code.

from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras import models, regularizers
from tensorflow.keras import layers

img_size = 150

# Pretrained ResNet50 convolutional base (without its ImageNet classifier head)
conv_base = ResNet50(weights='imagenet',
                     include_top=False,
                     input_shape=(img_size, img_size, 3))

# New classification head on top of the pretrained base
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(layers.Dropout(0.35))
model.add(layers.Dense(10, activation='softmax'))  # last dense layer is without hyperparameter tuning

# Add this code: unfreeze every layer from 'conv4_block1_1_conv' onwards
# and keep everything before it frozen
conv_base.trainable = True
set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'conv4_block1_1_conv':  # change this layer name to unfreeze a different number of layers
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False
model.summary()
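To actually train this you still need to compile and fit the model. Here is a minimal sketch of how that could look, assuming your images live in hypothetical train/ and val/ directories (the paths, batch size, epoch count and learning rate are just placeholders); preprocess_input is used so the images get the same preprocessing ResNet50 was trained with, and the learning rate is kept low since pretrained layers are being fine tuned.

from tensorflow.keras import optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Low learning rate so the unfrozen pretrained weights are only nudged slightly
model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# preprocess_input applies the same preprocessing ResNet50 expects
train_gen = ImageDataGenerator(preprocessing_function=preprocess_input)
val_gen = ImageDataGenerator(preprocessing_function=preprocess_input)

train_data = train_gen.flow_from_directory('train/',  # placeholder path
                                           target_size=(img_size, img_size),
                                           batch_size=32,
                                           class_mode='categorical')
val_data = val_gen.flow_from_directory('val/',  # placeholder path
                                       target_size=(img_size, img_size),
                                       batch_size=32,
                                       class_mode='categorical')

history = model.fit(train_data, validation_data=val_data, epochs=10)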

You can also experiment with other hyperparameters to improve your metric (which in my case was accuracy): the image input size img_size, the type of regularizer, the optimizer and learning rate used when you compile and fit the model (Adam, SGD, RMSprop, Adagrad), or even the pretrained architecture itself (VGG16, InceptionV3, etc.). Here is an article which you may find helpful.
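For example, swapping the pretrained backbone is just a matter of replacing conv_base; this is only a sketch, the custom head stays the same but the layer names used in the unfreezing loop would need to change to match the new architecture.

from tensorflow.keras.applications import VGG16

# Same idea with a different pretrained backbone; the head and unfreezing loop
# above work the same way, only the layer names differ (e.g. 'block5_conv1' in VGG16)
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(img_size, img_size, 3))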

Training the entire pretrained model defeats the whole purpose of transfer learning, which is to make use of the pretrained feature extraction to help improve your model. Unfreezing all layers (making the entire pretrained model's parameters trainable) means every node in the pretrained architecture gets retrained, since all the weights will be updated by backpropagation; in a sense you are just using the pretrained architecture as extra layers in your own model instead of leveraging the parameters already trained for you. Thus, I suggest unfreezing only a few layers as mentioned above.
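To the second part of your question (train the last layers first, then more of the model): a common pattern is to do it in two stages, first freezing the entire base and training only the new head, then unfreezing a few top blocks as in the snippet above and continuing with a lower learning rate. A rough sketch, reusing the generators from the earlier snippet (epoch counts and learning rates are placeholders):

# Stage 1: pure feature extraction; the base is completely frozen, only the new head trains
conv_base.trainable = False
model.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_data, validation_data=val_data, epochs=5)

# Stage 2: unfreeze the top blocks (run the unfreezing loop from the earlier snippet here),
# then recompile so the new trainable flags take effect and continue with a lower learning rate
model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_data, validation_data=val_data, epochs=5)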

I have done an end-to-end image classification deep learning project and here is my GitHub repo if you want to check it out.

Hope this helps!


https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html


Read the blog and it was very useful, thanks!