This is a walkthrough on how to use the ultimate_image_classification repo from start to finish. The repo is a general image classification module built with TensorFlow 2 and Keras, and it ships with multiple pre-trained models you can use on any data.
Problem
The problem we will tackle is scene recognition on a small dataset (around 450 images) with three classes: pool, operating room, and gym.
Walkthrough
- Download or clone the repo
- Install the requirements using pip in your Python 3.6+ environment. (Note: you can switch to the CPU version of TensorFlow if you don’t have a GPU.)
- Download the dataset from here
- Extract the images into the ‘data’ folder in the repo so that you end up with ‘data/images’
- To use the repo we need a csv that describes the data in this format:
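As a rough sketch of such a csv (the exact column layout is an assumption; check the repo’s examples for the real format), each row pairs an image path with its label:

```
image_path,label
images/pool/img_001.jpg,pool
images/gym/img_002.jpg,gym
images/operating_room/img_003.jpg,operating room
```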
Since the data is already organized with one folder per class (each containing that class’s images), we run make_csv_from_folders.py like this:
The script takes two parameters, the path to the folder, and the write path for the csv.
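Conceptually, the script does something like the following (a simplified sketch in plain Python, not the repo’s exact code; the two arguments mirror the parameters described above):

```python
import csv
import os

def make_csv_from_folders(images_root, csv_path):
    """Walk a directory that has one sub-folder per class and write
    an (image path, label) row for every image found."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for label in sorted(os.listdir(images_root)):
            class_dir = os.path.join(images_root, label)
            if not os.path.isdir(class_dir):
                continue  # skip stray files next to the class folders
            for name in sorted(os.listdir(class_dir)):
                writer.writerow([os.path.join(class_dir, name), label])
```

With this dataset’s layout, the call would then look something like `python make_csv_from_folders.py data/images data/all_data.csv` (the exact argument style is an assumption; check the script’s help).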
If it ran successfully, you will have a csv called ‘all_data’ inside the ‘data’ folder.
- Now we have a csv to describe the data, but we need to split it into train and test sets. We will do that by running split_train_test.py
The script takes three parameters: the path to the csv, the test-set split fraction in [0, 1] (the same fraction is automatically applied to every class), and an option to shuffle the data. The script will generate training_set.csv and testing_set.csv next to all_data.csv.
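Taking the same test fraction from each class is a stratified split; it can be sketched in plain Python like this (simplified, illustrative code, not the repo’s implementation):

```python
import random
from collections import defaultdict

def split_train_test(rows, test_fraction, shuffle=True, seed=0):
    """Split (path, label) rows so that every class contributes the
    same fraction of its images to the test set."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[1]].append(row)  # group rows by label
    train, test = [], []
    rng = random.Random(seed)
    for label, items in by_class.items():
        if shuffle:
            rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```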
- Now everything is ready to actually start training. We will open ‘configs.py’ in which we can control all aspects of our training:
This is the most important file in the repo and contains all the options you can control. Most importantly, we need to point it at the training and testing csvs, as seen in this part:
You can also tweak other parameters, such as the base model, the batch size, or the learning rate. If you run into memory problems, try reducing the batch size or choosing a smaller model.
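Putting the above together, the relevant part of configs.py will look roughly like this (every field name below is an assumption based on the description; check the actual file for the real names):

```python
# Illustrative sketch of configs.py -- the field names are assumptions,
# not copied from the repo.
training_csv_path = "data/training_set.csv"
testing_csv_path = "data/testing_set.csv"

base_model = "resnet50"          # which pre-trained backbone to use
batch_size = 16                  # reduce this if you run out of GPU memory
learning_rate = 1e-4
save_model_path = "saved_model"  # where checkpoints and logs are written
```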
- Now we can start the training:
The training will automatically save the best checkpoint to the folder specified in configs. In our case:
So after training for some time, we will find a model in the ‘saved_model’ folder. You will also find the tensorboard logs, the configs you used, and a training log csv that keeps track of loss and accuracy over the training.
- Now that we have a model we can actually test it to get the full metrics and evaluation. First we need to specify a load model path in configs:
Note that if we run train.py again now, it will continue training from this checkpoint. To evaluate instead, we run the test script:
The test script will provide you with many metrics to judge the system like:
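The exact report isn’t reproduced here, but the usual classification metrics, per-class precision and recall among them, can be computed from the predictions like this (an illustrative sketch, not the repo’s code):

```python
def per_class_metrics(y_true, y_pred):
    """Precision and recall for each class, from two parallel lists of
    true and predicted labels."""
    metrics = {}
    for cls in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == cls)
        pred_pos = sum(1 for p in y_pred if p == cls)    # predicted as cls
        actual_pos = sum(1 for t in y_true if t == cls)  # truly cls
        metrics[cls] = {
            "precision": tp / pred_pos if pred_pos else 0.0,
            "recall": tp / actual_pos if actual_pos else 0.0,
        }
    return metrics
```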
- Now we can also draw our activation maps by using Grad-Cam. We do that by running draw_activations script:
The script loads the model specified in ‘load_model_path’ in configs and saves the results next to the model, in ‘save_model_path’ in configs, inside a folder called cam_output. The results will be similar to these:
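For reference, the Grad-CAM computation itself can be sketched in Keras along these lines (a generic sketch of the standard Grad-CAM recipe, not the repo’s exact code; the conv layer name must match your chosen base model):

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index):
    """Weight the last conv layer's feature maps by the gradient of the
    target class score, average them, and apply ReLU (standard Grad-CAM)."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis])  # add batch dim
        score = preds[0, class_index]
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))  # average over spatial dims
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)
    cam = tf.nn.relu(cam)[0]
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # heatmap in [0, 1]
```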
- Although there is a full tensorboard support during training, you might want to draw a simple figure of the training and test accuracies through the epochs. To do that you can run plot_training_log script:
It takes the path to the csv, if not specified it will look for it inside ‘save_model_path’ in configs. The results will be like:
- After settling on a model, we can compress it to a smaller size, since the normal save also stores information related to training (such as the optimizer state). We compress the model by calling the compress_weights script:
If a model path is not provided, it will automatically use the path specified in ‘load_model_path’ in configs. The script then outputs two files, ‘best_model_compressed.h5’ and ‘best_model_compressed.json’, in the same directory as the model. This compressed version can be loaded using the ‘custom_load_model’ method in utils.py:
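A usage sketch (the argument order and exact signature of custom_load_model are assumptions; check utils.py for the real one):

```python
from utils import custom_load_model

# Paths are illustrative -- point these at the files written by compress_weights.
model = custom_load_model("saved_model/best_model_compressed.h5",
                          "saved_model/best_model_compressed.json")
# predictions = model.predict(images)  # the model is ready for inference
```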
The compressed model will be faster to load in inference mode.
Multi-label classification
The repo also supports multi-label classification, though there are some differences:
- The data csv should have the labels separated by ‘$’ if an image has more than one class associated with it. For example: class1$class2. In this case you should provide the csvs for the training and test sets yourself.
- In multi-label classification mode, the loss function is automatically switched to binary cross-entropy instead of categorical cross-entropy, and the final activation is set to sigmoid instead of softmax.
- The metrics for measuring performance are different in multi-label classification: we mainly use AUC-ROC to determine how well the model is performing and save the best model accordingly. So in ‘save_model_path’ you will find extra files, such as ‘.training_stats.json’ and ‘best_auroc.log’, that describe the training so far, and running the test script will produce a different output. There is also an exact-match accuracy; the repo automatically searches for the best threshold within the range specified in the ‘multilabel_threshold_range’ field in configs and outputs it.
- When running the draw_activations script in multi-label mode you will only get activations for the highest-scoring predicted class, so you may want to tweak the strategy for choosing which classes to draw activations for (the top k, for instance, or all classes above a threshold).
- You can run the same example as a multi-label classification problem by setting this flag in configs:
When running the test script you will get results similar to these:
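Two of the multi-label pieces described above, the ‘$’-separated labels and the exact-match threshold search, can be sketched in plain Python (simplified, illustrative code; the repo’s implementation will differ):

```python
def parse_labels(label_field, classes):
    """Turn a '$'-separated label string into a multi-hot vector."""
    present = set(label_field.split("$"))
    return [1 if c in present else 0 for c in classes]

def exact_match_accuracy(y_true, y_prob, threshold):
    """Fraction of samples whose entire predicted label set is correct."""
    hits = 0
    for truth, probs in zip(y_true, y_prob):
        pred = [1 if p >= threshold else 0 for p in probs]
        hits += pred == truth
    return hits / len(y_true)

def best_threshold(y_true, y_prob, thresholds):
    """Pick the threshold that maximizes exact-match accuracy, mirroring
    the search over 'multilabel_threshold_range'."""
    return max(thresholds,
               key=lambda t: exact_match_accuracy(y_true, y_prob, t))
```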
Data augmentations
By default we use the following augmentations:
- Rotations
- Scaling
- Flipping
- Color augmentations
- Contrast augmentations
- Shear
- Translation
If you want to add or remove augmentations, you can tweak ‘augmenter.py’. We use the imgaug library for the augmentations, so you can visit this page for more options.
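As a sketch of how that list of augmentations maps onto imgaug (the ranges below are made up for illustration; the actual pipeline in augmenter.py may differ):

```python
import imgaug.augmenters as iaa

# One augmenter per item in the list above; ranges are illustrative only.
augmenter = iaa.Sequential([
    iaa.Affine(rotate=(-20, 20),                       # rotations
               scale=(0.8, 1.2),                       # scaling
               shear=(-8, 8),                          # shear
               translate_percent={"x": (-0.1, 0.1),
                                  "y": (-0.1, 0.1)}),  # translation
    iaa.Fliplr(0.5),                                   # flipping
    iaa.AddToHueAndSaturation((-20, 20)),              # color augmentations
    iaa.LinearContrast((0.75, 1.25)),                  # contrast augmentations
], random_order=True)
```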
Thank you
If you have any questions, please leave a comment here or open an issue on the repo.