Training with NVIDIA Clara Train

In this section, you will fine-tune a pretrained model with your own data. The resulting model can then be used for inference. Disclaimer: this model is not for clinical use and is not fit for clinical decision making.

To set up the Clara Train SDK, refer to: https://blog.uplandr.com/2021/08/29/Setup-NVIDIA-Clara-Train-Medical-Image.html

I am following this guide: https://medium.com/@integratorbrad/how-i-built-a-space-to-train-and-infer-on-medical-imaging-ai-models-part-9-26bbaae9ca2f

And a few other sources.

Prepare the training data

For this exercise we will be using training images from https://github.com/ieee8023/covid-chestxray-dataset

Download the repository to the ~/Downloads folder

Unzip the file in the experiments folder

The contents of the folder

Install a few Python libraries for cleaning up the input images:

Create a folder for storing training images

I installed the Sublime Text 3 editor for writing the source code.

https://linuxize.com/post/how-to-install-sublime-text-3-on-ubuntu-18-04/

Start Sublime Text

Create a convert.py Python file for processing the images
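The original convert.py is not reproduced here. As a minimal sketch of what such a script might look like, assume it converts every image in a source folder to PNG using Pillow, which would match the later step of renaming .jpg/.jpeg references to .png. The function names, the folder arguments, and the RGB conversion are all assumptions rather than the author's code.

```python
from pathlib import Path

# Extensions treated as convertible inputs (an assumption for this sketch).
SOURCE_EXTS = {".jpg", ".jpeg", ".png"}


def target_name(src: Path) -> str:
    """Return the output file name: same stem, .png extension."""
    return src.with_suffix(".png").name


def convert_folder(src_dir: str, dst_dir: str) -> int:
    """Convert every supported image in src_dir to a PNG in dst_dir.

    Returns the number of files written.
    """
    # Pillow is imported lazily so the helper above works without it.
    from PIL import Image

    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(Path(src_dir).iterdir()):
        if src.suffix.lower() not in SOURCE_EXTS:
            continue  # skip metadata files, archives, and so on
        with Image.open(src) as img:
            # RGB is an assumption; PNG cannot store some JPEG modes (e.g. CMYK).
            img.convert("RGB").save(out / target_name(src), format="PNG")
        count += 1
    return count
```

Calling convert_folder with the dataset's image folder and the new training-images folder would write the PNG files; both paths are placeholders to adapt to your layout.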

Run the script to process the files

Next, create a JSON file that pairs each image with its finding.

Take the metadata.csv file and clean it up to create a datalist.json file.

In this process, I removed some training data that was biased or that had unique findings with very few data points.

I used Excel to open the CSV and sort, filter, and remove rows as required.
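The same cleanup can be scripted instead of done by hand in a spreadsheet. The sketch below assumes metadata.csv has `filename` and `finding` columns and that each datalist entry holds an `image` name and a one-hot `label`, split into `training` and `validation` lists. The threshold, the split rule, and the sample rows are illustrative only; any extra header fields the MMAR expects should be copied from the datalist that ships with it.

```python
import csv
import io
import json
from collections import Counter


def build_datalist(csv_text, min_count=2, val_every=5):
    """Build (classes, datalist) from CSV text with filename/finding columns."""
    rows = [r for r in csv.DictReader(io.StringIO(csv_text))
            if r.get("filename") and r.get("finding")]

    # Drop findings with too few examples (min_count is a hypothetical cutoff).
    counts = Counter(r["finding"] for r in rows)
    classes = sorted(f for f, n in counts.items() if n >= min_count)
    index = {name: i for i, name in enumerate(classes)}

    datalist = {"training": [], "validation": []}
    kept = [r for r in rows if r["finding"] in index]
    for i, row in enumerate(kept):
        label = [0] * len(classes)
        label[index[row["finding"]]] = 1  # one-hot encode the finding
        entry = {"image": row["filename"], "label": label}
        # Every val_every-th kept row goes to validation (illustrative split).
        split = "validation" if i % val_every == val_every - 1 else "training"
        datalist[split].append(entry)
    return classes, datalist


# Made-up sample rows, only to show the shape of the output.
sample = """filename,finding
a.jpg,COVID-19
b.jpg,COVID-19
c.jpeg,Pneumonia
d.jpg,Pneumonia
e.jpg,Rare
"""
classes, datalist = build_datalist(sample, min_count=2, val_every=4)
print(classes)
print(json.dumps(datalist, indent=2))
```

The order of `classes` determines which position in each label vector corresponds to which finding, so it is worth saving alongside the datalist.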

Use https://jsonlint.com/ to check the JSON for correctness.
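If you prefer not to paste the file into a website, the same check can be done locally with Python's standard json module. This is a small sketch; the helper name check_json is made up for illustration.

```python
import json


def check_json(text):
    """Return (True, None) if text parses as JSON, else (False, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno point at where the parser gave up.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None


print(check_json('{"training": []}'))
print(check_json('{"training": [}'))
```

To check the real file, read it first, for example with `open("datalist.json").read()`, and pass the text to check_json.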

Replace all references to .jpeg and .jpg with .png in the datalist.json file.
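An editor's find-and-replace works for this, but a regular expression avoids touching anything that is not a file extension. The sketch below only rewrites .jpg/.jpeg when it sits directly before a closing double quote, which assumes file names appear as quoted JSON strings.

```python
import re

# Match .jpg or .jpeg (any case) only when followed by a closing quote.
EXT_PATTERN = re.compile(r'\.jpe?g(?=")', re.IGNORECASE)


def to_png_refs(text):
    """Rewrite quoted .jpg/.jpeg file references in JSON text to .png."""
    return EXT_PATTERN.sub(".png", text)


print(to_png_refs('{"image": "a.jpg"}, {"image": "b.JPEG"}'))
```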

Take one or two of the images out of the JSON so you can run inference on them later.
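Holding images out can be done by hand, or with a few lines of Python. This sketch picks the held-out entries at random with a fixed seed so the choice is repeatable; the function name, the seed, and the sample entries are illustrative.

```python
import random


def hold_out(entries, n=2, seed=0):
    """Split entries into (kept, held); held has n randomly chosen items."""
    rng = random.Random(seed)  # fixed seed keeps the choice reproducible
    picked = set(rng.sample(range(len(entries)), n))
    held = [e for i, e in enumerate(entries) if i in picked]
    kept = [e for i, e in enumerate(entries) if i not in picked]
    return kept, held


# Made-up entries, only to demonstrate the split.
entries = [{"image": f"img{i}.png", "label": [1, 0]} for i in range(5)]
kept, held = hold_out(entries, n=2)
print("train on:", [e["image"] for e in kept])
print("infer on:", [e["image"] for e in held])
```

Remember to keep the held-out image files somewhere accessible, since only their datalist entries are removed.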

Training the model

Training needs GPU resources, so stopping Clara Deploy first will free them up.

Get inside the Clara Train container. *Correction*: use clara-train-sdk:v3.0 in place of v4. The model I am going to download works with v3 (the TensorFlow version of Clara). From v4 onward, Clara moved to PyTorch.

Download the chest X-ray model (MMAR)

As you can see, the /clara/experiments folder is mapped inside the Clara Train container

Let's clone the MMAR to the host

Some cleanup

Run the following in a separate terminal to grant permissions. Close the terminal when done.

Change parts of the training script train_finetune.sh

Change three settings: 1) the JSON source, 2) the epoch count (to 500), and 3) the learning rate (to 0.00002)

Change training configuration

There are six things we are going to change:

After:

1) Changed epochs to 500.
2) Changed the learning rate to 2e-5.
3) Updated the "subtrahend" and "divisor" parameters of the CenterData transform, in both the "train" and "validate" sections.
4) Changed the image_pipeline name to ClassificationKerasImagePipeline.
5) Added "sampling": "automatic" in training (not in validation).
6) Aligned computeAUC with the 6 categories.

Make changes #3 and #6 in the validation configuration file as well.
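The configuration edits can also be applied with a short script rather than by hand. This is a sketch only: the dictionary below is a simplified, hypothetical stand-in for the training configuration (the real file has many more fields and may nest them differently), the subtrahend and divisor values are placeholders, and the computeAUC change is left out because its layout depends on the real file.

```python
import copy
import json

# Simplified, hypothetical stand-in for the training configuration.
base_config = {
    "epochs": 40,
    "learning_rate": 0.0002,
    "train": {
        "image_pipeline": {"name": "OriginalPipelineName", "args": {}},
        "pre_transforms": [
            {"name": "CenterData", "args": {"subtrahend": [0.0], "divisor": [1.0]}}
        ],
    },
    "validate": {
        "pre_transforms": [
            {"name": "CenterData", "args": {"subtrahend": [0.0], "divisor": [1.0]}}
        ],
    },
}


def apply_finetune_changes(config, subtrahend, divisor):
    """Return a copy of config with changes 1 through 5 applied."""
    cfg = copy.deepcopy(config)  # leave the input untouched
    cfg["epochs"] = 500
    cfg["learning_rate"] = 2e-5
    # CenterData is updated in both the train and validate sections.
    for section in ("train", "validate"):
        for transform in cfg[section]["pre_transforms"]:
            if transform["name"] == "CenterData":
                transform["args"]["subtrahend"] = list(subtrahend)
                transform["args"]["divisor"] = list(divisor)
    # Pipeline name and sampling apply to training only.
    pipeline = cfg["train"]["image_pipeline"]
    pipeline["name"] = "ClassificationKerasImagePipeline"
    pipeline["args"]["sampling"] = "automatic"
    return cfg


# Placeholder normalization values; use the ones that suit your images.
new_config = apply_finetune_changes(base_config, subtrahend=[127.5], divisor=[127.5])
print(json.dumps(new_config, indent=2))
```

Working on a copy makes it easy to diff the result against the original before overwriting anything.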

Make the following changes to the environment configuration: 1) data_root and 2) dataset_json
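As a rough illustration of those two settings, the sketch below points both at the training-images folder used in this walkthrough. The upper-case key names and the starting values are assumptions; match whatever spelling and extra keys your environment file already contains.

```python
import json
import posixpath


def update_environment(env, data_root, datalist_name="datalist.json"):
    """Return a copy of env with the data root and datalist path set."""
    updated = dict(env)
    # Key names are assumed; keep the ones used in your environment file.
    updated["DATA_ROOT"] = data_root
    updated["DATASET_JSON"] = posixpath.join(data_root, datalist_name)
    return updated


# Hypothetical starting values, not copied from the real file.
env = {"DATA_ROOT": "/path/to/original/data", "DATASET_JSON": "/path/to/original.json"}
new_env = update_environment(env, "/workspace/data/covid-training-set/training-images/")
print(json.dumps(new_env, indent=2))
```

posixpath is used instead of os.path so the container-style paths are joined with forward slashes regardless of the host operating system.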

Let's begin fine-tuning the model!

Start by making the script executable

*Correction*

The datalist.json file should be in this location: /workspace/data/covid-training-set/training-images/

Otherwise you will get this error!

A few other errors may arise; fix these by cleaning up or deleting the offending data.
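Some of these errors can be caught before training starts. The sketch below is a hypothetical pre-flight check: it confirms the datalist exists in the data folder and that every image it lists is present there. It assumes entries live under `training` and `validation` keys with an `image` field, and it runs against a temporary folder so it is safe to try as-is.

```python
import json
import tempfile
from pathlib import Path


def find_problems(data_root, datalist_name="datalist.json"):
    """Return a list of human-readable problems found in the data folder."""
    root = Path(data_root)
    datalist_path = root / datalist_name
    if not datalist_path.is_file():
        return [f"missing datalist: {datalist_path}"]
    datalist = json.loads(datalist_path.read_text())
    problems = []
    for split in ("training", "validation"):
        for entry in datalist.get(split, []):
            if not (root / entry["image"]).is_file():
                problems.append(f"{split}: missing image {entry['image']}")
    return problems


# Small demo in a temporary folder: a.png exists, b.png does not.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "a.png").write_bytes(b"")
    (demo_root / "datalist.json").write_text(json.dumps(
        {"training": [{"image": "a.png", "label": [1, 0]},
                      {"image": "b.png", "label": [0, 1]}]}))
    demo_problems = find_problems(demo_root)
    print(demo_problems)
```

Inside the container, the same function could be pointed at /workspace/data/covid-training-set/training-images/ before launching the training script.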

After this, training will start

Now the training is complete.

As you can see, the best metric was at epoch 570, with a validation metric of 0.83

Get back inside the Docker container if you exited it

View the TensorBoard output

Export the model

Exit from the Docker container and check the model we trained

Models are listed

So we have fine-tuned a model on the data we created. Awesome!