Setup Nvidia Clara Deploy For Medical Image Inference
In this post I will go through the process of setting up NVIDIA Clara Deploy on AWS.
I am following Brad Genereaux’s blog post to create an NVIDIA Clara-based system. I will also use the official NVIDIA Clara installation guide and various other posts to install and troubleshoot.
Brad’s blog - https://medium.com/@integratorbrad/how-i-built-a-space-to-train-and-infer-on-medical-imaging-ai-models-part-1-24ec784edb62
Disclaimer: The example shown here is NOT FOR CLINICAL USE and is for learning purposes only. The models are not FDA-approved and must not be used for clinical decision making.
If you already have an environment set up with Ubuntu, Docker, CUDA, the NVIDIA Container Toolkit, and NGC access, skip ahead to the ‘Setting up inference environment (Clara Deploy)’ section.
AWS Environment setup
Create an AWS instance with following configuration (taken from NVIDIA official guide):
I will create a spot instance for my environment. p3.8xlarge is an expensive instance type; using a spot instance reduces the cost significantly, but it can be interrupted depending on spot availability.
You might be fine with a cheaper GPU instance such as g4dn, but for now I am going with the NVIDIA-suggested instance type (p3).
After creating the spot instance, remote into the AWS server -
Check that you have a CUDA enabled GPU:
If the lspci command returns nothing, update Linux’s PCI hardware database by running `update-pciids`, then rerun the lspci | grep command.
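The check itself is a one-liner (the trailing echo is only there so the command does not exit nonzero on a machine without a GPU):

```shell
# List PCI devices and filter for NVIDIA hardware;
# any output line means a CUDA-capable GPU is visible to the system
lspci | grep -i nvidia || echo "no NVIDIA device found"
```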
Check for the CUDA supported version of linux:
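For example:

```shell
# x86_64 output indicates a 64-bit system; /etc/*release shows the distro details
uname -m && cat /etc/*release
```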
It is a 64-bit system!
Verify that gcc is installed
Find out the kernel version of the system
Before installing CUDA, the kernel header and development package of the same kernel version need to be installed.
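These checks and the header install look like the following (the package name assumes Ubuntu):

```shell
# Verify gcc is present (needed to build the NVIDIA kernel modules)
gcc --version

# Show the running kernel version
uname -r

# Install headers and development packages matching the running kernel
sudo apt-get install linux-headers-$(uname -r)
```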
Install CUDA by going to this link and selecting right choices:
https://developer.nvidia.com/cuda-downloads?target_os=Linux
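At the time of writing, the selector generated commands along these lines for Ubuntu 18.04 on x86_64 (verify against the page itself, since repository paths change between releases):

```shell
# Pin the CUDA repository so its packages take priority over the distro's
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600

# Add NVIDIA's signing key and the CUDA apt repository, then install
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
```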
Reboot the system after you are done with the above steps
Install Docker
Follow the steps outlined in https://docs.docker.com/engine/install/ubuntu/
Add docker’s official GPG Key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
Setup stable repository
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Update apt package index
sudo apt-get update
Install docker community edition
sudo apt-get install docker-ce=5:19.03.8~3-0~ubuntu-bionic docker-ce-cli=5:19.03.8~3-0~ubuntu-bionic containerd.io
Verify that Docker is installed
sudo docker run hello-world
Add your user to the docker group
sudo usermod -aG docker $USER
Reboot
sudo reboot
Install NVIDIA container toolkit
Follow the steps outlined here -
https://github.com/NVIDIA/nvidia-docker
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
Setup the stable repository and the GPG key-
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
After updating the package list, install nvidia-docker2
sudo apt-get update && sudo apt-get install -y nvidia-docker2
Restart the Docker daemon
sudo systemctl restart docker
Test by running a base CUDA container
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Configuration of NGC access
Login to NGC (https://ngc.nvidia.com/) and generate API Key and execute the following
sudo mkdir -p /etc/clara/ngc
cd /etc/clara/ngc
sudo wget https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip && sudo unzip ngccli_cat_linux.zip && sudo rm ngccli_cat_linux.zip ngc.md5 && sudo chmod u+x ngc
Add NGC key in ngc config
./ngc config set
Config docker to use NGC token
docker login nvcr.io
Ubuntu is now set up with Docker, nvidia-docker, the NVIDIA Container Toolkit, and NGC access
Setting up inference environment (Clara Deploy)
Disclaimer: The example shown here is NOT FOR CLINICAL USE and is for learning purposes only. The models are not FDA-approved and must not be used for clinical decision making.
Clara Deploy installation guide is available in nvidia official site - https://docs.nvidia.com/clara/deploy/ClaraInstallation.html
Download the clara deploy SDK
Unzip the file
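The download and extraction follow the usual pattern; `<CLARA_DEPLOY_SDK_URL>` below is a placeholder for the actual link from the NGC Clara Deploy page:

```shell
# <CLARA_DEPLOY_SDK_URL> is a placeholder -- copy the real link from NGC
wget -O clara_deploy_sdk.zip "<CLARA_DEPLOY_SDK_URL>"
unzip clara_deploy_sdk.zip -d clara_deploy_sdk
```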
Now install the Clara Deploy prerequisites by running bootstrap.sh, which installs Kubernetes, Helm, and their dependencies.
Navigate to the /etc/clara/bootstrap directory and run bootstrap.sh
bootstrap.sh had an old reference to the Helm repository; I had to change it to get.helm.sh. (Note: Helm 2 is now unsupported.)
Tiller pod installation issue: I had trouble getting the tiller pod to run. The tiller image is sourced from gcr.io, but the Helm images have since moved to the GitHub Container Registry (ghcr.io). As the deployment manifest below shows, the image is still pulled from gcr.io. To make it available, I pulled the image manually from ghcr.io, retagged it with the gcr.io name, and changed imagePullPolicy from IfNotPresent to Never.
After these steps, rerun bootstrap.sh and the prerequisites should install successfully.
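The workaround sketched as commands (the v2.15.2 tag is an assumption for illustration; match whatever version your deployment manifest references):

```shell
# Pull the tiller image from its new home on GitHub Container Registry
docker pull ghcr.io/helm/tiller:v2.15.2

# Retag it under the gcr.io name the deployment manifest still expects,
# so Kubernetes finds it in the local image cache
docker tag ghcr.io/helm/tiller:v2.15.2 gcr.io/kubernetes-helm/tiller:v2.15.2

# Edit the tiller deployment and change imagePullPolicy from IfNotPresent to Never
kubectl -n kube-system edit deployment tiller-deploy
```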
Some helpful links to solve the image mismatch issue:
https://www.programmerall.com/article/273716355/
https://giters.com/helm/helm/issues/10011
https://programming.vip/docs/kunernets-uses-helm-to-install-tiller-trampling-pit.html
https://cynthiachuang.github.io/Install-NVIDIA-Clara-Deploy-SDK/
Install the Clara Deploy CLI
Information of Clara Deploy CLI - https://ngc.nvidia.com/catalog/resources/nvidia:clara:clara_cli
Run wget -
Move to /usr/bin
Unzip and chmod
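One plausible reading of the three steps above (the URL is a placeholder; copy the real download link from the NGC clara_cli page):

```shell
# <CLARA_CLI_URL> is a placeholder for the download link on NGC
wget -O clara_cli.zip "<CLARA_CLI_URL>"
sudo mv clara_cli.zip /usr/bin/
cd /usr/bin && sudo unzip clara_cli.zip && sudo chmod 755 clara*
```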
Verify the Clara CLI is working
Adding the NGC API key to Clara CLI:
clara config --key <NGC_API_KEY> --orgteam nvidia/clara
Replace <NGC_API_KEY> with your NGC key.
Check that nvcr.io is reachable through Docker:
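nvcr.io uses the literal username `$oauthtoken`, with your NGC API key as the password:

```shell
# Log in to NVIDIA's container registry; the username is literally $oauthtoken
docker login nvcr.io
# Username: $oauthtoken
# Password: <NGC_API_KEY>
```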
Pull the clara deploy platform
When I first ran the Clara platform pull, it errored out: my Helm was v2, whereas “pull” is only recognized in Helm v3, so Helm needed to be upgraded. I upgraded to Helm 3.6.3 by changing helm_version and helm_checksum in bootstrap.sh and re-executing it.
The existing pods kept running and were not recreated:
Tiller pod got created using kubernetes helm container image 2.15.2
There is a version mismatch between client side and server side helm. We will see if this is a problem as we progress.
For now, the `clara pull platform` works:
Clara service is up as shown by the `helm ls`. Let’s bring the other services up.
Let’s check the PODs
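A quick status check:

```shell
# Show the Clara pods in the default namespace and their status
kubectl get pods
```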
All running!
Running inference engine using local input file
Pull a chest xray pipeline
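With the CLI configured, the pipeline can be pulled from NGC. The resource name below is assumed from the NGC catalog listing; confirm it on the chest x-ray pipeline page:

```shell
# Resource name assumed from the NGC catalog; verify before running
clara pull pipeline clara_ai_chestxray_pipeline
```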
Keep the model in a common model directory:
Create a pipeline for inference using the pipeline yaml file and clara create:
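Assuming the pipeline definition file extracted from the package is named chestxray-pipeline.yaml (an illustrative name; use whatever the package actually contains):

```shell
# Register the pipeline with the Clara platform; this prints the new pipeline ID
clara create pipeline -p chestxray-pipeline.yaml
```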
It will give a pipeline ID.
The pipeline is running:
Feed an input image in PNG format to the pipeline by creating a job with the pipeline_id from the previous step:
Start the job manually with the job_id from previous step:
Create an output destination directory and Download the output files:
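A sketch of the download step, assuming the CLI’s `<job>:<path>` source convention (check `clara download -h` for the exact syntax):

```shell
# Make a local destination and fetch everything the pipeline operators wrote
mkdir -p output
clara download <JOB_ID>:/operators/* output/
```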
Two files were created.
A CSV file showing the predicted probability of each condition:
The second file shows the image:
We have used Clara deploy successfully to get an inference from an x-ray image!