In this post I will go through the process of setting up NVIDIA Clara Deploy on AWS

I am referring to Brad Genereaux’s blog post to create an NVIDIA Clara-based system. I will also use the NVIDIA Clara official installation guide and various other posts to install and troubleshoot.

Brad’s blog - https://medium.com/@integratorbrad/how-i-built-a-space-to-train-and-infer-on-medical-imaging-ai-models-part-1-24ec784edb62

Disclaimer: The example shown here is NOT FOR CLINICAL USE and is for learning purposes only. The models are not FDA approved and must not be used for clinical decision making.

If you already have an environment set up with Ubuntu, Docker, CUDA, the NVIDIA Container Toolkit, and NGC access, start from the ‘Setting up inference environment (Clara Deploy)’ section.

AWS Environment setup

Create an AWS instance with the following configuration (taken from the NVIDIA official guide):

Ref: https://docs.nvidia.com/clara/deploy/ClaraInstallation.html#installation-on-a-cloud-service-provider-csp

I will create a spot instance for my environment. p3.8xlarge is an expensive instance type; using a spot instance reduces the cost significantly, but it may cause interruptions depending on spot availability.

You might be OK with using a less expensive GPU instance like g4dn, but for now I am going with the NVIDIA-suggested instance type (p3).

After creating the spot instance, remote into the AWS server -

Check that you have a CUDA enabled GPU:
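The check can be done with lspci; on a p3.8xlarge it should list Tesla V100 devices:

```shell
# List NVIDIA devices on the PCI bus; a p3.8xlarge should show
# four Tesla V100 GPUs
lspci | grep -i nvidia
```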

If nothing comes back from the lspci command, update Linux’s PCI hardware database by running `update-pciids`, then rerun the lspci grep command.

Check for the CUDA supported version of linux:
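A quick way to check, per the CUDA install guide:

```shell
# Print the machine architecture and distribution details;
# CUDA requires a supported 64-bit (x86_64) distribution
uname -m && cat /etc/*release
```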

It is a 64-bit system!

Verify that gcc is installed
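For example:

```shell
# Check the gcc version; on Ubuntu, install the build-essential
# package first if this command is not found
gcc --version
```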

Find out the kernel version of the system

Before installing CUDA, the kernel headers and development packages for the running kernel version need to be installed.
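On Ubuntu this can be done with (a sketch, using the running kernel version reported by `uname -r`):

```shell
# Install the kernel headers and development packages matching
# the currently running kernel
sudo apt-get install -y linux-headers-$(uname -r)
```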

Install CUDA by going to this link and selecting the right options for your system:

https://developer.nvidia.com/cuda-downloads?target_os=Linux
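For reference, the selector generated roughly the following for an Ubuntu 18.04 x86_64 network-repo install at the time of writing; use whatever commands the page produces for your configuration:

```shell
# Pin the CUDA repository above the distro's own packages
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
# Add NVIDIA's signing key and the CUDA apt repository
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
# Install the CUDA toolkit and driver
sudo apt-get update && sudo apt-get install -y cuda
```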

Reboot the system after you are done with the above steps

Install Docker

Follow the steps outlined in https://docs.docker.com/engine/install/ubuntu/

Add Docker’s official GPG key

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

Setup stable repository

echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Update apt package index

sudo apt-get update

Install docker community edition

sudo apt-get install docker-ce=5:19.03.8~3-0~ubuntu-bionic docker-ce-cli=5:19.03.8~3-0~ubuntu-bionic containerd.io

Verify that Docker is installed

sudo docker run hello-world

Add your user to the docker group

sudo usermod -aG docker $USER

Reboot

sudo reboot now

Install NVIDIA container toolkit

Follow the steps outlined here -

https://github.com/NVIDIA/nvidia-docker

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker

Setup the stable repository and the GPG key-

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

After updating the package list, install nvidia-docker2

sudo apt-get update && sudo apt-get install -y nvidia-docker2

Restart the Docker daemon

sudo systemctl restart docker

Test by running a base CUDA container

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

Configuration of NGC access

Log in to NGC (https://ngc.nvidia.com/), generate an API key, and execute the following

mkdir /etc/clara/ngc

cd /etc/clara/ngc

wget https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip && unzip ngccli_cat_linux.zip && rm ngccli_cat_linux.zip ngc.md5 && chmod u+x ngc

Add NGC key in ngc config

./ngc config set

Configure docker to use the NGC token (use $oauthtoken as the username and your NGC API key as the password)

docker login nvcr.io

Now Ubuntu is loaded with Docker, NVIDIA Docker, and the NVIDIA Container Toolkit.

Setting up inference environment (Clara Deploy)

Disclaimer: The example shown here is NOT FOR CLINICAL USE and is for learning purposes only. The models are not FDA approved and must not be used for clinical decision making.

The Clara Deploy installation guide is available on the NVIDIA official site - https://docs.nvidia.com/clara/deploy/ClaraInstallation.html

Download the clara deploy SDK

Unzip the file

Now install the Clara Deploy prerequisites by running bootstrap.sh, which installs Kubernetes, Helm, etc. and their dependencies.

Navigate to the /etc/clara/bootstrap directory and run bootstrap.sh

bootstrap.sh had an old reference to the Helm repo and had to be changed to point at get.helm.sh.

Note: Helm 2 is now unsupported.

Tiller pod installation issue: I had problems getting the Tiller pod to run. The Tiller image is sourced from gcr.io, but the Helm images have since moved to the GitHub Container Registry (ghcr.io).

As per the deployment manifest below, the image will be sourced from gcr.io. To make the image available, I pulled it manually from ghcr.io and retagged it with the gcr.io name, and also changed the imagePullPolicy from IfNotPresent to Never.
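Assuming Tiller v2.15.2 (match the tag in your deployment manifest), the pull-and-retag looks like this:

```shell
# Pull the Tiller image from its new location on ghcr.io ...
docker pull ghcr.io/helm/tiller:v2.15.2
# ... and retag it with the gcr.io name the manifest expects,
# so Kubernetes finds it locally without pulling
docker tag ghcr.io/helm/tiller:v2.15.2 gcr.io/kubernetes-helm/tiller:v2.15.2
```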

After these steps, rerun bootstrap.sh and the prerequisites should install successfully.

Some helpful links to solve the image mismatch issue:

https://www.programmerall.com/article/273716355/

https://giters.com/helm/helm/issues/10011

https://programming.vip/docs/kunernets-uses-helm-to-install-tiller-trampling-pit.html

https://cynthiachuang.github.io/Install-NVIDIA-Clara-Deploy-SDK/

Install the Clara Deploy CLI

Information of Clara Deploy CLI - https://ngc.nvidia.com/catalog/resources/nvidia:clara:clara_cli

Run wget -

Move to /usr/bin

Unzip and chmod

Verify the Clara CLI is working
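Assuming the downloaded archive is named cli.zip (adjust to the actual file name from the NGC page), the steps above are roughly:

```shell
# Move the CLI archive to /usr/bin, unpack it, and make it executable
sudo mv cli.zip /usr/bin
cd /usr/bin
sudo unzip cli.zip
sudo chmod 755 clara*
# Verify the CLI responds
clara version
```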

Adding the NGC API key to Clara CLI:

clara config --key <NGC_API_KEY> --orgteam nvidia/clara

Replace <NGC_API_KEY> with your NGC key.

Check that nvcr.io is connected through docker:

Pull the clara deploy platform
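Using the Clara CLI:

```shell
# Pull and deploy the Clara Deploy platform via the CLI
clara pull platform
```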

When I started to pull the Clara platform, I got an error: my Helm is v2, whereas the `pull` command is recognized in v3. Helm needed to be upgraded, so I upgraded to Helm 3.6.3 by changing helm_version and helm_checksum in bootstrap.sh and re-executing it.

The existing pods kept running and did not get recreated. The Tiller pod had been created using the Kubernetes Helm container image 2.15.2, so there is a version mismatch between the client-side and server-side Helm. We will see whether this becomes a problem as we progress.

For now, the `clara pull platform` works:

The Clara service is up, as shown by `helm ls`. Let’s bring the other services up.

Let’s check the PODs
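For example:

```shell
# All Clara pods should show a Running status
kubectl get pods
```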

All running!

Running inference engine using local input file

Pull a chest X-ray pipeline
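The exact pipeline name comes from its NGC catalog page; for the chest X-ray classification pipeline it is something like:

```shell
# Pull the chest X-ray classification pipeline from NGC
# (pipeline name taken from its NGC catalog page)
clara pull pipeline clara_ai_chestxray_pipeline
```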

Keep the model in a common model directory:

Create a pipeline for inference using the pipeline yaml file and clara create:
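A sketch, assuming the pipeline definition file is named chestxray-pipeline.yaml (use the file name from the pulled pipeline package):

```shell
# Register the pipeline from its definition file; prints a pipeline ID
clara create pipeline -p chestxray-pipeline.yaml
```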

It will give a pipeline ID.

The pipeline is running:

Feed an input picture in PNG format to the pipeline by creating a job, using the pipeline ID from the previous step:
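A sketch, assuming the PNG sits in a local input/ directory and using a hypothetical job name:

```shell
# Create a job on the pipeline; <PIPELINE_ID> is the ID returned
# by clara create pipeline
clara create job -n chestxray-test -p <PIPELINE_ID> -f input/
```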

Start the job manually with the job ID from the previous step:
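For example:

```shell
# Start the queued job; <JOB_ID> is the ID returned by clara create job
clara start job -j <JOB_ID>
```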

Create an output destination directory and download the output files:
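A sketch, assuming the results are written under the job’s /operators directory:

```shell
# Make a local output directory and fetch the job's result files
mkdir -p output
clara download <JOB_ID>:/operators/* output/
```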

Two files were created.

The CSV file shows the probability of each disease:

The second file shows the image:

We have used Clara Deploy successfully to get an inference from an X-ray image!