Introduction
Kubernetes (k8s) provides readiness and liveness probes for monitoring the health and availability of a container. This post shows some examples of how to use them, and will hopefully encourage you to take advantage of these probes when running NSO in k8s.
The more you can check in your liveness probe, the more certain you can be that NSO is actually up and running and does what it should.
Note: the actions in this example are toys, but they should give you some ideas for building a readiness probe and, most importantly, a good liveness probe for your system. In theory the liveness probe could do a number of things, including touching devices.
Readiness
The readiness probe determines when a container is ready for service. NSO can sometimes take some time before it's ready for service.
See the k8s-liveness-readiness/is-ready.sh script. It first checks whether NSO has started at all; if it has, the script calls a simple action (/k8s/ready).
For the Python code behind the example, please see k8s-liveness-readiness/lr-test/python/lr_test/main.py
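The two-stage check described above can be sketched in Python as follows. This is not the content of is-ready.sh, only an illustration of the flow. The use of `ncs --status` for the first stage, the way the action is invoked through `ncs_cli`, and the idea that `ncs_cli` exits non-zero when the action fails are all assumptions made for this sketch; `run_command`, `is_ready`, and `probe_exit_code` are hypothetical names.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner takes an argv list plus optional stdin text and returns an exit code.
Runner = Callable[[Sequence[str], Optional[str]], int]


def run_command(argv: Sequence[str], stdin_text: Optional[str] = None) -> int:
    """Run a command quietly; a missing binary or a timeout counts as failure."""
    try:
        result = subprocess.run(
            list(argv),
            input=stdin_text,
            text=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return 1
    return result.returncode


def is_ready(run: Runner = run_command) -> bool:
    # Stage 1 (assumed command): is the NSO daemon up at all?
    if run(["ncs", "--status"], None) != 0:
        return False
    # Stage 2 (assumed invocation): call the readiness action through the CLI.
    return run(["ncs_cli", "-u", "admin"], "request k8s ready\n") == 0


def probe_exit_code(run: Runner = run_command) -> int:
    # Kubernetes exec probes treat exit code 0 as success, anything else as failure.
    return 0 if is_ready(run) else 1
```

A probe script built on this would end with `sys.exit(probe_exit_code())`. Passing the runner in as a parameter keeps the logic testable without a running NSO instance.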
Liveness
After the container has been running for a while, we need to make sure it's still alive. This is also done with an action (/k8s/alive). The example action simply opens a config transaction and writes a leaf (/k8s/last-live-check), which verifies that a configuration transaction can still be committed successfully.
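The idea behind the liveness action (write a leaf, commit, and treat any failure as "not alive") can be sketched as below. This is not the code from main.py: the `set`/`apply` transaction interface is a hypothetical stand-in rather than the NSO Python API, and `check_alive` and `fake_transaction` are names invented for this illustration. In a real package the transaction would presumably come from NSO's MAAPI bindings.

```python
from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Callable, ContextManager


def check_alive(open_trans: Callable[[], ContextManager]) -> bool:
    """Return True only if a config write can actually be committed."""
    stamp = datetime.now(timezone.utc).isoformat()
    try:
        with open_trans() as trans:
            # Hypothetical interface: set(path, value) followed by apply().
            trans.set("/k8s/last-live-check", stamp)
            trans.apply()  # the commit itself is the health signal
    except Exception:
        # Failing to open, write, or commit all count as "not alive".
        return False
    return True


@contextmanager
def fake_transaction(store: dict, fail_on_apply: bool = False):
    """In-memory stand-in, used only to exercise check_alive()."""

    class _Trans:
        def set(self, path: str, value: str) -> None:
            self.pending = (path, value)

        def apply(self) -> None:
            if fail_on_apply:
                raise RuntimeError("commit failed")
            path, value = self.pending
            store[path] = value

    yield _Trans()
```

With the fake, `check_alive(lambda: fake_transaction({}))` succeeds, while passing `fail_on_apply=True` makes it report failure, which is the condition under which the liveness probe should fail.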
To run the examples, please see instructions below.
Base Image
You'll need to build the NSO system install base image. This is a somewhat minimized image that installs NSO.
Please note that you need to supply your own NSO installer binary and set the correct container name in the Makefile. See the CONTAINER and VER variables in nso-system-install-base/Makefile.
cd nso-system-install-base
make image push
Once you have managed to build the base image, you can move on to the example.
Example Deployment
cd k8s-liveness-readiness
make image push deploy
This will launch a deployment in k8s. If you look at the pods, you'll see that the NSO container is not yet ready:
kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
nso-deployment-7d6bb7c698-zb77f   0/1     Running   0          8s
It will stay in this state until we tell NSO that it's ready. Please note that this manual step is just for demo purposes.
kubectl exec -it $(kubectl get pod | grep nso-deployment | awk '{print $1}') -- bash
root@nso-deployment-7d6bb7c698-zb77f:/# ncs_cli -u admin
admin@ncs> request k8s set-ready
admin@ncs> exit
root@nso-deployment-7d6bb7c698-zb77f:/# exit
Now if we look at our pods again, we'll see that the pod is ready:
kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
nso-deployment-7d6bb7c698-mlqn4   1/1     Running   0          59s
Please note that the number of restarts is zero. To simulate a failure, we'll go back into the NSO CLI.
kubectl exec -it $(kubectl get pod | grep nso-deployment | awk '{print $1}') -- bash
root@nso-deployment-7d6bb7c698-mlqn4:/# ncs_cli -u admin
admin@ncs> request k8s set-dead
admin@ncs> exit
root@nso-deployment-7d6bb7c698-mlqn4:/# exit
Look at the pods once more:
kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
nso-deployment-7d6bb7c698-mlqn4   0/1     Running   1          3m3s
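You'll see that the RESTARTS count is now 1: k8s restarted the container after the liveness probe failed, so the health check worked as expected. The pod shows 0/1 again, presumably because the freshly restarted NSO has not yet been told that it's ready.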
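Instead of reading the table by eye, the same readiness and restart information can be extracted from the output of `kubectl get pods -o json`. The following is a minimal sketch; the function name `summarize_pods` and the default name prefix are illustrative choices, while `containerStatuses`, `ready`, and `restartCount` are standard fields of the pod status.

```python
import json
from typing import Dict, List


def summarize_pods(pod_list_json: str, name_prefix: str = "nso-deployment") -> List[Dict]:
    """Extract readiness and restart counts from `kubectl get pods -o json` output."""
    summary = []
    for pod in json.loads(pod_list_json).get("items", []):
        name = pod["metadata"]["name"]
        if not name.startswith(name_prefix):
            continue
        # containerStatuses may be absent while the pod is still being scheduled.
        statuses = pod.get("status", {}).get("containerStatuses", [])
        summary.append({
            "name": name,
            # Ready only if there is at least one container and all are ready.
            "ready": bool(statuses) and all(c.get("ready", False) for c in statuses),
            "restarts": sum(c.get("restartCount", 0) for c in statuses),
        })
    return summary
```

Feeding it the JSON pod list after the simulated failure should report one restart and `ready` as false for the NSO pod, matching the table above.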
The code is hopefully self-explanatory:
k8s-liveness-readiness/deployment.yml
k8s-liveness-readiness/is-alive.sh
k8s-liveness-readiness/is-ready.sh
k8s-liveness-readiness/lr-test/python/lr_test/main.py
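For orientation, the probe part of a deployment like this typically consists of two exec probes on the NSO container. The sketch below builds such a container spec as a Python dict that serialises to JSON, which kubectl accepts as well as YAML. It is not a copy of deployment.yml: the script paths inside the container, the container name, the image name, and all timing values are assumptions chosen for illustration.

```python
import json


def exec_probe(script: str, initial_delay: int, period: int, failure_threshold: int) -> dict:
    """Build a Kubernetes exec probe that runs `script` inside the container."""
    return {
        "exec": {"command": [script]},
        "initialDelaySeconds": initial_delay,
        "periodSeconds": period,
        "failureThreshold": failure_threshold,
    }


def nso_container(image: str) -> dict:
    # Script paths, container name, and timings are illustrative assumptions.
    return {
        "name": "nso",
        "image": image,
        # A failing readiness probe only takes the pod out of service.
        "readinessProbe": exec_probe("/is-ready.sh", 5, 5, 3),
        # A failing liveness probe causes the container to be restarted.
        "livenessProbe": exec_probe("/is-alive.sh", 30, 10, 3),
    }


def to_manifest_fragment(image: str) -> str:
    """Serialise the container spec as indented JSON."""
    return json.dumps(nso_container(image), indent=2)
```

Calling `to_manifest_fragment("example/nso:latest")` (a placeholder image name) yields a fragment that could be placed under `spec.template.spec.containers` in a Deployment. The liveness probe is given a longer initial delay here so that a slow NSO start is not mistaken for a dead container.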