As some of you may know, we released two Cisco Validated Designs (CVDs) for Docker Datacenter early in the spring. There are links to them here. And while Docker Datacenter certainly has its appeal, our own inclination was to focus our attention on Kubernetes. From November 2016 through February 2017, a few of my colleagues and I taught about six Kubernetes classes around the US (East Coast, West Coast, etc.), all focused on running Kubernetes on top of OpenStack. However, it became clear in those classes that most attendees who wanted to run Kubernetes in their own data center wanted to run it on neither OpenStack nor VMware. They wanted to run it on bare metal. And since it was a Cisco event, they were looking to run it on UCS.
Advantages of Kubernetes on Bare Metal
I have found that running Kubernetes in production in the cloud for the Cisco Pipeline application over the last six months has held up wonderfully. That solution runs on Metacloud and has been rock solid. However, when I compared it with running in a public cloud, as most people do, I saw several advantages to bare metal:
Cost - The Kubernetes cluster is always on. Even when it is lightly utilized, it is still running and you pay for it. I don't see many people adding and removing nodes during off-peak periods, although this could certainly be automated. Running in house has a much larger up-front capital cost but saves money in the long run, especially as the system gets bigger.
Performance - You can specify all of your hardware needs and architect everything yourself. You can choose powerful servers and design the network to meet your own requirements.
Control - You decide how the system will work. Want to run it on a bunch of Raspberry Pi nodes? You can do that too. Want your data stored locally? You got it. It's all yours.
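Whether the cost advantage materializes comes down to break-even arithmetic: an up-front capital cost plus ongoing on-prem costs versus an always-on cloud bill. The Python sketch below assumes flat monthly costs and takes every price as an input; the example figures are made-up placeholders, not real AWS, GKE, or UCS pricing. The on-prem monthly figure should include power, cooling, and staff time.

```python
from typing import Optional


def break_even_month(capex: float, onprem_monthly: float, cloud_monthly: float,
                     max_months: int = 120) -> Optional[int]:
    """Return the first month in which cumulative on-prem cost (capex plus
    running costs) is no greater than cumulative cloud cost, or None if that
    does not happen within max_months."""
    if cloud_monthly <= onprem_monthly:
        # On-prem never catches up if its monthly cost is not lower.
        return None
    for month in range(1, max_months + 1):
        onprem_total = capex + onprem_monthly * month
        cloud_total = cloud_monthly * month
        if onprem_total <= cloud_total:
            return month
    return None


if __name__ == "__main__":
    # Hypothetical figures: 60,000 up front, 1,000/month on-prem, 3,500/month cloud.
    print(break_even_month(60_000, 1_000, 3_500))
```

With those placeholder numbers the on-prem cluster pays for itself in month 24. Raising the on-prem monthly figure to reflect extra engineering hours pushes that date out, or past the planning horizon entirely.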
Disadvantages of Kubernetes on Bare Metal
While the performance and control gains are a given, the cost savings are not. Running Kubernetes on-prem may actually cost you more. Why? Man-hours. There is a great deal of uncertainty when running Kubernetes on-prem. Most Kubernetes guides are geared toward AWS or Google Cloud, so it is often hard to know what a good architecture for Kubernetes on bare metal looks like. But that's not all. Consider some other factors:
Load Balancers - On AWS and GKE, load balancers are built into the cloud. On AWS, for example, putting an ELB in front of three head nodes makes it easy to avoid a single point of failure. On-prem, you have to provide that load balancing yourself.
Storage - In the cloud, storage volumes are also a given. On-prem, they are something you need to figure out. Will you use NFS? Ceph?
Networking - Kubernetes doesn't come with a pod network out of the box, but one (often an overlay) is required to run it. There are many options to choose from, and making sure the one you pick works in your private environment is yet another thing to figure out.
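Without an ELB, something on-prem has to spread API traffic across the head nodes and route around any node that is down. That job normally goes to a dedicated load balancer or a tool such as HAProxy. The Python sketch below only illustrates the underlying idea, round-robin with health checks. The addresses are hypothetical. Port 6443 is the kube-apiserver's default secure port. The health check is passed in as a function so the example runs offline. In practice it might probe the API server's /healthz endpoint.

```python
import itertools
from typing import Callable, Iterable, Optional


class RoundRobinBalancer:
    """Rotate requests across head nodes, skipping any that fail a health check."""

    def __init__(self, backends: Iterable[str], is_healthy: Callable[[str], bool]):
        self.backends = list(backends)
        if not self.backends:
            raise ValueError("need at least one backend")
        self.is_healthy = is_healthy
        self._cycle = itertools.cycle(self.backends)

    def pick(self) -> Optional[str]:
        # Try each backend at most once per call, so a fully down control
        # plane returns None instead of looping forever.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if self.is_healthy(candidate):
                return candidate
        return None


if __name__ == "__main__":
    # Hypothetical head-node addresses; the second one is treated as down.
    heads = ["10.0.0.11:6443", "10.0.0.12:6443", "10.0.0.13:6443"]
    down = {"10.0.0.12:6443"}
    lb = RoundRobinBalancer(heads, lambda backend: backend not in down)
    print([lb.pick() for _ in range(4)])
```

With one of three head nodes down, requests alternate between the two healthy ones. This is the behavior the cloud load balancer provides for free.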
Note, however, that there are products you can buy that solve these problems. Red Hat has OpenShift, which is based on Kubernetes. There are also Rancher and Tectonic.
And while these products are great, what we really wanted was to stay as close as possible to vanilla upstream Kubernetes and to provide a simple way to run it on UCS.
What we've come up with is a set of semi-automated steps to bring up your Kubernetes cluster on UCS. We call it KUBaM! We've tested it with rack-mount servers and blades. Essentially there are four parts:
Prepare the Build Environment
Deploy Service Profiles and Operating Systems automagically
Run Kubernetes deploy scripts
Configure Day 2 operations: remote access, monitoring, etc.
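The four parts run strictly in order, since each depends on the one before it. For example, Kubernetes cannot be deployed until the operating systems are installed. The Python sketch below models that ordering as a list of stages that stops at the first failure. The stage functions are hypothetical placeholders, not KUBaM's actual code. In a real setup they would call the UCSM tooling and the Ansible playbooks.

```python
from typing import Callable, List, Tuple

# Each stage is a (name, action) pair; an action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order, stop at the first failure, and return the
    names of the stages that completed."""
    completed: List[str] = []
    for name, action in stages:
        if not action():
            print(f"stage failed: {name}")
            break
        completed.append(name)
    return completed


# Hypothetical placeholders for the four parts.
def prepare_build_env() -> bool:
    return True


def deploy_profiles_and_os() -> bool:
    return True


def deploy_kubernetes() -> bool:
    return True


def configure_day2() -> bool:
    return True


KUBAM_STAGES: List[Stage] = [
    ("Prepare the Build Environment", prepare_build_env),
    ("Deploy Service Profiles and Operating Systems", deploy_profiles_and_os),
    ("Run Kubernetes deploy scripts", deploy_kubernetes),
    ("Configure Day 2 operations", configure_day2),
]

if __name__ == "__main__":
    print(run_pipeline(KUBAM_STAGES))
```

Keeping the stages in a single ordered list makes it cheap to merge two of them later. You replace two entries with one, and the runner does not change.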
While we are still early in the process and plan to combine some steps (such as 2 and 3), we are very optimistic about the modules we are using to solve the problem and to give UCS users a way to make the most of their infrastructure. You'll also notice that there are things we can do with this automated process on UCS that we can't do on other types of servers.
Rather than write a lengthy CVD, we have decided to write this as an iterative, living document on GitHub, complete with code. To make this possible, we use the Python UCSM SDK as well as Ansible. Our approach creates a stack that can be torn down and rebuilt from scratch in a matter of minutes (minutes because, yes, you still have to wait for the blades or servers to reboot after installation). The first two steps are based on my previous article on PXEless automated installs. For the remaining steps, we've created Ansible playbooks that bring the whole cluster up using kubeadm and Contiv.
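Ansible playbooks need an inventory that says which hosts become kubeadm masters and which become worker nodes. The Python sketch below renders one in Ansible's INI inventory format from hostname-to-IP mappings. The group names `masters` and `nodes`, the hostnames, and the IP addresses are all hypothetical. The actual KUBaM playbooks may expect different group names, so check the repository before relying on them.

```python
from typing import Dict


def render_inventory(masters: Dict[str, str], workers: Dict[str, str],
                     user: str = "root") -> str:
    """Render an Ansible INI inventory.

    masters and workers map hostname -> IP address. The group names
    'masters' and 'nodes' are hypothetical placeholders.
    """
    lines = ["[masters]"]
    lines += [f"{host} ansible_host={ip}" for host, ip in sorted(masters.items())]
    lines += ["", "[nodes]"]
    lines += [f"{host} ansible_host={ip}" for host, ip in sorted(workers.items())]
    # Connection settings shared by every host.
    lines += ["", "[all:vars]", f"ansible_user={user}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_inventory(
        {"kube01": "10.0.0.11"},
        {"kube02": "10.0.0.12", "kube03": "10.0.0.13"},
    ), end="")
```

The resulting file could then be passed to a playbook with `ansible-playbook -i inventory site.yml`, where `site.yml` is a hypothetical playbook name. Because the inventory is generated from the same host list used to build the servers, a teardown and rebuild only needs that list to stay current.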