Shaun Roberts
Cisco Employee


 

Howdy out there in automation land! And welcome to the first blog of 2020... I know it's been too long without one, but we have had so many things going on and I have lots I want to write about (and, of course, make videos about)... so I just had to pick one thing, and off we go. Today we are going to show a great example of collaboration across our Customer Experience (CX) organization! But first... a movie poster to go with this? Well, this is about teamwork, and one of my favorite movies and favorite teams was...

 

bb_movie_poster.jpg

 

Love the Blues Brothers! And Blues Brothers 2000 is a highly underrated movie... not for the acting... but for the music! Anyways, back on topic. What we want to talk about today is collaborating on a problem, designing a solution, and deploying it using Action Orchestrator (AO).

 

The Problem...

 

So this week we ran into an issue where our production CCS/AO system went fully down... not because of anything we did, but because of something in Kubernetes that we should have been monitoring: the certificates used for client-side API calls expired. That left our microservices unable to talk to each other or to the Kubernetes cluster itself... well, that was not good! And yes, we did open a bug, CSCvt21378, on this so you can track it once it becomes customer facing! Anyways, the whole cluster went down... wow!
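As a minimal sketch of the kind of check that would have caught this (not the workflow's actual logic), OpenSSL's -checkend option reports whether a certificate will still be valid a given number of seconds from now; with 0 it simply asks "has this already expired?". The function name cert_expired is hypothetical, and the /etc/kubernetes/pki path is the same one used in the script further down.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) if the certificate has already expired.
# openssl x509 -checkend N exits 0 when the cert is still valid N seconds from now.
# Returns 2 for a missing/unreadable file; a malformed file is reported as expired.
cert_expired() {
  [ -r "$1" ] || return 2
  ! openssl x509 -checkend 0 -noout -in "$1" >/dev/null 2>&1
}

# Report any expired certificates in the Kubernetes PKI directory.
for crt in /etc/kubernetes/pki/*.crt; do
  [ -e "$crt" ] || continue   # no matches: skip the literal glob pattern
  if cert_expired "$crt"; then
    echo "EXPIRED: $crt"
  fi
done
```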

 

So I started to dig through tech articles and Kubernetes settings, and worked with other Cisco folks to design a "fix", knowing that other customers with private-cloud-based CloudCenter Suite (CCS) systems would see the same issue. It was time to go into proactive mode and figure out how we could detect this and get ahead of it! A little bit of downtime to implement a workaround/fix is OK... but a whole day or more is just unacceptable!

 

The Solution

 

After a solid day of working on it, I designed a workaround/solution to fix the above issue. It will be published as a knowledge article shortly, or you can get the information through TAC... however, that was not *GOOD* enough. We in CX can do better! We can innovate... we can AUTOMATE! While I was talking with a fellow CX engineer, Jason Davis, he had the idea to throw together a quick bash script to check the expiration date of each certificate. What a novel and great idea!!! Here is a quick plug for his Twitter account - https://twitter.com/snmpguy , he's a great follow! Anyways, he designed this script:

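#!/bin/bash
# List every Kubernetes PKI certificate with its expiration date, soonest first
for crt in /etc/kubernetes/pki/*.crt; do
     # openssl prints "notAfter=<date>"; cut keeps the date and GNU date reformats it as YYYY-MM-DD
     printf '%s: %s\n' \
     "$(date --date="$(openssl x509 -enddate -noout -in "$crt"|cut -d= -f 2)" --iso-8601)" \
     "$crt"
done | sort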

Nothing overly fancy, but highly useful! So he had something in place to start looking at certificate dates... but we needed someone to make this reusable and deployable to our customer base...

 

Solving it with Action Orchestrator

 

Knowing that Jason is a huge AO person as well... we decided to team up on this issue. How can we take a small script, package it for use, and hand it out to the masses? Easy... we can write a workflow to handle it all!! And then publish it... so how do we start? We start by getting the current date, running the script to get the expiration date of each certificate, and working out how far away each expiration is. We can use the Unix/Linux target to connect to one of our master nodes and run the bash script... like this:

 

teamworkpic1.JPG
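For the "how far away is the expiration date?" step, here is a minimal bash sketch of the arithmetic, assuming GNU date and the YYYY-MM-DD dates the script above prints. The function names days_between and days_until are hypothetical; the published workflow does its date handling inside AO, so treat this as an illustration of the math, not a copy of the workflow's steps.

```shell
#!/bin/bash
# Hypothetical helpers: whole days between calendar dates.
# Requires GNU date. Both dates are parsed as UTC midnight,
# so their difference is always a whole number of days (no DST surprises).

# days_between FROM TO: whole days from FROM to TO (YYYY-MM-DD); negative if TO is earlier.
days_between() {
  local from_s to_s
  from_s=$(date -u -d "$1" +%s) || return 1
  to_s=$(date -u -d "$2" +%s) || return 1
  echo $(( (to_s - from_s) / 86400 ))
}

# days_until DATE: days from today's (local) calendar date until DATE.
days_until() {
  days_between "$(date +%F)" "$1"
}
```

A negative result from days_until means the certificate's expiration date has already passed.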

 

We then needed to use regular expressions to filter out the data we needed and make it usable. In the video I reference a great site for testing them: https://regex101.com/. When you build or test your regular expressions there, make sure you select the Golang flavor! (That is the regex engine AO uses.)
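To make the regex step concrete, here is a sketch that splits one line of the script's output (YYYY-MM-DD: /path/to/cert.crt) into its date and path. The pattern sticks to syntax shared by bash's POSIX extended regular expressions and Go's RE2, so it should behave the same under regex101's Golang flavor. The names CERT_LINE_RE and parse_cert_line are hypothetical, and this is not necessarily the expression used in the published workflow.

```shell
#!/bin/bash
# Hypothetical pattern: capture group 1 = expiration date, group 2 = certificate path.
# Uses only syntax common to POSIX ERE (bash =~) and Go RE2.
CERT_LINE_RE='^([0-9]{4}-[0-9]{2}-[0-9]{2}): (.+)$'

# parse_cert_line "<line>": prints "date|path", or returns 1 if the line doesn't match.
parse_cert_line() {
  if [[ $1 =~ $CERT_LINE_RE ]]; then
    printf '%s|%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}
```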

Then we look at the dates and apply some simple logic to determine whether we are in an "OK" time period, need to "warn" the administrators, or the certificate is flat out expired! (OH NO!) A standard condition branch can compare the dates and determine the outcome. To be fancy... we are going to create an HTML-based email to send to the administrator and run this workflow daily to let them know what is going on...

 

teamworkpic2.JPG
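The ok / warn / expired decision boils down to two comparisons on the number of days left before expiration. A sketch in bash, where the 30-day warning threshold is an assumption for illustration (the post doesn't state the workflow's threshold) and classify_cert is a hypothetical name:

```shell
#!/bin/bash
# Assumed warning window in days; the published workflow may use a different value.
WARN_DAYS=30

# classify_cert <days-until-expiry> [warn-days]: prints ok, warn, or expired.
classify_cert() {
  local days="$1" warn="${2:-$WARN_DAYS}"
  if (( days < 0 )); then
    echo "expired"          # expiration date is in the past
  elif (( days <= warn )); then
    echo "warn"             # expires within the warning window
  else
    echo "ok"
  fi
}
```

A certificate with 0 days left lands in warn rather than expired, since it may still be valid for a few more hours.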

 

 

And lastly, we are going to send that nice HTML email to the administrator (like we said above)... it would look something like this:

teamworkpic3.JPG
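For the HTML email, the body can be as simple as a table with one colored row per certificate. This sketch reads status|date|path lines on standard input; the pipe-delimited input format, the colors, and the function names status_color and render_html_table are all assumptions for illustration, not the published workflow's template.

```shell
#!/bin/bash
# Hypothetical sketch: render certificate results as a minimal HTML table.
# Input: one "status|date|path" line per certificate on stdin.

# status_color STATUS: background color for a row (illustrative choices).
status_color() {
  case "$1" in
    ok)      echo "#c8e6c9" ;;   # green
    warn)    echo "#fff9c4" ;;   # yellow
    expired) echo "#ffcdd2" ;;   # red
    *)       echo "#ffffff" ;;   # unknown status: white
  esac
}

render_html_table() {
  local status expires cert
  echo "<table>"
  echo "<tr><th>Status</th><th>Expires</th><th>Certificate</th></tr>"
  while IFS='|' read -r status expires cert; do
    printf '<tr style="background:%s"><td>%s</td><td>%s</td><td>%s</td></tr>\n' \
      "$(status_color "$status")" "$status" "$expires" "$cert"
  done
  echo "</table>"
}
```

Values are inserted without HTML escaping, which is fine for plain certificate paths but worth adding if the input could ever contain <, > or &.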

Pretty slick, huh? The best part of all this... the work that Jason and I put in is shareable with you! So if you have CCS/AO, which you probably do if you are interested in my blog, you can import the workflow and script we created and use them in your environment to monitor your Kubernetes certificates. You can find it here: https://github.com/cisco-cx-workflows/cx-ao-shared-workflows/tree/master/CCSCheckKubernetesExpiration__definition_workflow_01E01VIRWZDE24mWlsHrqCGB9xUix0f9ZxG

 

 

So awesome... we went from a major problem, to a simple script that finds the certificate expiration dates, to a sweet team effort that created a proactive monitoring workflow to keep your CCS system up and running! How neat is that?? You know what is just as neat? The video where I give you all the insight and talk you through the workflow... so now... ONTO THE VIDEO!

 

Play recording

Recording password: WmkexiV2

 

Just a quick ~17-minute video explanation for you. A reminder that the content above on GitHub is open source in nature and is *NOT* supported by TAC. It was written by some great folks in CX, but it is open for you to use, take, manipulate, etc. So please... enjoy it!

 

Again, special thanks to Jason Davis for the collaboration here and a wonderful CX Solution!

 

Standard End-O-Blog Disclaimer:

 

Thanks as always to all my wonderful readers and those who continue to stick with and use CPO and AO! I am always looking for good questions, scenarios, stories, etc... if you have a question, please ask; if you want to see more, please ask... if you have topic ideas that you want me to blog on, please ask! I am happy to cater to the readers and make this the best blog you will find :)

 

AUTOMATION BLOG DISCLAIMER: As always, this is a blog and my (Shaun Roberts) thoughts on CPO, AO, CCS, orchestration, development, devops, and automation, my thoughts on best practices, and my experiences with the products and customers. The above views are in no way representative of Cisco or any of its partners, etc. None of these views, etc. are supported, and this is not a place to find standard product support. If you need standard product support, please use the current call-in numbers on Cisco.com or email tac@cisco.com

 

Thanks and Happy Automating!!!

 

--Shaun Roberts

shaurobe@cisco.com

 

2 Comments
Martin L
VIP

 

Interesting, thanks for sharing. Do you mean they redid the movie?

Shaun Roberts
Cisco Employee

No. Blues Brothers 2000 was a continuation of the story after they got out of jail. And Elwood had to find "new" brothers.
