How to implement a step by step rollout process?

erichgaede · 11-01-2018

I'm an NSO rookie, so please excuse me if this is a silly question.

For demonstration purposes, I am trying to implement a VPN rollout that executes in 3 steps:

1. Configure the VPN so that it is ready for testing.
2. Test the VPN.
3. Switch the VPN from test mode to productive mode if the test was successful. In fact, this is a re-deploy with slightly modified service parameters.

Obviously, steps 2 and 3 must not be executed if step 1 was done as a dry-run, and step 3 must not be executed if step 2, the test, failed.

I tried to implement this with data-kickers and 'reactive-re-deploy' as action. In doing so I encountered 2 problems.

P1: It seems that some external action is needed to make the data-kickers fire, but in the use case described above there are no external players that could do this. I had to manually change the attribute monitored by the data-kicker to get the sequence running.
Question: Is there any way for a service to trigger itself, so that each step triggers the next one?
Or am I completely off track using data-kickers and 'reactive-re-deploy'?
If so, what would be the right way to implement this use case?
P2: How can I figure out whether step 1 was executed as a dry-run?

In the examples that came along with NSO I didn't find any matching the use-case above. At first glance "getting-started \ developing-with-ncs \ 25-service-progress-monitoring" seemed to be the solution, but it uses a manual action as external trigger.

Both problems could be easily solved using a kind of workflow mechanism on top of NSO. Should this be the solution?
I hope not!

All help very much appreciated. Thanks in advance.

vleijon · 11-01-2018

Hi.

So, this is a fairly complicated question. What I am saying below should be seen as a potential outline of an answer. I want to warn you that reactive fastmap is tricky, and that you might want to have an experienced service developer help you with the design the first few times.

Let's assume you are working on a service named vpn, and an instance A, so that we are dealing with /vpn{A}.

What you generally want to do in a situation like this is to use reactive fastmap in a pattern that looks something like this:

- Create an action on the service, that tests the service and writes the result to operational data. Say that you write /vpn-test{A}/status on each test.
- Divide your service code into two pieces: one that makes the service ready for testing, and one that switches it to productive mode.
- Put an if statement saying that unless /vpn-test{A}/status == true you don't want to do part two.

In pseudo-code it looks something like this:
Do step 1
if (!/vpn-test{A}/status)
    return
Do step 2

You should use two kickers. One is a data-kicker that kicks the action; it should trigger on something that changes when you deploy your service, for example a modification of the service instance itself. The other is a kicker that kicks reactive-re-deploy on the service when /vpn-test{A}/status changes.

erichgaede · 11-06-2018

Hi,

Thanks a lot. Your hints brought me a big step forward.

But there is still a problem I have not yet been able to solve:

The whole sequence (step1, test-action, step2) works fine, if I first create the 2 kickers (one triggering test-action, and one triggering step2) manually and then create an instance of my VPN service.

If the kickers are created along with step1 in the cb_create callback of the Python service logic, they are created but do not immediately trigger an action.

Should I try to switch from service instance specific kickers to more general ones which could be created in advance?

Or is it possible to create the kickers in a kind of sub-transaction that is committed before step1? The commit of step1 would then match an existing kicker, which will (hopefully) trigger the test-action.

Thanks in advance and best regards

Erich

vleijon · 11-06-2018

Hi Erich.

If you can create generic kickers in advance that is the best solution, it probably isn’t even hard in this particular case since the structure will always be the same. If this is difficult, I would be happy to help.

There are other solutions for cases where this is not possible. Creating a separate transaction that writes the kicker might be the easiest one. (You can even create a separate transaction from inside the create code if you want.)
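
A rough sketch of the separate-transaction idea follows. It assumes the kicker list lives at /kickers/data-kicker and that the NSO Python API exposes its leaves with underscores instead of hyphens (kick_node, action_name, and so on). The kicker id, monitor path, and action name are hypothetical, as are the user "admin" and context "system" in the comment. The FakeList stand-in exists only so the snippet runs without NSO installed.

```python
from types import SimpleNamespace


def write_data_kicker(root, kicker_id, monitor, kick_node, action_name,
                      trigger_expr=None, trigger_type=None):
    """Create or update the data-kicker entry `kicker_id` under `root`."""
    kicker = root.kickers.data_kicker.create(kicker_id)
    kicker.monitor = monitor
    kicker.kick_node = kick_node
    kicker.action_name = action_name
    # Optional leaves are only set when a value is given.
    if trigger_expr is not None:
        kicker.trigger_expr = trigger_expr
    if trigger_type is not None:
        kicker.trigger_type = trigger_type
    return kicker


# With NSO, `root` would typically come from a write transaction of its own,
# roughly like this (user and context are placeholders):
#
#     import ncs
#     with ncs.maapi.single_write_trans("admin", "system") as t:
#         write_data_kicker(ncs.maagic.get_root(t), ...)
#         t.apply()


class FakeList(dict):
    """Minimal stand-in for a keyed list; NOT part of any NSO API."""

    def create(self, key):
        return self.setdefault(key, SimpleNamespace())


root = SimpleNamespace(kickers=SimpleNamespace(data_kicker=FakeList()))
write_data_kicker(root, "vpn-test-kicker", monitor="/vpn",
                  kick_node=".", action_name="self-test")
print(vars(root.kickers.data_kicker["vpn-test-kicker"]))
```

Because the kicker is committed in its own transaction, it already exists when the service commit completes, which is the situation that worked when the kickers were created manually beforehand.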

--Viktor

erichgaede · 11-06-2018

Hi Viktor,

I tried a generic kicker monitoring the whole service list

kickers data-kicker DBS_VPN-kick1 \
monitor /DBS_VPN:DBS_VPN \
kick-node . \
action-name ping-test

This seems to work fine, as the ping-test action is only called for list elements that changed. But now ping-test has to figure out whether it was called as part of a service creation, because only in that case should it run the tests. If an existing service instance is modified or deleted, the ping-test action is also called by this kicker, but then there is in fact nothing to do.

One minor problem in this context:

If a service instance is deleted, this leads to a "KeyError: '{…} not in /DBS_VPN:DBS_VPN'" when the system tries to call the action. The error shows up only in the Python log, so it's not disturbing normal users. But is there any way to avoid it? In my opinion this is a kind of error in NSO: code that accesses a certain element in a list should always be aware that the element might not be present.

The second kicker triggering step2 is still created along with step1. It monitors an instance specific test_successful flag going from false to true. This flag is set by the test-action – thus after the creation of the kicker has been committed. So this section also works fine.

kickers data-kicker DBS_VPN-kick2-xxx \
monitor /DBS_VPN:DBS_VPN_provState[execution_name='xxx'] \
trigger-expr test_successful='true' \
trigger-type enter \
kick-node /DBS_VPN[execution_name='xxx'] \
action-name reactive-re-deploy

Some performance considerations:

When working with service instance specific kickers the number of kickers in the system may be high or huge. Are there any concerns from the system point of view? Will this degrade overall performance, for example? Is there a limit for the number of kickers?

On the other hand, when using generic kickers like the DBS_VPN-kick1 above, the action is often called when there is in fact nothing to do. This may also have performance impacts.

Thanks

-- Erich

vleijon · 11-06-2018

If you don’t want to rerun the test on modification, and want to avoid running it on deletion, you should do two things:
1. Add a trigger-expression such as ./name, requiring the name to exist.
2. Set trigger-type to “enter” so it is only triggered when the condition becomes true.
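
To make the effect of these two settings concrete, here is a simplified model in plain Python (not NSO code). The function name and the truth values per event are assumptions: they reflect how an existence test such as ./name would be expected to evaluate before and after each kind of commit.

```python
def fires_on_enter(expr_before, expr_after):
    """Model of trigger-type "enter": fire only when the expression becomes true."""
    return (not expr_before) and expr_after


# Truth value of an existence test like ./name before and after the commit.
events = {
    "create": (False, True),   # instance did not exist, now it does
    "modify": (True, True),    # instance existed before and after
    "delete": (True, False),   # instance existed, now it is gone
}

for event, (before, after) in events.items():
    print(event, fires_on_enter(before, after))
```

Under this model only the create event kicks the action, so the test is neither re-run on modification nor attempted for an instance that has just been deleted.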

I have limited insight into the kickers and how they affect performance, but I have not heard any reports of big performance impacts. Generally, many people delete the kicker once the service has reached its final state. If you can use generic kickers and modify the trigger expression appropriately, that should be a good solution.