KJ Rossavik
Cisco Employee

Previous instalments of this blog post series have covered requirements (parts I and II) and baselining, profiling/optimising and monitoring (part III). In this final part we discuss some architectures you may want to consider as your system grows in scale and performance. We recommend a simplicity-first approach: do not introduce further complexity before it is needed.

 

Transactions in NSO 

The majority of NSO deployments run in transactional mode. This is the least complex way of running NSO, and it is the way we recommend all users start. Run with transactions as long as you can, on the biggest, most powerful single server you can afford. That way you can postpone more complex solutions such as commit queues and clustering until later, or perhaps avoid them altogether.

When you run a transaction in NSO, the system spends some time preparing, some time performing the actual transaction, and some time cleaning up. The important point is that the CDB database and the overall system are locked during the middle phase, so the system processes these middle phases in series, one by one. It is therefore critical to keep the work done in this phase to an absolute minimum, and most service applications leave a lot of scope for optimisation here. So, when you start running out of processing power, the first thing you should consider is optimising the solution. For example, does the system make calls to other systems and spend time waiting for them to return? Try to move anything that takes time out of the critical section of the transaction. Unfortunately, what takes the most time is usually interacting with the devices, and even though device interactions take place in parallel, a transaction can never complete faster than its slowest device interaction.
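To make this concrete, below is a minimal sketch of a Python service callback in the style generated by ncs-make-package. The service point, template and leaf names are hypothetical, and the commented-out inventory call stands in for any blocking external operation; everything inside cb_create runs while the transaction lock is held, which is exactly why such calls should be moved out of the transaction.

    import ncs
    from ncs.application import Service

    class ServiceCallbacks(Service):
        @Service.create
        def cb_create(self, tctx, root, service, proplist):
            # This callback runs inside the transaction lock, so every
            # millisecond spent here delays all other transactions.
            #
            # A blocking call like this hypothetical inventory lookup is
            # what you want to move out of the critical section:
            #
            #   vlan_id = inventory_client.allocate_vlan(service.name)
            #
            # Keep cb_create to pure, fast mapping logic instead.
            tvars = ncs.template.Variables()
            tvars.add('VLAN_ID', service.vlan_id)
            template = ncs.template.Template(service)
            template.apply('my-service-template', tvars)

    class Main(ncs.application.Application):
        def setup(self):
            self.register_service('my-service-servicepoint', ServiceCallbacks)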


Reactive Fastmap – Nano Services 

If the system does have to make calls to other systems beyond the devices being configured, then Reactive Fastmap is a design pattern you may want to consider. The same applies when external operations take a long time and you do not want to hold the transaction lock while waiting for them to conclude, e.g. starting up a VNF. This design pattern breaks the transaction processing into a sequence of transactions that can be run incrementally until the service intent has been reached. Other transactions can then be processed while you are waiting to run the next incremental step, which allows a greater degree of parallelism in the system.

Reactive Fastmap has been available in NSO for many years, and several additional features have been added over time to make it easier to use, reduce code volume, and improve maintainability. These include Kickers, Plans, and most recently Nano Services.
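As a rough illustration, the outline below sketches a nano service callback using the ncs.application.NanoService Python API. The service point, component and state names are made up, and the exact callback signature varies between NSO versions, so treat this as a sketch to compare against the documentation rather than a definitive implementation.

    import ncs
    from ncs.application import NanoService

    class VmRequested(NanoService):
        @NanoService.create
        def cb_nano_create(self, tctx, root, service, plan,
                           component, state, proplist, component_proplist):
            # Called when the plan reaches this component/state. Kick off
            # the slow operation here (e.g. ask a VNFM to start a VM) and
            # return immediately; a later re-deploy advances the plan once
            # the operation has completed.
            return proplist

    class Main(ncs.application.Application):
        def setup(self):
            # Hypothetical service point, component type and state names.
            self.register_nano_service('my-nano-servicepoint',
                                       'my:vm', 'my:vm-requested',
                                       VmRequested)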


Commit Queues 

Commit queues are another way to achieve more parallelism. Instead of pushing the device configuration for each transaction to the network in series, a queue is created per device, which allows multiple transactions to be processed in parallel. This is particularly effective when the transactions do not touch overlapping devices. It comes at the cost of higher complexity, but there is a growing set of tools in the NSO platform to manage that complexity.
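To give a feel for how this is enabled, the sketch below uses the Python maagic API to turn on commit queues globally, which corresponds to setting devices global-settings commit-queue enabled-by-default in the CLI. The user name and context are placeholders; check the commit-queue options for your NSO version before relying on this.

    import ncs

    # Enable commit queues by default for all devices (equivalent to the CLI
    # setting: devices global-settings commit-queue enabled-by-default true).
    with ncs.maapi.single_write_trans('admin', 'system') as t:
        root = ncs.maagic.get_root(t)
        root.devices.global_settings.commit_queue.enabled_by_default = True
        t.apply()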


Layered Services Architecture 

Parallelism and concurrency are complex topics; universities teach entire courses about them. It is important to note that clustering architectures such as NSO LSA (Layered Services Architecture) are also used for reasons other than scale and performance. LSA is a very elegant solution to cross-domain orchestration, where you want each domain to be owned by a separate organisation with a degree of operational autonomy.

When considering LSA for scale and performance, one obvious conclusion is that a cluster processing one transaction at a time is no faster than a single server doing the same; in fact, it is probably slower. Therefore, you want to deploy Nano Services and/or commit queues before moving to LSA. You also need to be very clear about how your service application can be parallelised in order to take advantage of multiple servers. We suggest you work together with the NSO Solution Architects on such architectures.


 

Summary

In this four-part blog post series we have discussed an approach where you define the requirements, select an appropriate server size, test that the solution can support the required scale and performance, profile and optimise as needed, and monitor the deployed solution so that you know whether it is delivering the required scale and performance.

We also discussed a simplicity-first approach where you start with a single-server deployment in transactional mode, and gradually introduce more complex architectures as the solution approaches its limits. These architectures include Nano Services, Commit Queues and Layered Services Architecture.

1 Comment

Ruben Cocheno

@KJ Rossavik this is great stuff, thanks for sharing.
