Deploying NSO in Brownfield Networks

KJ Rossavik · 08-03-2020

It is very rare to build a brand-new network from scratch. A much more common scenario is introducing automation into an existing network that is already delivering services to customers. NSO has been deployed in more than 200 customer networks, so this scenario has been encountered numerous times. This document discusses the challenges it poses and some approaches that have been used.

Managing new service instances - “Ships that pass in the night”

The simplest approach is to use NSO only to deploy and manage new service instances. Pre-existing service instances continue to be managed through other means, either by CLI towards the devices or by another management system. This could be the end state, or it could be a stepping stone towards more complete NSO management of the network and services.

When using NSO to partially manage networks and devices, the two main issues you will encounter are:

Synchronization of NSO CDB with the network device configurations, and
Management and allocation of resources.

Configuration Synchronization

When you connect NSO to a network device for the first time, you need to perform a sync-from operation. This causes NSO to retrieve the running configuration from the device, use the NED to parse the configuration into the device YANG model, and store the data in the NSO CDB. NSO then assumes that the configuration in CDB is in sync with the device configuration. The configuration held in CDB is used to calculate the configuration changes required for any given operation, such as service provisioning, modification or cease.

Whenever NSO initiates a transaction towards a device, the first step in the default transaction process is to check that the CDB representation is in sync with the device configuration. It is the NED’s responsibility to implement the check-sync operation in a way that is suitable for the managed device type, e.g. transaction-id, timestamp, etc. If NSO detects that the device configuration is out of sync with the CDB representation, then the transaction will fail and roll back any changes made so far to other devices included in the same transaction.
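The behaviour described above can be illustrated with a small, self-contained Python model. This is only a sketch and does not use any NSO API: the Device class, the transaction-id comparison and apply_transaction are hypothetical names chosen for illustration. It also simplifies by checking every device before writing anything, rather than rolling back changes that were already made.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Toy stand-in for a managed device (not an NSO class)."""
    name: str
    transaction_id: int                      # changes on every device-side commit
    config: dict = field(default_factory=dict)


class OutOfSync(Exception):
    """Raised when the device has changed since CDB last synchronized with it."""


def check_sync(device, cdb_transaction_ids):
    """True if the ID recorded at the last sync still matches the device."""
    return cdb_transaction_ids.get(device.name) == device.transaction_id


def apply_transaction(devices, cdb_transaction_ids, changes):
    """Apply changes ({device name: {path: value}}) to all devices or to none."""
    # Step 1: sync check on every participating device before touching anything.
    for dev in devices:
        if dev.name in changes and not check_sync(dev, cdb_transaction_ids):
            raise OutOfSync(f"{dev.name} is out of sync with CDB")
    # Step 2: only now write the changes and record the new transaction IDs.
    for dev in devices:
        if dev.name in changes:
            dev.config.update(changes[dev.name])
            dev.transaction_id += 1
            cdb_transaction_ids[dev.name] = dev.transaction_id


pe1 = Device("pe1", transaction_id=7)
pe2 = Device("pe2", transaction_id=3)
cdb_ids = {"pe1": 7, "pe2": 3}

pe2.transaction_id = 4   # someone changed pe2 via CLI, outside NSO

try:
    apply_transaction([pe1, pe2], cdb_ids, {"pe1": {"vrf": "A"}, "pe2": {"vrf": "A"}})
    outcome = "committed"
except OutOfSync as exc:
    outcome = f"aborted: {exc}"

print(outcome)
print(pe1.config)   # pe1 is left untouched, although only pe2 was out of sync
```

The point of the model is that one out-of-sync device is enough to stop the whole transaction, which is why partial NSO management needs a deliberate strategy for the sync check.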

In this brownfield scenario, where NSO is only managing new service instances, the assumption is that other service instances are managed through other means, either using CLI towards the devices or using another management system. This means that the CDB configuration is likely to be out of sync with the device configuration at any point in time.

There are two cases:

The configuration made outside of NSO does not overlap with any configuration that NSO makes, so the two will not conflict or overwrite each other.
The configurations overlap.

In the first case, since the configurations do not overlap, you can execute the device transactions without the check-sync step, on the assumption that the device will always be out of sync. Additionally, the “no-overwrite” option can be used to detect out-of-band changes to any configuration that the transaction is about to change. Any such conflicts can then be dealt with in a fallout process, where someone works out which configuration is the correct one and accordingly either updates the service instance or redeploys the service to re-establish the service intent from NSO.

The second case can be dealt with in a similar manner, except that here everything will go to fallout, so this is a much more expensive scenario operationally.
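The difference between the two cases can be sketched in Python. This is a toy model, not NSO code: configuration is represented as a flat dictionary of paths, and conflicting_paths is a hypothetical helper that mimics the idea behind “no-overwrite” by comparing only the paths that the transaction is about to write.

```python
def conflicting_paths(cdb_view, device_now, planned_changes):
    """Return the paths about to be written whose value on the device
    differs from what CDB believes it to be."""
    return sorted(
        path for path in planned_changes
        if cdb_view.get(path) != device_now.get(path)
    )


# What CDB recorded at the last sync-from.
cdb_view = {
    "interface Gi0/1 description": "uplink",
    "interface Gi0/2 description": "spare",
}
# What is on the device now, after out-of-band CLI changes.
device_now = {
    "interface Gi0/1 description": "uplink",
    "interface Gi0/2 description": "cust-42",   # changed via CLI
    "interface Gi0/3 description": "cust-77",   # added via CLI
}

# Case 1: the transaction only touches Gi0/1, so out-of-band changes elsewhere do not matter.
safe = conflicting_paths(cdb_view, device_now, {"interface Gi0/1 description": "cust-10"})

# Case 2: the transaction touches Gi0/2, which was changed out of band, so it goes to fallout.
fallout = conflicting_paths(cdb_view, device_now, {"interface Gi0/2 description": "cust-11"})

print(safe)
print(fallout)
```

The device as a whole is out of sync in both runs. Only the second run touches a path that changed behind NSO's back, and that is the one a person needs to look at.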

The NSO Developer Days video below explains this in more detail.

See also:

Synchronizing Devices, NSO User Guide, Chapter 3, pp45 in v5.2.2
NSO Developer Days 2018: NSO in brownfield deployments, Tomas Mellgren, https://www.youtube.com/watch?v=ExdryBzvRRo

Allocation and Management of Resources

Most NSO service applications/function packs automatically allocate some types of resources. Resources have different scopes: some need to be unique per port, per card, per device, or across a (sub-)network. IP addresses, of course, generally need to be globally unique.

Resource allocation in a brownfield network must first exclude any resources that are already in use. If the service provider uses an external allocation system, then you need to integrate the NSO service application with this system. The NSO Resource Manager (RM) CFP is designed to integrate with external systems, but it can also be used standalone. If the NSO RM CFP allocates resources internally, then you need to mark the resources that are already in use as used. How you do this depends on where the service provider records such resources, for example in a manual system such as a spreadsheet. The last resort is to search CDB and use that data to populate the RM CFP with used resources.

Additionally, if the SP is going to create some services with NSO and continue creating other services through other means (e.g. via CLI or another management system), then they need to ensure that the two do not allocate overlapping resources. This means that they need to use the same external allocation system and/or define non-overlapping resource pools.
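Both ideas, excluding resources that are already in use and keeping pools disjoint, can be sketched in a few lines of Python. This is a minimal illustration and not the Resource Manager API: IdPool, overlaps and the VLAN numbers are all hypothetical.

```python
class IdPool:
    """Toy ID pool: an inclusive range minus the IDs that are already in use."""

    def __init__(self, first, last, in_use=()):
        self.range = range(first, last + 1)
        # Only remember used IDs that actually fall inside this pool.
        self.used = {i for i in in_use if i in self.range}

    def allocate(self):
        """Return the lowest free ID and mark it as used."""
        for candidate in self.range:
            if candidate not in self.used:
                self.used.add(candidate)
                return candidate
        raise RuntimeError("pool exhausted")


def overlaps(a, b):
    """True if two pools share at least one ID."""
    return a.range.start < b.range.stop and b.range.start < a.range.stop


# Hypothetical VLAN IDs found in use, e.g. from a spreadsheet or by searching CDB.
vlans_found_in_network = {100, 101, 103}

nso_pool = IdPool(100, 199, in_use=vlans_found_in_network)   # used by NSO services
manual_pool = IdPool(200, 299)                               # used for CLI provisioning

first = nso_pool.allocate()    # 102: 100 and 101 are skipped
second = nso_pool.allocate()   # 104: 103 is skipped

print(first, second, overlaps(nso_pool, manual_pool))
```

Pre-populating the used set is what protects existing services. The disjoint ranges are what protect services that are still created outside NSO afterwards.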

See also:

NSO Resource Manager CFP User Guide

Managing Pre-Existing Service Instances

You may also want to manage your pre-existing service instances with NSO. This could involve:

Taking over management of pre-existing instances of the same service type as the function pack/service application that you introduced in “ships that pass in the night” mode
Managing instances of other service types

How uniform are the pre-existing services?

Imagine that you have deployed a function pack for service type A (e.g. L3VPN), and you have started creating instances of this service type. There are previous instances of service type A on the network, which have been either created manually, or using some other provisioning tool. How uniform are these?

Would it have been possible to create those instances with your function pack?
If not, how many variations are there? In the worst case, every service instance has been custom-built for the individual end customer.

Most likely there will be service instances that could not have been created with the existing function pack. Options include:

Modify your function pack to cater for the additional options. If the function pack was created by the service provider, or for the SP by Cisco CX or another SI, then this may be possible. If it is a productised Core Function Pack, then of course it cannot be modified by (or for) the customer.
Re-provision the service instances with the existing function pack, thereby cleaning up legacy configuration and streamlining the service portfolio. This may have contractual implications, and it may require re-negotiation with the end customers.
Continue to manage certain service instances manually. Even if they could be automated, it is not a given that they should be. It may not be cost-effective to automate legacy one-off service instances, particularly if there is little or no need to change them going forward. Consider the cost of implementing automation versus the cost savings that the automation is going to bring.

What are the pre-existing service instances?

To identify the pre-existing service instances, you first need to know how they were created. The service provider will probably have a record of the service instances:

In the provisioning/inventory system that was used to create them originally
In some other service inventory
In a spreadsheet/document
Spread across several of these

There is also the chance that the records are inaccurate relative to the configurations that exist in the network devices.

You can also try to discover the services from the network devices, but to state the obvious: you can only discover data that actually exists in the device configuration. Data such as service name, service ID and customer name may be stored in description fields, may be implicitly encoded in the VRF name, VLAN ID, etc., or may not be stored in the devices at all.
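As an illustration of how little or how much can be recovered, the Python sketch below extracts service data from interface descriptions. The "CUST:<customer>;SVC:<service id>" convention, the interface names and the function name are all hypothetical. A real network will have its own conventions, and possibly several of them.

```python
import re

# Hypothetical convention: "CUST:<customer>;SVC:<service id>".
DESCRIPTION_RE = re.compile(r"CUST:(?P<customer>[^;]+);SVC:(?P<service_id>[^;\s]+)")


def extract_service_records(interfaces):
    """Split interfaces into those whose description identifies a service
    and those that carry no usable service data."""
    records, unidentified = [], []
    for name, description in interfaces.items():
        match = DESCRIPTION_RE.search(description or "")
        if match:
            records.append({"interface": name, **match.groupdict()})
        else:
            unidentified.append(name)
    return records, unidentified


interfaces = {
    "Gi0/1": "CUST:acme;SVC:L3VPN-0042",
    "Gi0/2": "to core",   # free text, nothing to recover
    "Gi0/3": "",          # no description at all
}

records, unidentified = extract_service_records(interfaces)
print(records)
print(unidentified)
```

The unidentified list is as important as the matches: those are the interfaces for which the data has to come from an inventory system, a spreadsheet or a person.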

The network will probably also have orphaned service configurations, or parts thereof. These are service instances that should no longer be in use (but may or may not be), or remnants of ceased service instances.

Migrating pre-existing service instances under NSO management

Hence, migrating pre-existing services into NSO automation is a forensic exercise: you need to find sufficient data to create each instance, and then provision it, e.g. using dry-run, to see whether the generated configuration matches what is on the network.
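The dry-run comparison can be pictured as follows. This Python sketch is a simplified model and not NSO output: configuration is a flat dictionary of paths, and dry_run_diff is a hypothetical helper that returns what a commit would change on the device. An empty result means that the service inputs reproduce the existing configuration exactly.

```python
def dry_run_diff(generated, on_device):
    """Return {path: (value on device, value the service would write)}
    for every path where the two differ."""
    return {
        path: (on_device.get(path), value)
        for path, value in generated.items()
        if on_device.get(path) != value
    }


# Configuration found on the device for a legacy service (hypothetical values).
on_device = {
    "vrf cust-42 rd": "65000:42",
    "vrf cust-42 route-target": "65000:42",
}

# First attempt: service inputs taken from an inaccurate inventory record.
first_guess = {
    "vrf cust-42 rd": "65000:43",
    "vrf cust-42 route-target": "65000:42",
}
# Second attempt: inputs corrected after inspecting the device.
second_guess = {
    "vrf cust-42 rd": "65000:42",
    "vrf cust-42 route-target": "65000:42",
}

print(dry_run_diff(first_guess, on_device))    # one path would be changed
print(dry_run_diff(second_guess, on_device))   # empty: the inputs match the network
```

Iterating on the inputs until the diff is empty is the forensic part of the exercise. A diff that never becomes empty suggests that the instance could not have been created with this function pack.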

If the service application/function pack automatically allocates resources, then you need to either:

Accept that it will probably allocate different resources from those in the pre-existing service instance, or
Create your function pack in such a way that such resources can be given as explicit attributes for the service creation

For NSO to take ownership of the configuration associated with a pre-existing service instance, it needs to be tricked into believing that it created the configuration. All configuration that is owned by an NSO service instance is tagged with a reference count, which reflects how many NSO services depend on this configuration. This ensures, for example, that NSO does not delete the configuration before it is certain that the configuration is no longer in use.

If you don’t update the reference counts, then NSO will not own the pre-existing configuration. If you then delete the service in NSO, NSO will put the configuration back to the state it was in before NSO modified it, i.e. back to the pre-existing service.
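The effect of the reference counts can be shown with a toy Python model. It is an assumption-laden sketch and not how CDB is implemented: ToyCdb, its methods and the rule that pre-existing configuration counts as one extra owner are modelling choices made for illustration only.

```python
class ToyCdb:
    """Toy model of configuration ownership through reference counts."""

    def __init__(self, preexisting):
        self.config = dict(preexisting)
        # Modelling assumption: configuration that was there before any
        # service counts as one "owner" of its own.
        self.refcount = {path: 1 for path in preexisting}

    def create_service(self, paths):
        """Write the service configuration and add one reference per path."""
        for path, value in paths.items():
            self.config[path] = value
            self.refcount[path] = self.refcount.get(path, 0) + 1

    def reconcile(self, paths):
        """Hand ownership to the service by dropping the extra count
        held for pre-existing data (single-service case only)."""
        for path in paths:
            if self.refcount[path] > 1:
                self.refcount[path] -= 1

    def delete_service(self, paths):
        """Remove one reference per path; delete config nobody depends on."""
        for path in paths:
            self.refcount[path] -= 1
            if self.refcount[path] == 0:
                del self.refcount[path]
                del self.config[path]


legacy = {"vrf cust-42 rd": "65000:42"}

# Without updating the reference counts: deleting the service leaves the legacy config.
not_owned = ToyCdb(legacy)
not_owned.create_service(legacy)
not_owned.delete_service(legacy)

# With the reference counts updated: the service owns the config and deletion removes it.
owned = ToyCdb(legacy)
owned.create_service(legacy)
owned.reconcile(legacy)
owned.delete_service(legacy)

print(not_owned.config)
print(owned.config)
```

In the first run the legacy configuration survives the service deletion, which is the behaviour described above. In the second run the service has taken over ownership, so deleting it removes the configuration from the model.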

The NSO Development Guide reference below explains how to deal with the reference counters and has a wider discussion about service discovery.

See also:

Service Discovery, NSO Development Guide, Chapter 10, pp258 in v5.2.2

On-going Discovery and Reconciliation

So far, the assumption has been that you discover any service only once. This is typically done by CX during the deployment of NSO into the service provider network. Once a service instance is under management by NSO then all subsequent changes are done there. From then on, any further service instances of that type are provisioned using NSO. This is the recommended operating mode, and the cheapest.

However, some service providers have organisational and operational constraints that mean that this is not always possible. These constraints necessitate out-of-band changes to service instances under NSO management, and even the creation of new service instances outside of NSO. This may require the process described above to be applied repeatedly in order to discover new service instances.

If changes have been made to service instances under NSO management, e.g. changes to an attribute value, then these can easily be identified by a check-sync operation. The NSO service instance can then be modified accordingly, or alternatively the NSO service intent can be redeployed, overwriting the network configuration.

See also:

Service Impacting out-of-band changes, NSO User Guide, Chapter 5, pp113 in v5.2.2

Reactive Fastmap, Stacked Services and Layered Services Architecture (LSA)

Architectures such as those based on Reactive Fastmap (RFM) or Nano-services add complexity. The normal dry-run does not work out of the box for these, since it shows only the configuration after the first iteration of RFM. There are ways of getting around this, however.

It is now common for NSO function packs/service applications to have multiple models on top of each other. While we previously talked about a service model mapping to device models, there are now layers of Stacked Services, possibly in a Layered Services Architecture. Since the purpose of such model stacking is to abstract and hide details towards the bottom of the stack, it becomes harder to provide explicit resource allocation, as these parameters need to be propagated up the model stack to the top model.

See also:

Reactive Fastmap, NSO Development Guide, Chapter 10, pp205 in v5.2.2
Nano services, NSO Development Guide, Chapter 18, pp335 in v5.2.2
Stacked Services and Shared Structures, NSO Development Guide, Chapter 10, pp256 in v5.2.2
NSO Layered Services Architecture User Guide

Automated Service Discovery and Reconciliation

As we have discussed so far, there are a lot of considerations to be made when placing pre-existing services under NSO management. The question now is not only whether this can be automated, but also whether it should be.

As usual, the answer is “it depends”. If you have 10,000 instances of service A, 1,000 of service B, 100 of service C and 10 of service D, then it is probably cost-effective to automate discovery of service A, but probably not of service D. This assumes, of course, that service type A is applied sufficiently uniformly, and that you are not dealing with a hundred variations of service A...
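The trade-off is simple arithmetic, sketched below in Python. All cost figures are hypothetical placeholders chosen only to show the shape of the calculation (hours of effort per instance and a one-off build cost). They are not measurements, and the function name is illustrative.

```python
def worth_automating(instances, manual_cost_per_instance,
                     automation_build_cost, automated_cost_per_instance=0.0):
    """True if building and running automated discovery costs less in total
    than discovering every instance by hand."""
    manual_total = instances * manual_cost_per_instance
    automated_total = automation_build_cost + instances * automated_cost_per_instance
    return automated_total < manual_total


# Instance counts from the text; costs are hypothetical, in hours.
instance_counts = {"A": 10_000, "B": 1_000, "C": 100, "D": 10}

decisions = {
    service: worth_automating(
        count,
        manual_cost_per_instance=2.0,     # hypothetical
        automation_build_cost=400.0,      # hypothetical, per service type
        automated_cost_per_instance=0.1,  # hypothetical
    )
    for service, count in instance_counts.items()
}

print(decisions)
```

With these placeholder figures, automation pays off for the high-volume service types and not for the low-volume ones. Each additional variation of a service type effectively adds to the build cost and moves the break-even point further out.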

Do consider, however, that service discovery is usually a once-only operation, and it could be easier to outsource to an organisation that does this repeatedly, e.g. Cisco CX.

A framework such as the one described by Dan Sullivan in the video below can greatly reduce the cost of automated service reconciliation, but depending on your services, this can still be costly.

See also:

NSO Developer Days 2018: Reconciling out-of-band changes, Dan Sullivan, https://youtu.be/yYzk8aXMCbY
NSO Developer Days 2019: Brownfield Service Reconcile Operations, Dan Sullivan, https://www.youtube.com/watch?v=KudcsCAE-Sw
NSO Developer Days 2020: Learning to Live with Imperfection, a Service Reconciliation Journey https://www.youtube.com/watch?v=DNwYQGed9TE&list=PLhTPrPcGzO7GZyx5DGpP6sCEiBYPYK24
