04-07-2021 09:19 AM
We're in the process of renaming a number of devices across our network. Renaming devices in NSO seems to cause problems for any services that are already deployed on them. I'm opening this thread to gather opinions and suggestions on how to carry out this transition while preserving the state of deployed services.
Our service YANG models use leafrefs to restrict certain fields to existing data, and in almost all cases these fields are mandatory. There is also Python code that reads the device configuration to make decisions about the configuration being deployed, for instance using parts of the device hostname or the Loopback0 address.
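To make that dependency concrete, here is a minimal sketch of the kind of lookup described above, written in plain Python so it runs outside NSO. The dict stands in for the device configuration the real service code would read from CDB; the `<site>-<role><nn>` hostname convention and the `derive_service_params` name are hypothetical.

```python
import ipaddress


def derive_service_params(device_config: dict) -> dict:
    """Derive service values from the device hostname and Loopback0.

    device_config is a hypothetical plain-dict stand-in for the device
    tree; hostnames are assumed (hypothetically) to look like
    '<site>-<role><nn>', e.g. 'nyc-pe01'.
    """
    hostname = device_config["hostname"]
    # Site code is the part of the hostname before the first '-'.
    site = hostname.split("-", 1)[0]
    loopback = device_config["interfaces"]["Loopback0"]["ipv4"]
    # Parse 'address/prefix' and keep only the address; raises ValueError if malformed.
    router_id = str(ipaddress.ip_interface(loopback).ip)
    return {"site": site, "router_id": router_id}


if __name__ == "__main__":
    cfg = {
        "hostname": "nyc-pe01",
        "interfaces": {"Loopback0": {"ipv4": "10.0.0.1/32"}},
    }
    print(derive_service_params(cfg))
```

Under this hypothetical convention, a hostname change also changes the derived `site` value, so re-deploying after a rename could alter the generated configuration.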
Previous threads on this topic have suggested using a no-networking commit to perform the updates, but I don't think this works for us.
When a device is renamed (using the rename command), the service configurations are not updated automatically. Renaming the device also invalidates the existing selections because of the YANG leafref restrictions, so the service configuration must be "restored" manually in the same transaction. When committing this combined transaction, the service code throws errors because it cannot access the device configuration: the configuration it needs is part of the same transaction, so it is not visible to the service deployment code.
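The bookkeeping behind that manual restore can be sketched as below. This is not NSO's API or the behavior of the rename command; it models service instances as hypothetical dicts with a `device` leaf and shows which instances a rename leaves dangling and how an old-name to new-name mapping rewrites them. `dangling_references` and `apply_rename` are illustrative names.

```python
from typing import Dict, List, Set


def dangling_references(services: List[dict], devices: Set[str]) -> List[str]:
    """Return names of service instances whose 'device' leaf no longer
    matches an existing device (what a leafref check would reject)."""
    return [s["name"] for s in services if s["device"] not in devices]


def apply_rename(services: List[dict], renames: Dict[str, str]) -> List[dict]:
    """Return copies of the service instances with device references
    rewritten using an old-name -> new-name mapping; unmapped names are kept."""
    return [
        {**s, "device": renames.get(s["device"], s["device"])}
        for s in services
    ]


if __name__ == "__main__":
    services = [{"name": "svc-a", "device": "pe1"}, {"name": "svc-b", "device": "pe2"}]
    devices_after_rename = {"nyc-pe01", "pe2"}
    print(dangling_references(services, devices_after_rename))
    fixed = apply_rename(services, {"pe1": "nyc-pe01"})
    print(dangling_references(fixed, devices_after_rename))
```

Rewriting the references is the easy part; it does nothing about the service code being unable to read the renamed device's configuration within the same transaction.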
We thought about creating a new device alongside the existing one and moving the services over to it. An initial sync-from on the new device would avoid the problems with the service code, but it also removes the possibility of a reverse diff, because the earliest version of the configuration on the new device already includes the deployed service configuration. Once the services point to the new device, their get-modifications output would be empty. We could run re-deploy reconcile on the services, but removals would then have no earlier configuration to revert the device to. Normally, an interface, for instance, would return to its "default" state, which predated the service configuration.
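A toy model of why the sync-from baseline matters, treating configuration snapshots as sets of lines. This only illustrates the diff logic; it is not how NSO's FASTMAP stores reverse diffs, and the interface lines are made up.

```python
def service_diff(before, after):
    """Return (added, removed) lines between two config snapshots.
    Undoing the change means removing 'added' and restoring 'removed'."""
    return after - before, before - after


pre_service = frozenset({"interface Gi0/1 shutdown"})
with_service = frozenset({
    "interface Gi0/1 description CUST-A",
    "interface Gi0/1 ip address 192.0.2.1 255.255.255.252",
})

# Existing device: the baseline predates the service, so the diff records
# both what the service added and what it displaced.
added, removed = service_diff(pre_service, with_service)

# New device: the first sync-from already contains the service config,
# so there is nothing to record and nothing to revert to.
added_new, removed_new = service_diff(with_service, with_service)

if __name__ == "__main__":
    print(sorted(added), sorted(removed))
    print(sorted(added_new), sorted(removed_new))
```

In the second case, deleting the service would have no record that `interface Gi0/1 shutdown` was ever there.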
We also considered modifying the service code to temporarily allow bogus selections while this work is carried out. The YANG model changes could probably be made relatively easily, but the problem lies with the code that reaches into the device configuration: the values normally retrieved from CDB would need to be supplied manually in some way, which is probably impractical for several hundred deployed service instances.
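The temporary workaround could look roughly like the following, assuming a hypothetical `overrides` mapping (for example, backed by optional leaves added to the service model) that is consulted only when the device configuration cannot be read. `resolve_value` and `MissingDeviceData` are illustrative names, not part of any NSO API.

```python
from typing import Dict, Optional


class MissingDeviceData(Exception):
    """Raised when neither the device config nor an override supplies a value."""


def resolve_value(key: str,
                  device_config: Optional[Dict[str, str]],
                  overrides: Dict[str, str]) -> str:
    """Prefer the value read from the device config; fall back to a
    manually supplied (hypothetical) override while the reference is bogus."""
    if device_config is not None and key in device_config:
        return device_config[key]
    if key in overrides:
        return overrides[key]
    raise MissingDeviceData(
        f"no value for {key!r}: device data unavailable and no override set"
    )
```

The catch is that `overrides` has to be populated for every affected instance, which is what makes this approach hard to scale to several hundred services.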
Probably the worst option we came up with was to do the updates in an overnight maintenance window: remove the service configuration entirely, rename the device, and deploy the service configuration from scratch. This would keep the back-pointers correct, but it would cause a customer-facing service outage for what is essentially a cosmetic change on our end, which we want to avoid if at all possible.
At this point we're not sure what the best approach is.