Question about conflict detection

peter-stamp · 05-01-2025

Hi,

I am hoping to get some guidance on conflict detection. We are running NSO 6.3.1.

I have created one complex service that builds the interface for customer services. It does a lot of work: it has a lengthy XML template and implements logic in its Python create function, which then stores a summary of its work in the service as operational data (config=false and tailf:persistent=true).

Then we have multiple customer services (e.g. Internet), each with a leafref to an instance of the complex service. They rely on it to implement all the logic and simply use the summary operational data that it leaves behind for them. The aim is to avoid replicating the logic in these services, which therefore depend on the output of the complex service.

However, I was naïve and didn’t notice my problem when I created my first dependent service as NSO happens to always execute the depended-on service before the dependent one. The second one happens to execute in the other order and so exposed the problem, which is that the dependent service looks for the operational data which hasn’t been written yet, so doesn’t work.
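The failure mode can be pictured with a toy model. This is plain Python, not NSO code: the `Transaction` class, the two `create_*` functions and the leaf names are hypothetical stand-ins, and the only behaviour assumed is that the two create callbacks may run in either order within one transaction.

```python
# Toy model only: plain dicts stand in for the transaction's config and
# operational data. None of these names are NSO APIs.

class Transaction:
    def __init__(self):
        # Config for both services is already present when the callbacks run.
        self.config = {
            "interface": {"id": "interface-1234"},
            "internet": {"interface-svc-id": "interface-1234"},
        }
        # Operational data only appears once the interface create has run.
        self.oper = {}


def create_interface(txn):
    # The depended-on service stores its summary as operational data.
    txn.oper["interface-1234"] = {"important-stuff": "computed summary"}


def create_internet(txn):
    # The dependent service reads the summary left behind by the other one.
    key = txn.config["internet"]["interface-svc-id"]
    summary = txn.oper.get(key)
    if summary is None:
        return "FAILED: summary not written yet"
    return "OK: " + summary["important-stuff"]


def run(order):
    # Run the callbacks in the given order against a fresh transaction.
    txn = Transaction()
    results = {}
    for create in order:
        results[create.__name__] = create(txn)
    return results["create_internet"]


good_order = run([create_interface, create_internet])
bad_order = run([create_internet, create_interface])
print(good_order)
print(bad_order)
```

In this model the config lookup succeeds in both orders; only the read of operational data is order-sensitive.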

I have considered a few solutions that remove the dependency, but for various reasons they aren’t attractive in our solution. They include: making the complex service a stack child of all the others as parents; committing the services serially in separate commits; or, worst of all, duplicating the complexity into multiple customer service types. I’m aware that dependent services are not recommended, but I would like to find out if there is a way to work around the issue without abandoning our design.

My questions are:

Should conflict detection detect the write after a read, and retry the reader (dependent) service? We have a very low transaction rate and plenty of CPUs, so a retry is no big deal.

Is operational data monitored as part of the read-set/write-set conflict detection feature? Or does it only detect conflicts in device config for example?
Since both services are executing in one transaction (one commit or Yang Patch), is conflict detection still relevant? Or does it only detect conflicts between different transactions, not between services within a single transaction?

Is there any way that I can influence the order that the services are executed in the transaction? I.e. can I tell NSO that the depended-on service must always be created before all the other services that depend on it?

Can “conflicts-with” help here? It doesn’t seem to. It seems to stop two services that are known to conflict from running at the same time, but within one transaction I always see one service be created before the next starts (eg adding a long sleep). Should it influence the order?

Thanks,

Peter.

huayyang · 05-05-2025

as documented in

https://cisco-tailf.gitbook.io/nso-docs/guides/development/advanced-development/developing-services/services-deep-dive#ch_svcref.caveats, nso doesn't guarantee in what order services are invoked in a transaction.

and given that essentially you need to have the "interface" service run before the "internet" service, this is not, imo, a problem conflict detection should solve, nor can conflict detection solve as it checks conflicts among different transactions on config data.

I think the best solution for you is stacked service. can you explain why it's not attractive in your solution?

peter-stamp · 05-05-2025

Thanks huayyang for the info. It is now very clear that there isn’t any mechanism to influence the order of execution within a single transaction. Likewise my train of thought on conflict detection wasn’t so much that it should work, but rather that if it happened to detect the failed dependency and re-run the dependent service it would have been quite convenient. But good to rule them out, thanks for getting me there.

Regarding my hesitations about stacked services, they are not NSO limitations, but rather to do with our service modelling where the relationship between parent and child may be many to one in some cases and to avoid duplicating Yang model contents between models. Nothing unsolvable.

With all this in mind, if I try to isolate the root cause of our problem so that we can identify the fix, it appears that it all starts with our attempt to use operational data built by the depended-on service, in the dependent service? Ie specifically operational data. Is that correct?

To expand on that, I created 2 simple test services, a parent and a child, and to reproduce the dependency I added a line to the child’s create function to store a value in its operational data that is read by the parent template, a mini recreation of our actual dependency problem. Using “commit | debug template | details” I can see that the create function for the parent is invoked and its template applied, and only then does the create function and template of the child run. So, using stacked services I have controlled the order, but still the operational data from the child does not exist at the time that the parent template does a deref to reach into the child. To confirm that operational data is the problem, if the deref accesses regular service data (not config false) in the child the reference works fine (it doesn’t make sense to do this, but shows that only operational data has the ordering dependency).

So, it seems that the fix that we choose must do away with using operational data as an inter-service communication method to pass a message from the complex interface service back to the simpler high-level services.

Two possible solutions jump to mind. I think clearly the second one looks the most attractive, but I’d really appreciate if you could offer any thoughts.

1. Duplicate Python logic from the low-level service into all high-level services so they can calculate the data they need, avoiding the need to pass it back through operational data. If we also duplicate the low-level service's template into each high-level service we could just abandon the low-level service.

2. Use stacked services and turn the low-level service into a child “super service”, exposing the child’s Yang model in each parent’s Yang model, and move the template contents from all high-level services that previously accessed operational data in the low-level service into the child service’s template. The parents are now shell services and we are asking the child service to do the device config generation on behalf of all parents. Our OSS front-end would only make RESTCONF calls to the parent services.

Thanks,

Peter.

snovello · 05-06-2025

Hello,

Yes I agree, the operational data within a service can really only be used inside that service, and not to communicate across services.

For option 1 you do not have to duplicate the logic. You could expose the computation function via one or more actions or by calling python code in the interface service from the other services directly (I think actions might be easier to test/maintain but both possible). That option stays closer to your current design I think.
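A minimal sketch of the second variant (sharing Python code rather than data), assuming the summary can be derived purely from the interface service's config, which is already present in the transaction regardless of callback order. The helper `compute_summary` and the leaf names `device`, `port` and `vlan` are hypothetical, and plain dicts stand in for the service nodes so that the sketch runs outside NSO.

```python
# Hypothetical shared helper, e.g. kept in a module that both packages import.
# It takes plain values, so it does not matter which create callback runs first.

def compute_summary(device, port, vlan):
    """Derive the values the high-level services need from config inputs."""
    return {
        "sub-interface": f"{port}.{vlan}",
        "description": f"{device}:{port} vlan {vlan}",
    }


def interface_create(interface_cfg):
    # The interface service uses the helper for its own template variables.
    return compute_summary(interface_cfg["device"],
                           interface_cfg["port"],
                           interface_cfg["vlan"])


def internet_create(internet_cfg, interfaces):
    # The dependent service follows its reference to the interface *config*
    # and recomputes the summary instead of reading operational data.
    interface_cfg = interfaces[internet_cfg["interface-svc-id"]]
    return compute_summary(interface_cfg["device"],
                           interface_cfg["port"],
                           interface_cfg["vlan"])


# Dict stand-ins for the two service instances created in one transaction.
interfaces = {
    "interface-1234": {"device": "pe1", "port": "ge-0/0/1", "vlan": 100},
}
internet = {"interface-svc-id": "interface-1234"}

from_interface = interface_create(interfaces["interface-1234"])
from_internet = internet_create(internet, interfaces)
print(from_internet)
```

The trade-off is that the computation runs once per caller instead of once per interface instance, which should be acceptable if it is cheap and deterministic.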

I suspect option 2 might not work, but I'm not sure I understand it 100%. Surely if the logic for the parent service is now in the child service, it will only execute if you change any of the config parameters on the child. Also, you mention the child might be the child of several parents; would it now need to keep track of all the parents and re-execute their logic whenever its parameters are changed?

Also a general note: the choice of whether to use stacked services or a dependency with a reference should be driven by the required lifetimes of the two services. If the child is only created when the parent is created, and should be removed when the parent is removed (or the last parent when multiple parents exist), then a stacked service design is preferable. If the child could exist independently of any parent, then it cannot be a stacked service. It is quite common to have a stacked service pattern with many parents, for example you have a service representing a common resource, e.g. an access list, and different parents are adding their own prefixes to enable

peter-stamp · 05-06-2025

Thanks snovello for your answer.

I like your suggestions for my option 1 - actions; and calling python code in the interface service from the other services. As you point out this will allow us to keep our current design, which sounds great. However I haven't been able to work out how to implement them or find examples of similar code.

I have read all I can find on actions, but am not sure how they could be used in this case. I assume that my interface service would register an action which would invoke a Python function that returns the computed data that I have previously been storing in operational data (i.e. per service). But I can't figure out how the action could be invoked by the other service. The examples (mostly describing a "double" function to return input*2) all invoke the action in the NSO CLI, whereas I would need to call it from the Python create function of another service during the transaction where both services are being created. Is this possible? Noting that the service whose action should be called hasn't been committed yet.

Likewise I haven't been able to find examples of a way to call a function in the interface service from the create function of the other service. If I place a function in the interface service python directory and then include it in the other service, I am unclear on how I would access the instance of the interface service which is still yet to be committed in the transaction creating both (either as a stack child or standalone).

Any further help appreciated.

Thanks,

Peter.

peter-stamp · 05-07-2025

Thanks @snovello. I have taken your suggestion of exposing the computed data via an action, and I believe I have it working. The code I have is made up of bits and pieces I've found in half a dozen other forum posts, so as a last check it would be appreciated if anyone can comment on whether the code is valid.

Back to my original description, I have a complicated service that builds interfaces to be used by multiple higher level services like Internet, L2VPN, etc. The high-level services were dependent on operational data in the interface service, but this failed when both were created in one transaction as the operational data didn't exist if the create function of the depended-on service was executed second.

So now I have added an action to the interface service, using the "Double" action in ncs.examples and various posts. The computed data is now exposed via action output and the action works fine when invoked via CLI.

But I want to invoke the action function from inside the create function of the high-level service, not CLI, and in the context of the interface service that is being created in the transaction.

So in the high-level service create function I have the following to call the action of the interface service.

A question about the code below:

To reach the action, I am going back to the root and going into the interface service using its key which is known to the high-level service. Is there a better way?

def cb_create(self, tctx, root, service, proplist): 
    <snip>
    # The action is called "summary" in container "action" in the interface service. Call it. It takes no input.
    summary_action_function = root.services.interface[service["interface-svc-id"]].action.summary
    action_input = summary_action_function.get_input()
    action_output = summary_action_function.request(action_input)
    # Store the returned data in this service's operational data, so we can use it in the template.
    service["important-stuff"] = action_output["important-stuff"]
    <snip>

And in the low-level interface service, the action is implemented as follows. I have put the computation code in my_function() which is called by both the action when asked to by another service, and by its own create function so that it can use the computed data itself (eg in its own template or Python code).

Some questions about the code below:

Is it valid? Or am I way off track?
Is calling a common function from the action and create functions OK?
What is the best way to access the service data to be used for computing the return data? Below I have two alternatives: one accesses the service using the key name of the service that contains the action (I would pass it via input, just static for testing). The other uses the kp argument but has to go up one level with .. to get out of the "action" container. Both seem to work, but is either better?

import ncs
from ncs.application import Service
from ncs.dp import Action

def my_function(service):
    result = {}
    result["important-stuff"] = "insert complicated logic here"
    return result

class SummaryAction(Action):
    @Action.action
    def cb_action(self, uinfo, name, kp, input, output, trans):
        root = ncs.maagic.get_root(trans)
        #service = root.services["interface"]["interface-1234"]
        service = ncs.maagic.cd(root, str(kp)+"/..")
        result = my_function(service)
        output["important-stuff"] = result["important-stuff"]

class ServiceCallbacks(Service):
    @Service.create
    def cb_create(self, tctx, root, service, proplist):
        <snip>
        result = my_function(service)
        self.log.info('result:', result["important-stuff"])
        <snip>
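
Keeping the computation in a plain function also means it can be exercised outside NSO. A rough sketch, assuming `my_function` only reads leaves by name and writes nothing; `FakeService` and the leaf `interface-name` are hypothetical stand-ins, not NSO classes or part of my real model.

```python
# Stand-in for a maagic service node: supports service["leaf"] lookups only.
class FakeService:
    def __init__(self, leaves):
        self._leaves = dict(leaves)

    def __getitem__(self, name):
        return self._leaves[name]


def my_function(service):
    # Hypothetical logic: read config leaves only and return a new dict.
    # Nothing is written, so calling it from both cb_action and cb_create
    # cannot make one caller see state left behind by the other.
    result = {}
    result["important-stuff"] = "summary for " + service["interface-name"]
    return result


service = FakeService({"interface-name": "interface-1234"})
from_action = my_function(service)  # what cb_action would copy into output
from_create = my_function(service)  # what cb_create would use itself
print(from_action == from_create)
```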

Thanks,

Peter.