Best Practices on Cisco NSO with Python

dmarasch · 03-19-2024

XPATH for large configuration lists

When dealing with big configuration lists (e.g. xconnect entries on XR devices, l2circuit entries on Junos devices, etc.) it is important to choose the right way to access an element.

If you don't know in advance which node entry to pick and you need to iterate over these lists, then the best approach is to execute an XPATH query.

Tools

NSO provides some tools that help a developer work with XPATH queries:

The xpath-trace.log file, which can be enabled when testing (do not enable it in production, since it slows NSO down). This trace logs every XPath evaluation done by NSO. It is useful both for evaluating the performance of a transaction that uses XPath queries and for validating query results.
The "xpath eval" command from the NSO CLI, which lets you experiment with XPath expressions. E.g.

XPATH

admin@nso1# devtools true

admin@nso1# config

admin@nso1(config)# xpath eval <xpath-expression>
The progress-trace in NSO 6.x, which uses spans (units of work) and records how long each operation took to execute.

Code Examples

These examples show how to approach the same problem using the maagic API and using XPath queries.
When developing, always keep code readability and performance in mind. A complex XPath query is not optimal in every context and, sometimes, the API approach is preferable, especially if the read set is not large.

As a good practice, be sure to test your XPath queries and your code on large datasets before deployment, since larger datasets reduce performance.

Retrieve a single node

Find the PW-ID associated to an Interface

Regarding performance, be aware that the maagic API caches every node (its schema information and sibling data) that we read from the CDB. For this reason, reading nested or multi-nested lists can increase memory consumption.

Loop

d = root.ncs__devices.device[device]
groups = d.config.cisco_ios_xr__l2vpn.xconnect.group
# Iterating a maagic list already yields the list entries,
# so there is no need to look each one up again by key.
for group in groups:
    for p2p in group.p2p:
        for i in p2p.interface:
            if i.name == "GigabitEthernet0/0":
                # Return the pw-id of the first neighbor, if any
                for n in p2p.neighbor:
                    return str(n['pw-id'])

XPATH

device_path = "/devices/device[name='ta-netsim-xr-1']/config/cisco-ios-xr:l2vpn/xconnect/group/p2p/interface[name='TenGigE0/10/0/0.3653']"
return transaction.xpath_eval_expr(device_path, trace=None, path="")

Retrieve a list of nodes

Retrieve a list of nodes from a list

Loop

intfs = []
d = root.ncs__devices.device[device]
for intf in d.config.junos__configuration.interfaces.interface:
    if intf.name.startswith("ge-"):
        intfs.append(intf)
return intfs

XPATH

device_path = "/devices/device[name='{}']/config/junos:configuration/interfaces/interface[starts-with(name,'ge-')]".format(device)
intfs = []
def set_interface(kp, value):
    intfs.append(ncs.maagic.get_node(trans, kp.dup()))
trans.xpath_eval(device_path, set_interface, trace=None, path='')
return intfs
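
When an expression is built with format(), as in the snippet above, a value that contains a quote character would break the predicate. The following is a minimal sketch of a quoting helper in plain Python. The names xpath_literal and interfaces_expr are hypothetical and are not part of the NSO API. Since XPath 1.0 has no escape character inside string literals, the helper falls back to concat() when both quote types are present.

```python
def xpath_literal(value: str) -> str:
    """Return value as an XPath 1.0 string literal (hypothetical helper)."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types present: XPath 1.0 has no escape character,
    # so join the pieces with concat() and a double-quoted apostrophe.
    pieces = ", \"'\", ".join(f"'{part}'" for part in value.split("'"))
    return f"concat({pieces})"


def interfaces_expr(device: str, prefix: str) -> str:
    """Build the starts-with() query used above for one device."""
    return (
        f"/devices/device[name={xpath_literal(device)}]"
        "/config/junos:configuration/interfaces/"
        f"interface[starts-with(name,{xpath_literal(prefix)})]"
    )
```

The resulting string can be passed to xpath_eval in place of the format() call.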

XPATH queries can be more efficient, 20 to 30 times faster than looping over a list, and use far less memory.
However, note that a Python for loop is not inherently slower than NSO's internal code (which uses the XPath language). The difference is that each iteration of a Python for loop involves one or two interactions between the Python VM and the ncs.smp process, whereas an XPath expression is evaluated without these extra context switches. Even so, when using XPath it is still important to avoid unnecessary computation.
Taking the first example, if we know in advance that a given sub-interface can be found under only one xconnect, then we can rewrite the XPath expression using a position predicate as follows: "/devices/device[name='ta-netsim-xr-1']/config/cisco-ios-xr:l2vpn/xconnect/group/p2p/interface[name='TenGigE0/10/0/0.3653'][1]". This is the same as having a return statement on the first hit of a for loop.
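
The round-trip argument can be illustrated with a toy model. This is not NSO code: FakeStore is a hypothetical stand-in for the ncs.smp process that only counts the requests it receives, so the numbers show the shape of the cost, not real timings.

```python
class FakeStore:
    """Stand-in for the NSO process; counts the requests it receives."""

    def __init__(self, names):
        self._names = list(names)
        self.round_trips = 0

    def count(self):
        self.round_trips += 1
        return len(self._names)

    def read_name(self, index):
        self.round_trips += 1
        return self._names[index]

    def query(self, prefix, first_only=False):
        # Filtering happens on the "server" side: one request in total.
        self.round_trips += 1
        hits = []
        for name in self._names:
            if name.startswith(prefix):
                hits.append(name)
                if first_only:
                    break  # same effect as the [1] position predicate
        return hits


def loop_filter(store, prefix):
    """Client-side filter: one request per element read."""
    return [
        name
        for name in (store.read_name(i) for i in range(store.count()))
        if name.startswith(prefix)
    ]
```

With 100 entries, loop_filter costs 101 requests in this model (one count plus one read per entry), while query costs one, whatever the list size.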

Monitoring

The examples above use simple filters. More complex ones can perform badly and must be chosen carefully.
For this reason, it is sometimes worth measuring the performance of the function that we execute.

If you are not sure which method fits better, you can use this decorator to log how much RAM and time your Python function takes (it shows the memory usage of Python, not of NSO). Results are logged in ncs-python-vm.log.
This decorator is meant for development and testing purposes only. Do not use it in production.

Loop

import functools
import tracemalloc
import time

def measure(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        start_time = time.time()
        fn_res = fn(*args, **kwargs)
        end_time = time.time()
        time_taken = end_time - start_time
        _ss = tracemalloc.take_snapshot()
        size = 0
        memory_blocks = 0
        for _i in _ss.statistics('lineno'):
            size += _i.size
            memory_blocks += _i.count
        profile_details = {
            "func-name": fn.__name__,
            "memory-blocks": memory_blocks,
            "memory-usage": f"{size / 10 ** 6:.2f} MB",
            "time-taken": f"{time_taken:.2f} seconds"
        }
        print(profile_details)
        tracemalloc.stop()
        return fn_res

    return wrapper

@measure
def my_func():
    pass
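
As a variation, the sketch below records the peak traced memory instead of summing a snapshot, and stores the result on the wrapper so that two implementations can be compared programmatically. It uses only the standard library. The names measure_peak, last_profile and build_list are hypothetical, and like the decorator above it is meant for development and testing only.

```python
import functools
import time
import tracemalloc


def measure_peak(fn):
    """Store wall time and peak traced memory of the last call on the wrapper."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Runs even if fn raises, so tracing is always stopped.
            elapsed = time.perf_counter() - start
            _current, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            wrapper.last_profile = {
                "func-name": fn.__name__,
                "peak-memory-bytes": peak,
                "time-taken-seconds": elapsed,
            }
    wrapper.last_profile = None
    return wrapper


@measure_peak
def build_list(n):
    return list(range(n))
```

After calling build_list(10000), the measurements are available in build_list.last_profile.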

Finally, remember that good use of logging not only eases troubleshooting sessions but also gives very good insight into where most of the time is spent in our code. Balance the trace level so that log files do not overflow in production.

Bonus

If you want to use an XPath query to retrieve a large set of nodes, consider using the NSO query API (https://developer.cisco.com/docs/nso/api/#!_ncs-maapi).
In addition, the ciscoUtils package exposes a wrapper for this function.

Loop

with ncs.maapi.single_read_trans("admin", "test-context") as t:
    maapi = t.maapi
    path = "/devices/device[name='ta-netsim-xr-1']/config/cisco-ios-xr:vrf/vrf-list[name='LTE-NAM']"
    select = ["rd"]
    initial_offset = 1
    query = maapi.query_start(
        t.th, expr=path, context_node="/",
        chunk_size=1, initial_offset=initial_offset,
        result_as=ncs.QUERY_STRING, select=select, sort=[]
    )
    total_result = maapi.query_result_count(query)
    query_result = maapi.query_result(query)
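
Since a query returns its results in chunks of chunk_size rows, a large result set has to be read with repeated calls. The following is a rough sketch of such a loop, under two assumptions that should be checked against the MAAPI documentation for your NSO version: that each chunk is iterable and exposes an nresults attribute, and that an empty chunk marks the end of the results. drain_query and FakeMaapi are hypothetical names; FakeMaapi exists only so that the sketch runs without NSO.

```python
def drain_query(maapi, query):
    """Collect every row of a started query, chunk by chunk.

    Assumes maapi.query_result() returns an iterable chunk exposing
    nresults, and that an empty chunk marks the end of the results.
    """
    rows = []
    try:
        while True:
            chunk = maapi.query_result(query)
            if chunk.nresults == 0:
                break
            rows.extend(list(row) for row in chunk)
    finally:
        maapi.query_stop(query)  # release the query handle
    return rows


class FakeMaapi:
    """Stand-in so the sketch runs without NSO; not the real Maapi class."""

    class Chunk(list):
        @property
        def nresults(self):
            return len(self)

    def __init__(self, chunks):
        # Real chunks come from NSO; here they are canned lists of rows.
        self._chunks = [self.Chunk(c) for c in chunks] + [self.Chunk()]
        self.stopped = False

    def query_result(self, query):
        return self._chunks.pop(0)

    def query_stop(self, query):
        self.stopped = True


fake = FakeMaapi([[["65000:1"]], [["65000:2"]]])
rows = drain_query(fake, query=None)
```

A larger chunk_size reduces the number of calls, at the price of holding more rows in memory at once.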

Another, more concise wrapper for the query API can be found under ncs.experimental.Query (https://developer.cisco.com/docs/nso/api/#!ncs-experimental).

ncs.experimental.Query

from ncs.experimental import Query

with ncs.maapi.single_read_trans("admin", "test-context") as t:
    path = "/devices/device[name='ta-netsim-xr-1']/config/cisco-ios-xr:vrf/vrf-list[name='LTE-NAM']"
    with Query(t, path, '/', ['rd'], result_as=ncs.QUERY_STRING) as q:
        for r in q:
            print(r)

Time consuming operations must be handled outside service logic

There might be some scenarios where it is not possible to use an XPATH query (e.g. when the query would be too complex), while using MAAPI is costly in terms of time (e.g. looping over a big list). These scenarios mainly occur in two places:

Inside an Action
Inside the service logic of a Pre/Create/Post callback

The first case is safe, since the action won't take a lock until you commit (it is still recommended to perform only the necessary operations within the lifespan of a transaction). The second case is different: each of the three callbacks is executed while the transaction lock is held. To avoid this, you can move the time-consuming portion of the code from the service logic into a new action. Then you deploy the service from the action, once the data you were looking for is available.
From NSO 6.x, the three callbacks are called outside the lock, so this workaround should be considered a temporary option rather than a design for the future. Moreover, the pre_lock_create callback has been deprecated since NSO 5.6. So if you are using an NSO version from 5.6 up to, but not including, 6.x, this is the only viable option.

Remember that looping over large lists using MAAPI, or making complex XPATH queries, is costly. Sometimes a wider XPATH query that retrieves a larger collection of nodes to work on is a better option.

NSO 6.x

NSO 6.x introduces a new concurrency model that greatly increases the achievable throughput and avoids the need for a global lock on user code. This means that the create() callbacks run in parallel and their execution is lock-free. So a service create() that takes longer than normal to execute does not affect NSO transaction throughput.

Create() Execution Time

Here, by service logic we mean the code that is executed inside the three callbacks: Pre-Modification, Create, and Post-Modification. For the sake of brevity, we refer to them as create().
The create() must be idempotent and must run in a few hundred milliseconds (at most a few seconds for very large services).

The execution time of these callbacks is logged in the DEVEL log and can also be checked from the CLI using the command: commit | details very-verbose. If a create() writes configuration to other services (stacked service scenario), then you also need to account for the execution time of the other services' create().

Lastly, in the DEVEL log you can check the "run transforms and transaction hooks done" line, which reports the time spent inside the create() and the time spent by NSO computing the reverse diff-set of your commit.

Tracing Improvements

From NSO 6.x, the progress-trace log makes use of spans (each span represents a unit of work, and together they build the entire trace), which makes it extremely useful for execution and performance analytics. Progress-trace entries can also be logged from within service code, and they give insight into how the entire transaction is performing.

NSO sessions lifespan

On NSO you have sessions and, for each session, you can have one or more transactions. Opening and closing them is not free, hence it might be more efficient to have a long-running transaction (rather than starting and stopping a user session multiple times, as we do with high-level APIs such as single_read_trans, etc.). However, transactions and sessions use resources, and prolonged use might lead, for example, to an increase in memory usage.

This is a trade-off that a developer must deal with, between code complexity (e.g. a library that exposes an API to fetch some data from the CDB: should it use single_read_trans, a transaction given as input, or both?), prolonged use of resources (e.g. keeping too many sessions open might lead to file descriptor exhaustion, high memory usage, etc.), and the high cost of opening and closing sessions and transactions.

The right approach depends on many factors. However, some key points to keep in mind are: open a transaction towards the right DB (operational or running), choose the right operation (read or read/write), and in general avoid opening and closing a transaction multiple times in the same action callback.
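
One possible answer to the library question above is to accept both: use the caller's transaction when one is passed in, and open a short-lived one only otherwise. The sketch below shows this pattern in plain Python. The names read_value, read and open_trans are hypothetical; in NSO, open_trans could be a function that returns ncs.maapi.single_read_trans("admin", "test-context"). The fake_ functions are stand-ins so that the sketch runs without NSO.

```python
from contextlib import contextmanager, nullcontext


def read_value(path, read, trans=None, open_trans=None):
    """Read path with the caller's transaction, or open one only if needed.

    read(trans, path) and open_trans() are hypothetical callables supplied
    by the caller; open_trans() must return a context manager.
    """
    ctx = nullcontext(trans) if trans is not None else open_trans()
    with ctx as t:
        return read(t, path)


opened = []  # records how many transactions the stand-in has opened


@contextmanager
def fake_open_trans():
    """Stand-in for opening a short-lived read transaction."""
    opened.append("open")
    yield {"/a": 1, "/b": 2}


def fake_read(trans, path):
    return trans[path]


# Three reads, one transaction: the caller opens it once and passes it in.
with fake_open_trans() as shared:
    values = [read_value(p, fake_read, trans=shared) for p in ("/a", "/b", "/a")]

# Convenience path: the helper opens (and closes) its own transaction.
single = read_value("/b", fake_read, open_trans=fake_open_trans)
```

The caller that already holds a transaction pays the open/close cost once, while occasional callers keep a one-line convenience call.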

Additionally, a valid, usable transaction is always available inside an action. This transaction is read-only and can be used by adding it as an input parameter of the action callback:

Loop

@Action.action
def cb_action(self, uinfo, name, kp, input, output, transaction: ncs.maapi.Transaction):
    # 'transaction' is the read-only transaction provided to the action
    pass

NSO 6.x

With the new concurrency model introduced in NSO 6.x, read-write operations in user code can overlap with other transactions. Thus, you should avoid any needless reads in your code. For a more detailed explanation and examples, see the "Avoiding Needless Reads" chapter at https://developer.cisco.com/docs/nso-guides-6.0/#!nso-concurrency-model/designing-for-concurrency

More best practices

More documented best practices can be found at: https://github.com/NSO-developer/nso-service-dev-practices
