mario.jost
Beginner

Python is very slow on Nexus

We have a couple of Nexus C93180YC-FX switches running nxos.9.3.4.bin. I have a lot of TCL scripts that run on IOS and IOS-XE, so I have some experience with how long I have to wait for command X or Y. I recently tried Python, but everything is really slow. This is my script:

from cli import *
import sys, time

start = time.time()
print (str(round(time.time() - start, 2)) + ' Script started')

def geterr(intname):
	print (str(round(time.time() - start, 2)) + ' start function, get the CLI command')
	cmd = cli('show interface ' + intname + ' counters errors | inc Eth')
	print (str(round(time.time() - start, 2)) + ' get Int desc')
	intdesc = cli('show interface ' + intname + ' description | include Eth').split()[3]
	print (str(round(time.time() - start, 2)) + ' finished function')
	print ' '
	
geterr('Ethernet1/1')
geterr('Ethernet1/3')
geterr('Ethernet1/5')
geterr('Ethernet1/7')
print (str(round(time.time() - start, 2)) + ' Script finished')

As you can see, I just loop through 4 interfaces and get the interface error counters and the corresponding description. I added some timestamps to see how long the script takes and how long the intermediate steps take. This is the output:

0.0 Script started
0.0 start function, get the CLI command
0.69 get Int desc
1.39 finished function

1.39 start function, get the CLI command
2.08 get Int desc
2.78 finished function

2.78 start function, get the CLI command
3.49 get Int desc
4.21 finished function

4.21 start function, get the CLI command
4.9 get Int desc
5.6 finished function

5.6 Script finished

So every command takes around 0.7 seconds to finish, resulting in 5.6 seconds for the whole script. If I use the JSON output (clid() instead of cli()), it is a bit faster. This is the script:

from cli import *
import json, sys, time

start = time.time()
print (str(round(time.time() - start, 2)) + ' Script started')

def geterr(intname):
	cmd = 'show interface ' + intname + ' counters errors'
	out = json.loads(clid(cmd))
	# print (json.dumps(out, sort_keys=False, indent=4))
	port = str(out['TABLE_interface']['ROW_interface'][0]['interface'])
	err01 = int(out['TABLE_interface']['ROW_interface'][0]['eth_align_err'])
	err02 = int(out['TABLE_interface']['ROW_interface'][0]['eth_fcs_err'])
	err03 = int(out['TABLE_interface']['ROW_interface'][0]['eth_outdisc'])
	err04 = int(out['TABLE_interface']['ROW_interface'][0]['eth_rcv_err'])
	err05 = int(out['TABLE_interface']['ROW_interface'][0]['eth_undersize'])
	err06 = int(out['TABLE_interface']['ROW_interface'][0]['eth_xmit_err'])
	
	err07 = int(out['TABLE_interface']['ROW_interface'][1]['eth_carri_sen'])
	err08 = int(out['TABLE_interface']['ROW_interface'][1]['eth_excess_col'])
	err09 = int(out['TABLE_interface']['ROW_interface'][1]['eth_late_col'])
	err10 = int(out['TABLE_interface']['ROW_interface'][1]['eth_multi_col'])
	err11 = int(out['TABLE_interface']['ROW_interface'][1]['eth_runts'])
	err12 = int(out['TABLE_interface']['ROW_interface'][1]['eth_single_col'])
	
	err13 = int(out['TABLE_interface']['ROW_interface'][2]['eth_deferred_tx'])
	err14 = int(out['TABLE_interface']['ROW_interface'][2]['eth_giants'])
	err15 = int(out['TABLE_interface']['ROW_interface'][2]['eth_inmacrx_err'])
	err16 = int(out['TABLE_interface']['ROW_interface'][2]['eth_inmactx_err'])
	err17 = int(out['TABLE_interface']['ROW_interface'][2]['eth_symbol_err'])
	
	err18 = int(out['TABLE_interface']['ROW_interface'][3]['eth_indisc'])
	
	intdesc = cli('show interface ' + intname + ' description | include Eth').split()[3]
	print '%11s  %12d %12d %12d %12d %12d %12d %12d %12d %12d %12d %12d %12d %12d %12d %12d %12d %12d %12d %12s' % (port, err01, err02, err06, err04, err05, err03, err12, err10, err09, err08, err07, err11, err14, err13, err16, err15, err17, err18, intdesc)

print 'Port         %12s %12s %12s %12s %12s %12s %12s %12s %12s %12s %12s %12s %12s %12s %12s %12s %12s %12s %12s' % ('Align-Err','FCS-Err','Xmit-Err','Rcv-Err','UnderSize','OutDiscards','Single-Col','Multi-Col','Late-Col','Exces-Col','Carri-Sen','Runts','Giants','Deferred-Tx','IntMacTx-Er','IntMacRx-Er','Symbol-Err','InDiscards','Hostname')
print '==================================================================================================================================================================================================================================================================='
geterr('Ethernet1/1')
geterr('Ethernet1/3')
geterr('Ethernet1/5')
geterr('Ethernet1/7')
print '==================================================================================================================================================================================================================================================================='
print (str(round(time.time() - start, 2)) + ' Script finished')

This takes around 5.46 seconds to finish. I can tell you that in TCL this would not even have taken 1 second. So why is Python so slow on Nexus, and can we do something to speed it up?

 

Christopher Hart
Cisco Employee

Hi Mario!

I performed some similar testing in my lab and was able to reproduce similar latency. To summarize, the cli Python library that NX-OS uses adds overhead to every command it executes, and that overhead is not present when the same command is executed through a TCL script or directly from the CLI.

First, let's establish a baseline for how long it takes to execute the command show interface Ethernet1/1 counters errors through the CLI. We can do this reasonably well by running the show clock command immediately before and after it. This isn't perfect (it doesn't account for the overhead of the show clock command itself), but it is close, as shown below.

N9K# show clock ; show interface Ethernet1/1 counters errors ; show clock
14:31:48.555 UTC Thu Aug 20 2020
Time source is NTP

--------------------------------------------------------------------------------
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
--------------------------------------------------------------------------------
Eth1/1 0 0 0 0 0 0

--------------------------------------------------------------------------------
Port Single-Col Multi-Col Late-Col Exces-Col Carri-Sen Runts
--------------------------------------------------------------------------------
Eth1/1 0 0 0 0 0 0

--------------------------------------------------------------------------------
Port Giants SQETest-Err Deferred-Tx IntMacTx-Er IntMacRx-Er Symbol-Err
--------------------------------------------------------------------------------
Eth1/1 0 -- 0 0 0 0

--------------------------------------------------------------------------------
Port InDiscards
--------------------------------------------------------------------------------
Eth1/1 0
14:31:48.639 UTC Thu Aug 20 2020
Time source is NTP

All three commands took about 84 milliseconds to execute. Through my own testing (I can expand on this if you'd like, but I don't want to crowd this post with too much unrelated detail), I determined that the show clock command takes about 10 milliseconds to execute, so that means the show interface Ethernet1/1 counters errors command takes about 64 milliseconds to execute through the CLI.
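The arithmetic behind this estimate, written out as a small Python sketch. It only uses the two show clock timestamps printed above and the roughly 10 ms per show clock figure from the previous paragraph; the date part of the timestamps is dropped because both fall on the same day.

```python
from datetime import datetime

# Timestamps printed by the two "show clock" commands above (date omitted).
FMT = "%H:%M:%S.%f"
before = datetime.strptime("14:31:48.555", FMT)
after = datetime.strptime("14:31:48.639", FMT)

total_ms = (after - before).total_seconds() * 1000  # all three commands together
show_clock_ms = 10                                  # approximate cost of one "show clock"

# Subtract both "show clock" runs to isolate the "show interface" command.
command_ms = total_ms - 2 * show_clock_ms

print("total: %.0f ms, show interface: %.0f ms" % (total_ms, command_ms))
```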

Next, let's inspect the code for the cli Python function that you're using. We can find the location of the file with the find_module() function from the imp module in the standard library.

N9K# python

Warning: Python 2.7 is End of Support, and future NXOS software will deprecate
python 2.7 support. It is recommended for new scripts to use 'python3' instead.
Type "python3" to use the new shell.

Python 2.7.11 (default, Jun 4 2020, 09:48:24)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import imp
>>> imp.find_module("cli")
(<open file '/isan/python/scripts/cli.py', mode 'U' at 0xf39deee8>, '/isan/python/scripts/cli.py', ('.py', 'U', 1))
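The imp module is deprecated in Python 3 (and removed in 3.12). If you follow the interpreter's advice and move to the python3 shell, a rough equivalent is importlib.util.find_spec(). This is a sketch only: whether the cli module is importable under python3 depends on the NX-OS release, so the hypothetical module_path helper below returns None instead of failing when a module is not found.

```python
import importlib.util


def module_path(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# On the switch this should point at the cli module's file; elsewhere it prints None.
print(module_path("cli"))
```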

We can use the Bash shell to view the contents of /isan/python/scripts/cli.py as of NX-OS 9.3(5). This module contains more functions than just cli(), but I've purposefully cut it down to just the cli function we're using.

def cli(cmd):
    '''
    Execute CLI commands. Takes CLI command string and returns show command output in a plain string form.

    Arguments:
        cmd: Single CLI command or a Batch of CLI commands. Delimiter for mutlple CLI
             commands is space + semi-colon. Configuration commands need to be in a 
             fully qualified form. 
             For example, configure terminal ; interface ethernet1/1 ; no shutdown
    
    Returns:
        string: CLI output string for show commands and an empty string for configuration
                commands.

    Raises:
        cli_syntax_error: CLI command is not a valid NXOS command.
        cmd_exec_error: Execution of CLI command is not successful. 

    '''
    args = ["/isan/bin/vsh", "-N", "-c", cmd]
    p = subprocess.Popen(args, env=getenv(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output,error = p.communicate()
    output = convToStr(output)
    error = convToStr(error)
    if p.returncode == 0:
        return output
    else:
        msg = "{0}{1}".format(output, error)
        if p.returncode == 16:
            raise cli_syntax_error(msg)
        else:
            raise cmd_exec_error(msg)

This function is relatively straightforward: it executes a given command by passing it as an argument to the vsh binary at /isan/bin/vsh. That binary is launched in a new subprocess on every call, and its output is collected with the communicate() method.

Spawning this subprocess on every call is the additional overhead that introduces the latency you're observing in your Python script. We can verify this with the timeit module from the standard library:

 

N9K# python

Warning: Python 2.7 is End of Support, and future NXOS software will deprecate
python 2.7 support. It is recommended for new scripts to use 'python3' instead.
Type "python3" to use the new shell. 

Python 2.7.11 (default, Jun  4 2020, 09:48:24) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.timeit('output, error = subprocess.Popen(["/isan/bin/vsh", "-N", "-c", "show interface Ethernet1/1 counters errors"], stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()', "import subprocess", number=10)
6.059631109237671
>>> 6.059631109237671 / 10
0.6059631109237671

This shows that this specific one-liner (which is essentially what the cli function does) takes about 606 milliseconds to execute on average. This timing corresponds with the latency you're observing in your script.

There is no supported way to speed up the cli library you're importing. In theory, you could write your own library that performs a similar task without opening a subprocess. However, I suspect there's a very good reason the developers chose to open a subprocess to execute this command instead of executing it natively in the shell, and you might discover that reason if you choose to refactor this library on your own!
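What can be done within the documented interface is to reduce the number of cli() calls. Since the cost is paid per vsh subprocess rather than per command, and the cli() docstring above says several commands can be passed in one call separated by space + semicolon, a minimal sketch is to batch them. The batch_commands helper is hypothetical, splitting the combined output back into per-interface sections is left out, and I have not measured how the per-call time grows with the batch size. It runs under both Python 2.7 and Python 3.

```python
def batch_commands(commands):
    """Join several NX-OS CLI commands into one string for a single cli() call.

    Uses " ; " as the separator, matching the example in the cli() docstring.
    """
    return " ; ".join(cmd.strip() for cmd in commands)


interfaces = ["Ethernet1/1", "Ethernet1/3", "Ethernet1/5", "Ethernet1/7"]
batched = batch_commands("show interface %s description" % name for name in interfaces)
print(batched)

# On the switch (untested sketch): one vsh subprocess instead of four.
# from cli import cli
# output = cli(batched)
```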

With that being said, let's talk about the use case behind some of your Python scripts. Specifically, I have two questions:

  1. Are your Python scripts truly latency-sensitive and need to be completed as quickly as possible?
  2. Do your Python scripts need to be executed on the Nexus switch itself, or can they be executed off-the-box?

I hope this helps - thank you!

-Christopher

 

Sergiu.Daniluk
VIP Engager

Hi @mario.jost

The speed of the script is mainly influenced by the cli() method, since it takes the output of the command and not the values from the system itself. clid(), on the other hand, takes only the values from the system software, so it is much faster, and the data is also structured. You can also improve the speed of the script: instead of running the command once for each interface you are interested in, it's better to run it once and then parse the dictionary for the interfaces you need.

Here is my attempt to speed up the process a bit:

from cli import *
import json, sys, time
start = time.time()
print (str(round(time.time() - start, 2)) + ' Script started')
dict_interface={}
table = json.loads(clid('show interface counters errors'))
for interface in table['TABLE_interface']['ROW_interface']:
    if interface['interface'] not in dict_interface:
        dict_interface[interface['interface']] = {}
    dict_interface[interface['interface']].update(interface)

print(' '.join(dict_interface['Ethernet1/1'].keys()))
print(' '.join(dict_interface['Ethernet1/1'].values()))
print(' '.join(dict_interface['Ethernet1/2'].values()))
print(' '.join(dict_interface['Ethernet1/3'].values()))
print (str(round(time.time() - start, 2)) + ' Script finished')

!Result:
>>> print (str(round(time.time() - start, 2)) + ' Script started')
0.02 Script started
>>> print(' '.join(dict_interface['Ethernet1/1'].values()))
0 0 0 0 0 0 0 0 0 0 0 0 Ethernet1/1 0 0 0 0 0 0
>>> print(' '.join(dict_interface['Ethernet1/2'].values()))
0 0 0 0 0 0 0 0 0 0 0 0 Ethernet1/2 0 0 0 0 0 0
>>> print(' '.join(dict_interface['Ethernet1/3'].values()))
0 0 0 0 0 0 0 0 0 0 0 0 Ethernet1/3 0 0 0 0 0 0
>>> print (str(round(time.time() - start, 2)) + ' Script finished')
1.45 Script finished

Note: I ran it in interactive mode, but I believe it is much faster when run from a file.
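One caveat about printing keys() and values() directly: a Python 2.7 dictionary has no guaranteed iteration order, which is why the interface name lands in the middle of each result line above, and the header from keys() is not in any meaningful order either. A minimal sketch that prints the counters in an explicit column order instead, reusing the eth_* key names from the original question's script. The COLUMNS list, the format_row helper and the sample dict_interface contents are hypothetical stand-ins for the parsed clid() output; the code runs under both Python 2.7 and Python 3.

```python
# Counter keys taken from the original script; this list defines the column order.
COLUMNS = ["eth_align_err", "eth_fcs_err", "eth_xmit_err", "eth_rcv_err",
           "eth_undersize", "eth_outdisc", "eth_indisc"]


def format_row(name, counters, columns=COLUMNS):
    """Return one fixed-width line: the interface name followed by the chosen counters."""
    # Missing keys print as "--" instead of raising KeyError.
    values = [str(counters.get(key, "--")) for key in columns]
    return "%-12s" % name + "".join("%14s" % value for value in values)


# Made-up sample standing in for dict_interface built from clid() output.
dict_interface = {
    "Ethernet1/1": {"interface": "Ethernet1/1", "eth_align_err": "0",
                    "eth_fcs_err": "3", "eth_xmit_err": "0", "eth_rcv_err": "0",
                    "eth_undersize": "0", "eth_outdisc": "12", "eth_indisc": "0"},
}

print("%-12s" % "Port" + "".join("%14s" % key for key in COLUMNS))
for name in ["Ethernet1/1", "Ethernet1/3"]:
    print(format_row(name, dict_interface.get(name, {})))
```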

 

Stay safe,

Sergiu

Thank you very much for your responses. So my main takeaway is: CLI commands are slow and there is nothing I can do to speed them up. Try to get as much information as possible from a single command and then build the data from that one request.

 

I just found out that TCL still works on Nexus switches. Some sales personnel told me that TCL does not work anymore on NX-OS, so this is nice to have as a backup in case something bigger needs to run quickly. I was also looking into running the commands from an external source, like a Linux machine, via an API. Sadly, there are a lot of different possibilities and I'm not sure which one to go for: NX-API, NETCONF, RESTCONF, YANG and OpenConfig. So I will have to decide on one and build the scripts on one server. I have to get used to running the scripts centrally instead of logging onto the device and running them locally. Thanks for the quick responses and input from your side. I'm sure this is useful for anybody else who wants to migrate TCL to Python on Nexus devices.

That is true: running the scripts on the box is not the most scalable option, although running them from a central point using APIs will definitely add some time overhead, regardless of which option you use.

IMHO, NX-API is the easiest approach: it is lightweight, faster (if you use the HTTP port), and returns data structured as JSON. You can also use the NX-OS CLI commands "show command | json" or "show command | json-pretty" on the switch, and you will already see the data that would be returned.
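To give a feel for the NX-API route, here is a minimal off-box sketch in Python 3 using only the standard library. It assumes NX-API has been enabled on the switch (feature nxapi), that it is reachable over HTTPS at https://<switch>/ins, and that it accepts JSON-RPC requests with basic authentication; the host name, credentials and the helper names build_jsonrpc_payload and nxapi_request are placeholders. Only the request building is exercised without a switch; certificate handling, retries and error checking are left out.

```python
import base64
import json
import urllib.request


def build_jsonrpc_payload(commands):
    """Build an NX-API JSON-RPC request body: one "cli" call per command."""
    return [
        {"jsonrpc": "2.0", "method": "cli",
         "params": {"cmd": cmd, "version": 1}, "id": i}
        for i, cmd in enumerate(commands, start=1)
    ]


def nxapi_request(host, username, password, commands):
    """POST all commands to https://<host>/ins in one request and return the decoded reply.

    Placeholder sketch: no certificate pinning, retries, or error handling.
    """
    body = json.dumps(build_jsonrpc_payload(commands)).encode("utf-8")
    token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    req = urllib.request.Request(
        "https://%s/ins" % host,
        data=body,
        headers={"Content-Type": "application/json-rpc",
                 "Authorization": "Basic " + token.decode("ascii")},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Both commands travel in a single HTTP request.
payload = build_jsonrpc_payload(["show interface counters errors",
                                 "show interface description"])
print(json.dumps(payload, indent=2))
# reply = nxapi_request("switch.example.com", "admin", "secret",
#                       ["show interface counters errors"])
```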

Regarding TCL on NX-OS, I have never tested it, so I do not know whether it works, but it's worth trying. This might be helpful in a multi-platform environment where TCL works on all platforms and you need to test something on all devices.

 

Stay safe,

Sergiu
