NSO Service Tuning Guide

alam.bilal · ‎06-19-2017

Is there an NSO tuning guide available online? I noticed that after creating about three hundred services, the system slows down a lot during service creation. I checked the OpenStack computes and all of them have enough resources; in fact, most of the computes are barely half used. The NSO Linux system has 32G of RAM and 32 cores. When running with 300 services, it's only using 3G of RAM. Are we constraining NSO artificially?

[admin@sngpc-nfv-nso-1 ~]$ free -g
              total        used        free      shared  buff/cache   available
Mem:             31           3          11           0          16          26
Swap:             0           0           0

Also, I noticed that the JVM only has an Xmx of 64M and an Xms of 16M. Is there a config file where I can change these settings?

root 9772 9435 0 14:08 ? 00:00:00 /opt/ncs/current/lib/ncs/lib/core/sls/priv/agentwrapper java -Xmx64M -Xms16M -Djava.security.egd=file:/dev/./urandom -jar /opt/ncs/current/lib/ncs/lib/core/sls/priv/webapp-runner.jar /opt/ncs/current/lib/ncs/lib/core/sls/priv/smartagent --port 0 --path /smartagent --shutdown-override

root 9773 9772 0 14:08 ? 00:00:17 java -Xmx64M -Xms16M -Djava.security.egd=file:/dev/./urandom -jar /opt/ncs/current/lib/ncs/lib/core/sls/priv/webapp-runner.jar /opt/ncs/current/lib/ncs/lib/core/sls/priv/smartagent --port 0 --path /smartagent --shutdown-override

root 10487 9435 1 14:30 ? 00:00:22 java -classpath :/opt/ncs/current/java/jar/* -Dport=4569 -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=UTF-8 com.tailf.ncs.NcsJVMLauncher

[admin@sngpc-nfv-nso-1 ~]$ java -version
openjdk version "1.8.0_65"
OpenJDK Runtime Environment (build 1.8.0_65-b17)
OpenJDK 64-Bit Server VM (build 25.65-b01, mixed mode)

admin@SNGPC# show ncs-state
ncs-state patches patches-directory /opt/ncs/current/lib/ncs/patches
ncs-state version 4.2.3
ncs-state smp number-of-threads

alam.bilal · ‎06-19-2017

I'm not aware of any tuning guide as such.

Given that there is still free memory, I'm wondering if memory is the issue here.

> the system seems to slow down a lot during service creation.

I'm curious how this is being measured; some numbers to quantify the deterioration would be helpful.

I'd start off by looking at some crude timestamps:

1. Time spent in the service mapping logic, and how that changes as the number of service instances grows.
2. Time spent on the writes to the device, using device traces.
3. CPU profiling, to get a more fine-grained view of where the CPU cycles are being spent. Compare and contrast runs with a low and a high number of service instances.

Usually it is the access to the device that is the most expensive. If there is extensive processing/loops in the service mapping logic then (3) should be able to give more insights.
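
The crude timestamps for the first item above could be collected with a small harness like the one below. This is only a minimal sketch: `create_one` is a hypothetical callable standing in for however a single service instance gets created and committed in your setup (it is not an NSO API), and the window size of 10 is arbitrary.

```python
import time
from typing import Callable, List, Tuple


def time_creations(create_one: Callable[[int], None], count: int) -> List[float]:
    """Call create_one(i) for i in 0..count-1; return each call's duration in seconds."""
    durations = []
    for i in range(count):
        start = time.perf_counter()
        create_one(i)  # hypothetical: creates and commits service instance number i
        durations.append(time.perf_counter() - start)
    return durations


def early_vs_late(durations: List[float], window: int = 10) -> Tuple[float, float]:
    """Return the mean duration of the first and of the last `window` creations."""
    if not durations:
        raise ValueError("no durations recorded")
    # Clamp the window so short runs still produce a result.
    window = max(1, min(window, len(durations)))
    early = sum(durations[:window]) / window
    late = sum(durations[-window:]) / window
    return early, late
```

If the late mean comes out several times larger than the early mean, the creation time depends on the number of existing instances, and the mapping logic is worth a closer look.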

alam.bilal · ‎06-19-2017

Also check whether your service contains any slow XPath expressions. Check the xpath log.

alam.bilal · ‎06-19-2017

> > the system seems to slow down a lot during service creation.

> Curious how this is being measured and some numbers to quantify the deterioration would be helpful.

The service code has an alarm: if the service doesn't get created within the specified period, the alarm is displayed. Hence, it's easy to identify the slowness when you see tons of alarms on the display, which doesn't happen when the service count is low.

alam.bilal · ‎06-19-2017

Generally, when I am analyzing the performance of a service instance creation, modification, etc., I try to limit the logging to a single service invocation and analyze the logs as follows.

First, try to determine where the majority of the service instance creation time is being spent:

1. Enable south-bound device tracing
2. tail -f all logs in the logs directory
3. initiate a service instance create
4. Analyze the devel.log entries for transaction items such as acquiring the lock, the time required to calculate the southbound diff, etc.
5. Analyze the ncs-java-vm.log (or the service-specific python.log file) entries for anything misbehaving.
6. Analyze the timing of the associated southbound activity to the device generated by the service instance.

Once you determine the general area where the delay is happening you can dive deeper.

For example, as Hakan suggested, if the southbound diff calculation is taking a long time, enable the xpath log and check for repeated XPath evaluations that take a large amount of time.

If no smoking guns are found, you can proceed to analyzing server memory, CPU, etc.

-Larry
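
One rough way to find where the time goes in the captured logs is to look for the largest gaps between consecutive timestamped entries. The sketch below assumes each relevant line contains a timestamp shaped like `2017-06-19 14:30:01.123`; that shape is an assumption for illustration, not the documented NSO log format, so `TS_PATTERN` and `TS_FORMAT` will need adapting to what your logs actually contain.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Assumed timestamp shape; adjust both constants to match the real log lines.
TS_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6})")
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def largest_gaps(lines: Iterable[str], top: int = 5) -> List[Tuple[float, str]]:
    """Return the `top` largest pauses as (seconds, line logged right after the pause)."""
    gaps = []
    prev = None
    for line in lines:
        match = TS_PATTERN.search(line)
        if not match:
            continue  # skip lines without a recognizable timestamp
        ts = datetime.strptime(match.group(1), TS_FORMAT)
        if prev is not None:
            gaps.append(((ts - prev).total_seconds(), line.rstrip("\n")))
        prev = ts
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]
```

Feeding it the lines of a single service invocation (for example via `open(path)`) gives a short list of entries that were preceded by the longest pauses, which is a reasonable place to start reading.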

alam.bilal · ‎06-19-2017

> The service code has an alarm: if the service doesn't get created within the specified period, the alarm is displayed. Hence, it's easy to identify the slowness when you see tons of alarms on the display, which doesn't happen when the service count is low.

In general, the recommendation is not to do processor-intensive tasks from within the service create callback. It would be good to validate whether the service creation logic here depends on (iterates over) the current number of service instances. Ideally, the creation of each service instance should take the same order of time.
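
To illustrate why iterating over existing instances matters, here is a toy model in plain Python. It does not use any NSO API; the function names and the `svcN` instance names are made up for the example. It only counts how much work each creation does under the two approaches.

```python
from typing import List, Set, Tuple


def create_with_scan(existing: List[str], name: str) -> int:
    """Anti-pattern: visit every existing instance on each create. Returns entries visited."""
    visited = 0
    for other in existing:
        visited += 1
        if other == name:
            raise ValueError(f"duplicate service {name}")
    existing.append(name)
    return visited


def create_with_lookup(existing: Set[str], name: str) -> int:
    """Keyed lookup: one membership test regardless of how many instances exist."""
    if name in existing:
        raise ValueError(f"duplicate service {name}")
    existing.add(name)
    return 1


def total_work(n: int) -> Tuple[int, int]:
    """Create n instances both ways; return (entries scanned, keyed lookups)."""
    scanned: List[str] = []
    keyed: Set[str] = set()
    scan_total = sum(create_with_scan(scanned, f"svc{i}") for i in range(n))
    lookup_total = sum(create_with_lookup(keyed, f"svc{i}") for i in range(n))
    return scan_total, lookup_total
```

For 300 instances, `total_work(300)` returns `(44850, 300)`: the scanning variant's per-creation cost grows with every instance added, while the keyed variant stays flat, which is the behavior the recommendation above asks for.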