Re: NSO performance (restconf)

dciprus · 06-10-2022

Our deployment consists of 2 nodes in an HA configuration with a VIP running on top of them. We run a few in-house developed packages on this setup. After inspecting audit.log, I'm seeing really horrible response times:

<INFO> 10-Jun-2022::07:21:15.109 cyswy002n-csco-cw-nso-vm2 ncs[947716]: audit user: cisco/19953 RESTCONF: response with http: HTTP/1.1 /restconf/tailf/query 200 duration 11331677 ms
<INFO> 10-Jun-2022::07:21:17.701 cyswy002n-csco-cw-nso-vm2 ncs[947716]: audit user: cisco/20000 RESTCONF: response with http: HTTP/1.1 /restconf/tailf/query 200 duration 18319 ms
<INFO> 10-Jun-2022::07:21:26.545 cyswy002n-csco-cw-nso-vm2 ncs[947716]: audit user: cisco/19995 RESTCONF: response with http: HTTP/1.1 /restconf/tailf/query 200 duration 10999549 ms
<INFO> 10-Jun-2022::07:21:32.901 cyswy002n-csco-cw-nso-vm2 ncs[947716]: audit user: cisco/20011 RESTCONF: response with http: HTTP/1.1 /restconf/tailf/query 200 duration 23960 ms
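The same endpoint (/restconf/tailf/query) ranges from 18319 ms to 11331677 ms in these four lines alone. To quantify that spread over a whole audit.log, a minimal Python sketch like the following can extract and summarize the RESTCONF response durations. It assumes every relevant line has exactly the format shown above, ending in `duration N ms`; the regex is written against these sample lines, not against a documented log specification, and the helper names are hypothetical.

```python
import re
from statistics import median

# Written against the audit.log lines shown above; assumes every RESTCONF
# response line has the layout "... HTTP/1.1 <path> <status> duration <N> ms".
AUDIT_RE = re.compile(
    r"^<INFO> (?P<ts>\S+) \S+ ncs\[\d+\]: audit user: "
    r"(?P<user>[^/\s]+)/(?P<session>\d+) "
    r"RESTCONF: response with http: HTTP/1\.1 "
    r"(?P<path>\S+) (?P<status>\d{3}) duration (?P<duration>\d+) ms$"
)


def parse_restconf_durations(lines):
    """Return one dict per matching RESTCONF response line; other lines are skipped."""
    records = []
    for line in lines:
        match = AUDIT_RE.match(line.strip())
        if match:
            records.append({
                "ts": match["ts"],
                "user": match["user"],
                "session": int(match["session"]),
                "path": match["path"],
                "status": int(match["status"]),
                "duration_ms": int(match["duration"]),  # unit taken from the log as-is
            })
    return records


def summarize(records):
    """Count, min, median and max of the logged durations."""
    durations = sorted(r["duration_ms"] for r in records)
    if not durations:
        return {"count": 0}
    return {
        "count": len(durations),
        "min": durations[0],
        "median": median(durations),
        "max": durations[-1],
    }
```

Usage: open your audit.log (its location depends on the logging configuration in ncs.conf), pass the file object to `parse_restconf_durations`, and sort the result by `duration_ms` to get the slowest sessions; the session IDs (e.g. 19953) can then be correlated with devel.log or other logs.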

This NSO instance runs in a virtual environment with 24 CPUs and plenty of free memory:

 $ free --giga
              total        used        free      shared  buff/cache   available
Mem:            139          11         105           0          22         126
Swap:             1           0           1

Another indicator, from iostat:

Linux 5.4.0-77-generic (cyswy002n-csco-cw-nso-vm2)      06/10/2022      _x86_64_        (24 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.80    0.00    0.31    0.01    0.00   97.88

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
dm-0             22.29         1.22       106.79         0.00    5311273  464985760          0
dm-1              0.00         0.00         0.00         0.00       3264          0          0
loop0             0.00         0.00         0.00         0.00          4          0          0
sda              11.50         1.22       106.79         0.00    5324982  464985761          0
scd0              0.00         0.00         0.00         0.00       2080          0          0


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.55    0.00    0.48    0.00    0.00   94.97

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
dm-0              9.00         0.00        36.00         0.00          0         72          0
dm-1              0.00         0.00         0.00         0.00          0          0          0
loop0             0.00         0.00         0.00         0.00          0          0          0
sda               9.00         0.00        36.00         0.00          0         72          0
scd0              0.00         0.00         0.00         0.00          0          0          0


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.63    0.00    0.27    0.02    0.00   97.08

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
dm-0             55.00         0.00       220.00         0.00          0        440          0
dm-1              0.00         0.00         0.00         0.00          0          0          0
loop0             0.00         0.00         0.00         0.00          0          0          0
sda              40.00         0.00       220.00         0.00          0        440          0
scd0              0.00         0.00         0.00         0.00          0          0          0


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.63    0.00    0.04    0.00    0.00   99.33

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
dm-0              0.00         0.00         0.00         0.00          0          0          0
dm-1              0.00         0.00         0.00         0.00          0          0          0
loop0             0.00         0.00         0.00         0.00          0          0          0
sda               0.00         0.00         0.00         0.00          0          0          0
scd0              0.00         0.00         0.00         0.00          0          0          0


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.52    0.00    0.15    0.00    0.00   99.33

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
dm-0             17.00         0.00        76.00         0.00          0        152          0
dm-1              0.00         0.00         0.00         0.00          0          0          0
loop0             0.00         0.00         0.00         0.00          0          0          0
sda               9.50         0.00        76.00         0.00          0        152          0
scd0              0.00         0.00         0.00         0.00          0          0          0
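These samples point at a mostly idle host (%idle between 94.97 and 99.33, %iowait and %steal at or near zero), which suggests the delay is inside NSO rather than in CPU, disk, or hypervisor contention. For longer captures, a small Python sketch like this one can summarize the avg-cpu blocks. It assumes the default iostat text layout shown here, where each `avg-cpu:` header line is followed by exactly one line of values; the function names are hypothetical.

```python
from statistics import mean


def parse_avg_cpu(iostat_text):
    """Return one dict per avg-cpu sample, e.g. {"%user": 1.8, ..., "%idle": 97.88}."""
    samples = []
    lines = iostat_text.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith("avg-cpu:") and i + 1 < len(lines):
            headers = line.split()[1:]  # drop the "avg-cpu:" label itself
            values = [float(v) for v in lines[i + 1].split()]
            samples.append(dict(zip(headers, values)))
    return samples


def cpu_headroom(samples):
    """Summarize the columns relevant to 'is the host itself the bottleneck?'."""
    if not samples:
        return {"samples": 0}
    return {
        "samples": len(samples),
        "mean_idle": mean(s["%idle"] for s in samples),
        "max_iowait": max(s["%iowait"] for s in samples),
        "max_steal": max(s["%steal"] for s in samples),
    }
```

Note that iostat's first report covers the period since boot, so you may want to drop `samples[0]` when judging current load.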

From the following output, I noticed that the write lock on the running datastore takes forever to be released (I'm not sure where to look for more precise timing of when it is released):

admin@ncs# show ncs-state internal cdb datastore
                                                                                                  WRITE                WAITING FOR                                                                             
                                    WRITE                                                  READ   LOCK   SUBSCRIPTION  REPLICATION            TIME       CLIENT  SUBSCRIPTION            CLIENT  SUBSCRIPTION  
NAME         TRANSACTION ID         QUEUE  FILENAME                DISK SIZE   RAM SIZE    LOCKS  SET    LOCK SET      SYNC         PRIORITY  REMAINING  NAME    IDS           PRIORITY  NAME    IDS           
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
running      1654-867681-961701@n2  -      /var/opt/ncs/cdb/A.cdb  165.35 MiB  997.42 MiB  0      true   -             false        -         -                                                                
operational  1654-867734-478856@n2  -      /var/opt/ncs/cdb/O.cdb  63.40 MiB   138.29 MiB  -      -      false         -            -         -                                                                

admin@ncs#
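To get better timing on when the write lock is released, one option is to poll the same show command and timestamp every change of the WRITE LOCK SET column on the `running` row. This Python sketch assumes `ncs_cli` is on the PATH and accepts CLI commands on stdin (`-C` for the Cisco-style CLI, `-u admin` as a hypothetical user), and that its output keeps the column layout of the table above; if the CLI renders the table differently when not attached to a terminal, the column index in `running_write_lock_set` must be adjusted. Polling only gives one-interval resolution, so short lock holds between samples will be missed.

```python
import subprocess
import time
from datetime import datetime

# Assumptions: ncs_cli is on PATH, reads commands from stdin, and "admin" is a valid user.
CLI_CMD = ["ncs_cli", "-C", "-u", "admin"]
SHOW_CMD = "show ncs-state internal cdb datastore\n"


def fetch_datastore_table():
    """Run the show command non-interactively and return the raw CLI output."""
    result = subprocess.run(CLI_CMD, input=SHOW_CMD, capture_output=True,
                            text=True, check=True)
    return result.stdout


def running_write_lock_set(table_text):
    """Return True/False from WRITE LOCK SET on the 'running' row, or None if absent.

    Assumes the column layout shown above, where sizes split into two tokens:
    name, tid, queue, filename, disk size, unit, ram size, unit, read locks, write lock set.
    """
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) > 9 and fields[0] == "running":
            return fields[9] == "true"
    return None


def watch_write_lock(fetch=fetch_datastore_table, interval=1.0, samples=60):
    """Poll `samples` times and return (timestamp, state) for every state change."""
    transitions = []
    previous = object()  # sentinel so the first observation is always recorded
    for i in range(samples):
        state = running_write_lock_set(fetch())
        if state != previous:
            transitions.append((datetime.now().isoformat(timespec="milliseconds"), state))
            previous = state
        if i < samples - 1:
            time.sleep(interval)
    return transitions
```

Running something like `watch_write_lock(interval=0.5, samples=120)` while a slow RESTCONF query is in flight would show when the lock is taken and released. The `fetch` parameter is injectable so the parsing can be checked against saved output without a live NSO.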

Where should I look for other performance indicators to figure out what's causing these horrible response times?

Nabsch · 06-10-2022

Hello,

You can check devel.log and maybe enable progress trace to see exactly what is taking so long.
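Once a progress trace has been written to a file in CSV format, a short Python script can rank the events by duration to show where the time goes. The column names `DURATION` and `MESSAGE` are assumptions; check the header row of your own trace file, since the columns may differ between NSO versions, and pass the real names as parameters. The function name is hypothetical.

```python
import csv
import io


def slowest_events(csv_text, top=10, duration_col="DURATION", message_col="MESSAGE"):
    """Return the `top` (duration, message) pairs with the largest duration.

    Column names are assumptions; pass the ones from your trace file's header.
    Rows with an empty or non-numeric duration (for instance, events that only
    mark the start of a phase) are skipped.
    """
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(duration_col) or "").strip()
        if not raw:
            continue
        try:
            events.append((float(raw), row.get(message_col) or ""))
        except ValueError:
            continue
    events.sort(key=lambda e: e[0], reverse=True)
    return events[:top]
```

Usage: read the trace file into a string and call `slowest_events(text, top=20)`; the messages of the top entries indicate which phase (lock handling, service code, device communication, and so on) dominates.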

dciprus · 06-10-2022

Unfortunately I can't; this is a heavily used production system. Is there any documentation that would help me understand the internals from an operations perspective?

Nabsch · 06-10-2022

You can enable progress trace in production; just be careful about the verbosity level you choose.

Link to doc

Otherwise, you can take an NCS backup, restore it on another server in a lab, and debug there.