06-10-2022 06:35 AM - edited 06-10-2022 06:37 AM
Our deployment consists of 2 nodes (HA config) with a VIP running on top of it. We have a few in-house developed packages running on this configuration. After inspecting audit.log, the response times I'm seeing there are really horrible:
<INFO> 10-Jun-2022::07:21:15.109 cyswy002n-csco-cw-nso-vm2 ncs[947716]: audit user: cisco/19953 RESTCONF: response with http: HTTP/1.1 /restconf/tailf/query 200 duration 11331677 ms
<INFO> 10-Jun-2022::07:21:17.701 cyswy002n-csco-cw-nso-vm2 ncs[947716]: audit user: cisco/20000 RESTCONF: response with http: HTTP/1.1 /restconf/tailf/query 200 duration 18319 ms
<INFO> 10-Jun-2022::07:21:26.545 cyswy002n-csco-cw-nso-vm2 ncs[947716]: audit user: cisco/19995 RESTCONF: response with http: HTTP/1.1 /restconf/tailf/query 200 duration 10999549 ms
<INFO> 10-Jun-2022::07:21:32.901 cyswy002n-csco-cw-nso-vm2 ncs[947716]: audit user: cisco/20011 RESTCONF: response with http: HTTP/1.1 /restconf/tailf/query 200 duration 23960 ms
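To get an overview rather than eyeballing individual entries, the durations can be pulled out of audit.log and summarized per request path. The following is only a rough sketch: the regular expression is written against the four lines quoted above and assumes the rest of the log uses the same layout, the name `summarize_durations` is made up for illustration, and the unit is taken as printed in the log without interpretation.

```python
import re
from collections import defaultdict
from statistics import median

# Pattern derived from the audit.log lines quoted above (an assumption about
# the general layout, not an official format description).
AUDIT_RE = re.compile(
    r"audit user: (?P<user>\S+)/(?P<session>\d+) RESTCONF: response with http: "
    r"HTTP/1\.1 (?P<path>\S+) (?P<status>\d{3}) duration (?P<duration>\d+) ms"
)

SAMPLE = (
    "<INFO> 10-Jun-2022::07:21:15.109 cyswy002n-csco-cw-nso-vm2 ncs[947716]: audit user: cisco/19953 "
    "RESTCONF: response with http: HTTP/1.1 /restconf/tailf/query 200 duration 11331677 ms\n"
    "<INFO> 10-Jun-2022::07:21:17.701 cyswy002n-csco-cw-nso-vm2 ncs[947716]: audit user: cisco/20000 "
    "RESTCONF: response with http: HTTP/1.1 /restconf/tailf/query 200 duration 18319 ms\n"
    "<INFO> 10-Jun-2022::07:21:26.545 cyswy002n-csco-cw-nso-vm2 ncs[947716]: audit user: cisco/19995 "
    "RESTCONF: response with http: HTTP/1.1 /restconf/tailf/query 200 duration 10999549 ms\n"
    "<INFO> 10-Jun-2022::07:21:32.901 cyswy002n-csco-cw-nso-vm2 ncs[947716]: audit user: cisco/20011 "
    "RESTCONF: response with http: HTTP/1.1 /restconf/tailf/query 200 duration 23960 ms\n"
)


def summarize_durations(text):
    """Group the logged durations by request path and return basic statistics."""
    by_path = defaultdict(list)
    for match in AUDIT_RE.finditer(text):
        by_path[match.group("path")].append(int(match.group("duration")))
    return {
        path: {
            "count": len(values),
            "min": min(values),
            "median": median(values),
            "max": max(values),
        }
        for path, values in by_path.items()
    }


if __name__ == "__main__":
    for path, stats in summarize_durations(SAMPLE).items():
        print(path, stats)
```

On a real system the text would come from reading audit.log instead of `SAMPLE`. Even on these four lines the split is visible: two requests finish in the tens of thousands, two take around eleven million, which suggests some requests are waiting on something rather than all being uniformly slow.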
This NSO runs in a virtual environment with 24 CPUs and plenty of free memory:
$ free --giga
              total        used        free      shared  buff/cache   available
Mem:            139          11         105           0          22         126
Swap:             1           0           1
Another indicator:
Linux 5.4.0-77-generic (cyswy002n-csco-cw-nso-vm2)  06/10/2022  _x86_64_  (24 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.80    0.00    0.31    0.01    0.00   97.88

Device        tps   kB_read/s   kB_wrtn/s   kB_dscd/s    kB_read     kB_wrtn   kB_dscd
dm-0        22.29        1.22      106.79        0.00    5311273   464985760         0
dm-1         0.00        0.00        0.00        0.00       3264           0         0
loop0        0.00        0.00        0.00        0.00          4           0         0
sda         11.50        1.22      106.79        0.00    5324982   464985761         0
scd0         0.00        0.00        0.00        0.00       2080           0         0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.55    0.00    0.48    0.00    0.00   94.97

Device        tps   kB_read/s   kB_wrtn/s   kB_dscd/s    kB_read     kB_wrtn   kB_dscd
dm-0         9.00        0.00       36.00        0.00          0          72         0
dm-1         0.00        0.00        0.00        0.00          0           0         0
loop0        0.00        0.00        0.00        0.00          0           0         0
sda          9.00        0.00       36.00        0.00          0          72         0
scd0         0.00        0.00        0.00        0.00          0           0         0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.63    0.00    0.27    0.02    0.00   97.08

Device        tps   kB_read/s   kB_wrtn/s   kB_dscd/s    kB_read     kB_wrtn   kB_dscd
dm-0        55.00        0.00      220.00        0.00          0         440         0
dm-1         0.00        0.00        0.00        0.00          0           0         0
loop0        0.00        0.00        0.00        0.00          0           0         0
sda         40.00        0.00      220.00        0.00          0         440         0
scd0         0.00        0.00        0.00        0.00          0           0         0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.63    0.00    0.04    0.00    0.00   99.33

Device        tps   kB_read/s   kB_wrtn/s   kB_dscd/s    kB_read     kB_wrtn   kB_dscd
dm-0         0.00        0.00        0.00        0.00          0           0         0
dm-1         0.00        0.00        0.00        0.00          0           0         0
loop0        0.00        0.00        0.00        0.00          0           0         0
sda          0.00        0.00        0.00        0.00          0           0         0
scd0         0.00        0.00        0.00        0.00          0           0         0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.52    0.00    0.15    0.00    0.00   99.33

Device        tps   kB_read/s   kB_wrtn/s   kB_dscd/s    kB_read     kB_wrtn   kB_dscd
dm-0        17.00        0.00       76.00        0.00          0         152         0
dm-1         0.00        0.00        0.00        0.00          0           0         0
loop0        0.00        0.00        0.00        0.00          0           0         0
sda          9.50        0.00       76.00        0.00          0         152         0
scd0         0.00        0.00        0.00        0.00          0           0         0
What I noticed in the following output is that the write lock seems to take forever to be released (I'm not sure where to look to get better timing for when it is released):
admin@ncs# show ncs-state internal cdb datastore
                                                                                                  WRITE                WAITING FOR
                                    WRITE                                                  READ   LOCK   SUBSCRIPTION  REPLICATION            TIME       CLIENT  SUBSCRIPTION            CLIENT  SUBSCRIPTION
NAME         TRANSACTION ID         QUEUE  FILENAME                DISK SIZE   RAM SIZE    LOCKS  SET    LOCK SET      SYNC         PRIORITY  REMAINING  NAME    IDS           PRIORITY  NAME    IDS
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
running      1654-867681-961701@n2  -      /var/opt/ncs/cdb/A.cdb  165.35 MiB  997.42 MiB  0      true   -             false        -         -
operational  1654-867734-478856@n2  -      /var/opt/ncs/cdb/O.cdb  63.40 MiB   138.29 MiB  -      -      false         -            -         -
admin@ncs#
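One rough way to get timing for the write lock, without touching the NSO configuration, is to sample the WRITE LOCK SET value of the running datastore at a fixed interval and compute how long each "true" stretch lasts. The sketch below only covers the bookkeeping. How the value is read is left to a caller-supplied function (`read_lock` is a hypothetical placeholder, for example a small wrapper that runs the show command above and checks the running row), and the resolution is limited by the polling interval.

```python
import time
from typing import Callable, Iterable, Iterator, List, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, write lock set?)


def lock_hold_times(samples: Iterable[Sample]) -> List[float]:
    """Return the duration of every completed lock hold seen in the samples.

    A hold starts at the first sample where the lock is set and ends at the
    first later sample where it is no longer set. A hold that is still open
    when the samples run out is not reported.
    """
    holds: List[float] = []
    started = None
    for timestamp, locked in samples:
        if locked and started is None:
            started = timestamp
        elif not locked and started is not None:
            holds.append(timestamp - started)
            started = None
    return holds


def poll(read_lock: Callable[[], bool], interval: float, count: int) -> Iterator[Sample]:
    """Call read_lock() `count` times, `interval` seconds apart, yielding samples.

    read_lock is supplied by the caller; this sketch does not talk to NSO itself.
    """
    for _ in range(count):
        yield time.monotonic(), read_lock()
        time.sleep(interval)


if __name__ == "__main__":
    # Canned samples instead of a live system: lock taken at t=1.0, released at t=3.5.
    demo = [(0.0, False), (1.0, True), (2.0, True), (3.5, False), (4.0, True)]
    print(lock_hold_times(demo))
```

With a live reader this would be used as `lock_hold_times(poll(read_lock, 0.5, 600))`. Holds shorter than the interval can be missed entirely, so this only helps for locks that are held for a noticeable time.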
Where should I look for other performance indicators to figure out what's causing these horrible response times?
06-10-2022 06:50 AM
Hello,
You can check devel.log and maybe enable progress-trace to see exactly which step is taking so long to complete.
06-10-2022 07:42 AM
Unfortunately I can't; this is a production system which is used quite heavily. Is there any documentation that would help me understand the internals from an operations perspective?
06-10-2022 07:51 AM
You can enable progress trace in production. Just be careful about the level that you choose.
Otherwise, you can take an NSO backup (ncs-backup), load it on another server in a lab, and try to debug there.