02-14-2019 07:00 PM
I checked with the command "/etc/init.d/ncs status" and found that the number of CDB clients has been increasing over the last few days, which eventually causes NSO to stop responding. Below is the CDB part of the status output:
cdb:
  cluster mode: master (synchronous replication)
  current transaction id: 1550-80281-219216@aptx1nso365.webex.com
  running:
    filename: /var/opt/ncs/cdb/A.cdb
    disk size: 155.3216 MB
    ram size: 568.4042 MB
    read locks: 0
    write lock: unset
  operational:
    cluster mode: master
    current transaction id: 0
    filename: /var/opt/ncs/cdb/O.cdb
    disk size: 1.10046 MB
    ram size: 6.10082 MB
    subscription lock: unset
    no pending subscription notifications
  registered cdb clients:
    client name: Cdb-ResourceManaged-1623-pool:404
      type: client
      db: operational
      subscription-lock: false
    client name: Cdb-ResourceManaged-1610-pool:403
      type: client
      db: operational
      subscription-lock: false
    client name: Cdb-ResourceManaged-1605-pool:402
      type: client
      db: operational
      subscription-lock: false
    client name: Cdb-ResourceManaged-1590-pool:401
      type: client
      db: operational
      subscription-lock: false
How can I clear the client connections? And how can I find out why the number of clients keeps increasing?
02-14-2019 10:06 PM
That the sequence number increases is normal, but if the number of active connections increases without bound, you have more of a problem.
If it grows rapidly, it depends on who is responsible for the connections: that software needs to make sure to close its sessions when it is done with them. In this case you list four active connections from the resource manager, which is quite normal.
For development purposes, a package reload will reset most of these for you.
02-14-2019 10:48 PM
02-14-2019 11:01 PM - edited 02-14-2019 11:01 PM
Okay, so then it is being immediately re-created. Do you actually have over 400 allocation pools being monitored, or what are the client names that make up the bulk of that list?
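A quick way to answer that from the shell is to group the client names with the numeric instance IDs stripped out (a sketch; the `sed` pattern and the init-script path are assumptions and may need adjusting for your install):

```shell
# Group registered CDB clients by name, collapsing numeric instance IDs,
# so the dominant client type stands out with a count in front.
/etc/init.d/ncs status \
  | grep -i "client name" \
  | sed -E 's/[0-9]+//g' \
  | sort | uniq -c | sort -rn | head
```

On the status output above, this would collapse all the Cdb-ResourceManaged-*-pool entries into a single row with their total count.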
02-14-2019 11:11 PM
02-15-2019 01:18 AM
A few different things. First of all, are you sure the slowdown is because of the connections? Generally, oper-data connections like these are pretty cheap.
Regarding tracking it, ncs --status (which is what hides behind the command you ran) is pretty good for seeing the current state. To see things as they happen (in particular for configuration sessions), devel.log is good. For a system install you might have to turn that on explicitly in ncs.conf.
If all the connections (or at least, several hundred of them) are from the ResourceManager you might want to file a ticket to get an explanation.
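For reference, on a system install the developer log is typically enabled under /ncs-config/logs in ncs.conf, roughly like this (a sketch; check the ncs.conf(5) man page for the exact elements, paths, and the log level you want — "trace" is very verbose):

```xml
<logs>
  <developer-log>
    <enabled>true</enabled>
    <file>
      <name>${NCS_LOG_DIR}/devel.log</name>
      <enabled>true</enabled>
    </file>
  </developer-log>
  <developer-log-level>trace</developer-log-level>
</logs>
```

NSO needs a restart (or reload of ncs.conf) for the change to take effect.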
02-20-2019 05:23 PM
We found that we have a daily sync-from job calling api/running/devices/device/%s/_operations/sync-from, and it causes the connections to increase sharply.
Below is a snapshot:
[root@apsj1nso001 ~]# /etc/init.d/ncs status | grep -i "client name" | wc -l
568
[root@apsj1nso001 ~]# curl -X POST -u apiuser:90V1rtua1 http://localhost:8080/api/running/devices/device/ORD10-WXBB-PE01/_operations/sync-from
<output xmlns='http://tail-f.com/ns/ncs'>
<result>true</result>
</output>
[root@apsj1nso001 ~]# /etc/init.d/ncs status | grep -i "client name" | wc -l
570
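To correlate the growth with the cron job, one option is to log the client count periodically (a sketch; the log path and interval are arbitrary choices, and the init-script path assumes a system install):

```shell
# Append a timestamped CDB client count to a log file once a minute;
# afterwards, compare the timestamps with the sync-from job's schedule.
while true; do
  count=$(/etc/init.d/ncs status | grep -ci 'client name')
  printf '%s %s\n' "$(date '+%F %T')" "$count" >> /tmp/cdb-clients.log
  sleep 60
done
```

A step change in the logged count at the job's start time would confirm the correlation.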
02-21-2019 12:43 AM - edited 02-21-2019 12:47 AM
That seems more likely. So, what do the sessions look like in the status output? Doing a lot of sync-from operations at the same time will of course slow your system down; does it still cause you a problem once the synchronization is done?
02-21-2019 05:20 PM
02-22-2019 02:41 AM
Hello,
I think these sessions are from a NED, not from the 'resource manager' package.
There were similar issues with file descriptor leaks in the cisco-ios NED, for instance, and they were fixed in version 6.0.13 (that was already 6 months ago, so I'm not sure whether you are still using the broken version).
What are the versions of the NEDs you are using?
02-22-2019 05:20 PM
02-25-2019 01:35 AM
Hi,
There is this note in the CHANGES for cisco-ios version 6.0.13:
- Properly clean up NSO resources when closing NED.
I have seen a couple of cases earlier where the file descriptor usage increased forever with versions of the cisco-ios NED prior to this fix. It is especially bad if you have a lot of devices.
Try with a NED version later than 6.0.13, and see if that solves the problem.
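To confirm the file-descriptor-leak theory independently of the NED version, you can watch the daemon's fd count over time (a sketch; the pgrep pattern for the NSO daemon, usually ncs.smp, is an assumption):

```shell
# Count open file descriptors of the NSO daemon via /proc; if this
# grows in step with the CDB client count, a descriptor leak is likely.
pid=$(pgrep -f ncs.smp | head -1)
ls "/proc/${pid}/fd" | wc -l
```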
/Ram
03-25-2019 10:52 PM
I have updated the cisco-ios NED to version 6.3, but it has had no effect. Below is the detailed information:
[root@apsj1nso001 ~]# /etc/init.d/ncs status | grep -i "client name" | wc -l
2749
ncsadmin@ncs# show packages package package-version
PACKAGE NAME           VERSION
--------------------------------
acl106                 1.3
bblinkservice          1.2.1
cisco-asa              6.0.4
cisco-fmc              1.0.4
cisco-ios              6.3
cisco-iosxr            6.6.2.1
cisco-nx               5.6
citrix-netscaler       3.0.23
serviceseconciliation  1.3.1
snmp-notif-recv        1.0
tailf-hcc              4.3.2
upservice              1.6