
Could CDB compaction lead to failure in acquiring lock?

previousqna
Level 5

Hi all,

 

We are facing an error where we are unable to acquire the lock that we request using cdb.startSession. From the logs below, I see a pattern: the lock acquisition fails while CDB journal compaction is in progress.

 

I am aware that some compaction-related issues were fixed in NCS-4.2:

 

 

CDB: If a CDB client started a session using LOCK_WAIT during CDB compaction the client would hang forever, and prevent any further write transactions. (This could also happen to internal CDB clients, which would lock up the system). This has been fixed.

(Issue tracker: #21643)

 

 

Could compaction be the root cause here?

 

 

 

<INFO> 2-Feb-2017::16:03:33.904 VTSmas-171 ncs[3314]: devel-cdb Initiating CDB journal compaction

 

<ERR> 2-Feb-2017::16:03:33.978 VTSmas-171 ncs[3314]: devel-c service_create error {application, "Exception in callback: Could not acquire lock"} for callpoint 'vts-router-interface-servicepoint' path /vts:cisco-vts/tenants/tenant{TA}/topologies/topology{TA}/interfaces/interface{f994a5fa-ab1b-4d54-9333-09485ac1d943}

 

<INFO> 2-Feb-2017::16:03:34.348 VTSmas-171 ncs[3314]: devel-cdb Compacted CDB journal file: 443 ms (14260 nodes in memory, disk size 2.54 MiB -> 1.65 MiB)

 

 

<INFO> 3-Feb-2017::03:58:44.528 VTSmas-171 ncs[3314]: devel-cdb Initiating CDB journal compaction

 

<ERR> 3-Feb-2017::03:58:44.699 VTSmas-171 ncs[3314]: devel-c service_create error {application, "Exception in callback: Could not acquire lock"} for callpoint 'vts-router-interface-servicepoint' path /vts:cisco-vts/tenants/tenant{TA}/topologies/topology{TA}/interfaces/interface{f994a5fa-ab1b-4d54-9333-09485ac1d943}

 

<INFO> 3-Feb-2017::03:58:45.047 VTSmas-171 ncs[3314]: devel-cdb Compacted CDB journal file: 518 ms (14215 nodes in memory, disk size 2.72 MiB -> 1.64 MiB)

 

 

affca-2b18-4041-ab23-28fe1818e8e8}

 

<DEBUG> 4-Feb-2017::03:34:26.486 VTSmas-171 ncs[11287]: devel-cdb connect from InterfaceCdbSock

 

<ERR> 4-Feb-2017::03:34:26.490 VTSmas-171 ncs[11287]: devel-c service_create error {application, "Exception in callback: Could not acquire lock"} for callpoint 'vts-router-interface-servicepoint' path /vts:cisco-vts/tenants/tenant{TA}/topologies/topology{TA}/interfaces/interface{85eaffca-2b18-4041-ab23-28fe1818e8e8}

 

<INFO> 4-Feb-2017::03:34:26.901 VTSmas-171 ncs[11287]: devel-cdb Compacted CDB journal file: 818 ms (14293 nodes in memory, disk size 3.03 MiB -> 1.66 MiB)

 

 

 

 

<INFO> 10-Feb-2017::05:26:41.715 VTS242-1 ncs[3302]: ncs send NED persist

 

<INFO> 10-Feb-2017::05:26:48.181 VTS242-1 ncs[3302]: devel-cdb Initiating CDB journal compaction

 

<ERR> 10-Feb-2017::05:26:48.280 VTS242-1 ncs[3302]: devel-c service_create error {application, "Exception in callback: Could not acquire lock"} for callpoint 'vts-router-interface-servicepoint' path /vts:cisco-vts/tenants/tenant{TA}/topologies/topology{TA}/interfaces/interface{f292e2f9-9cd1-4a58-a052-ab5c0ffe4a2d}

 

<INFO> 10-Feb-2017::05:26:48.518 VTS242-1 ncs[3302]: devel-cdb Compacted CDB journal file: 337 ms (16639 nodes in memory, disk size 3.45 MiB -> 1.95 MiB)

2 Replies

previousqna
Level 5

Yes, the lock will fail if CDB journal compaction is taking place.

I could reproduce the same error by starting a new CDB session while running compaction manually in parallel.

I took a look at vtsRouterInterfaceRFS.java, and it seems it does not use CdbLockType.LOCK_WAIT.

You can work around this by using CdbLockType.LOCK_WAIT for cdb.startSession.

However, due to the bug you mentioned, on versions without the fix the client waiting for the lock would hang forever.

After the fix, LOCK_WAIT should work. I verified it with 4.2.2.

Experts, please correct me if I'm wrong :-)
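
To illustrate the difference between a non-waiting and a waiting lock request, here is a minimal, self-contained sketch. It does not use the NCS Java API at all: a plain `ReentrantLock` stands in for whatever lock CDB holds during compaction, and the class and method names (`LockWaitDemo`, `startSessionNoWait`, `startSessionWait`, `requestDuringCompaction`) are made up for this example. The waiting variant uses a timeout only to keep the demo bounded; that is an assumption of the sketch, not a statement about how LOCK_WAIT behaves internally.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockWaitDemo {
    // Hypothetical stand-in for the lock held during journal compaction.
    private static final ReentrantLock cdbLock = new ReentrantLock();

    /** Non-waiting request: fails immediately if the lock is busy. */
    public static boolean startSessionNoWait() {
        if (!cdbLock.tryLock()) {
            return false; // analogous to "Could not acquire lock"
        }
        cdbLock.unlock();
        return true;
    }

    /** Waiting request: blocks until the holder releases (bounded by a timeout). */
    public static boolean startSessionWait(long timeoutMs) throws InterruptedException {
        if (!cdbLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false;
        }
        cdbLock.unlock();
        return true;
    }

    /** Issues one client request while a simulated compaction holds the lock. */
    public static boolean requestDuringCompaction(boolean wait, long compactionMs) {
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread compaction = new Thread(() -> {
            cdbLock.lock();
            try {
                lockHeld.countDown();       // signal that "compaction" has started
                Thread.sleep(compactionMs); // simulated compaction work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                cdbLock.unlock();
            }
        });
        compaction.start();
        try {
            lockHeld.await(); // make sure the request arrives mid-compaction
            boolean ok = wait ? startSessionWait(5000) : startSessionNoWait();
            compaction.join();
            return ok;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no wait: " + requestDuringCompaction(false, 400));
        System.out.println("wait:    " + requestDuringCompaction(true, 400));
    }
}
```

The non-waiting request fails because it arrives while the lock is held, which matches the timing in the logs above (the error falls between "Initiating CDB journal compaction" and "Compacted CDB journal file"). The waiting request succeeds once the simulated compaction finishes. In the real service code, the corresponding change would be to include CdbLockType.LOCK_WAIT in the lock flags passed to cdb.startSession; please check the exact signature against the Java API documentation for your NCS version.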

Thanks a lot for the response. Yes, we added LOCK_WAIT as a workaround.

However, this does not seem to be the ideal solution. NCS itself should make the client wait to acquire the lock.