CUCM - LDAPS - Marking too many inactive

Clutz5250 · ‎02-18-2022

I'm currently gathering information on an LDAPS issue and I'm hoping to get some advice here before I go to TAC. The problem is that after an LDAPS sync (which does complete), far too many accounts are marked inactive. The problem eventually goes away after we manually sync several times and finally get a sane number of accounts marked inactive. We are syncing manually until we can 'trust' that it won't do this anymore. Our LDAPS solution goes from CUCM to Akkadian Contact Manager, which in turn holds multiple LDAPS connections to external organizations (so it acts as a proxy). LDAPS authentication of course goes through this as well. I was also told that LDAPS has a history of problems in our environment (FYI, we are still on 11.5), apparently even before we had the Akkadian solution. So I think I can rule out issues with Akkadian.

I took packet captures during a sync and noticed multiple bursts of traffic to Akkadian during synchronization, with some kind of replication occurring afterwards (judging by the TCP flows). IM&P showed what looked like a fullsync start flag and end flag in its conversation with the publisher. The subscribers, however, showed nothing conclusive toward the end of their TCP flows. It's possible I cut the capture too early to see them finish replicating. Beyond this, there were some RST flags at the end of the Akkadian communication, and I don't know whether those are normal. Replication from the publisher seems to continue after that point. One more thing: in the middle of the capture I also noticed a syslog message that snuck by (some info redacted):

%UC_JAVAAPPLICATIONS-3-UserSyncFail: %[Reason=DiscoveryUserIdentity entered for EndUser already exists.][Userid=REDACTED.REDACTED@REDACTED.REDACTED for AgreementId REDACTED][AppID=Cisco DirSync][ClusterID=][NodeID=REDACTED]: User is not Synced Default
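
To see whether this alarm is a one-off or repeats for many users on each sync, the bracketed fields can be pulled out of saved syslog lines and counted. This is only a rough sketch in Python: it assumes every alarm follows the same `[Key=Value]` layout as the single message above, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches one "[Key=Value]" field; the value may be empty, as in "[ClusterID=]".
FIELD_RE = re.compile(r"\[([^=\]]+)=([^\]]*)\]")

# The redacted alarm from the post, used as sample input.
SAMPLE = (
    "%UC_JAVAAPPLICATIONS-3-UserSyncFail: "
    "%[Reason=DiscoveryUserIdentity entered for EndUser already exists.]"
    "[Userid=REDACTED.REDACTED@REDACTED.REDACTED for AgreementId REDACTED]"
    "[AppID=Cisco DirSync][ClusterID=][NodeID=REDACTED]: User is not Synced Default"
)


def parse_dirsync_alarm(line):
    """Return the [Key=Value] fields of one alarm line as a dict."""
    fields = dict(FIELD_RE.findall(line))
    # The Userid field also carries the agreement ID; split the two apart.
    userid, sep, agreement = fields.get("Userid", "").partition(" for AgreementId ")
    if sep:
        fields["Userid"] = userid
        fields["AgreementId"] = agreement
    return fields


def count_failures(lines):
    """Count UserSyncFail alarms per (Userid, Reason) pair."""
    counts = Counter()
    for line in lines:
        if "UserSyncFail" in line:
            fields = parse_dirsync_alarm(line)
            counts[(fields.get("Userid"), fields.get("Reason"))] += 1
    return counts


if __name__ == "__main__":
    for key, value in parse_dirsync_alarm(SAMPLE).items():
        print(f"{key}: {value!r}")
```

Feeding the alarms from two consecutive syncs through `count_failures` would show whether the same user IDs fail each time.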

There is nothing else I could really spot in the captures. Overall I'm having a hard time finding a smoking gun in the pcaps.
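
One way to make the flows easier to compare is to reduce each TCP stream to how it ended (RST, FIN from both sides, or nothing before the capture stopped). The sketch below assumes the capture was first exported to tab-separated text with something like `tshark -r sync.pcap -Y tcp -T fields -e tcp.stream -e ip.src -e ip.dst -e tcp.flags.reset -e tcp.flags.fin`, where `sync.pcap` is a placeholder file name. The function names and the sample addresses are hypothetical, and it is Python rather than anything CUCM provides.

```python
from collections import defaultdict

# Depending on the tshark version, flag fields print as 1/0 or True/False.
TRUE_VALUES = {"1", "True", "true"}


def summarize_streams(rows):
    """Group tab-separated rows (stream, src, dst, rst, fin) by TCP stream."""
    streams = defaultdict(
        lambda: {"packets": 0, "rst_from": set(), "fin_from": set()}
    )
    for row in rows:
        parts = row.rstrip("\n").split("\t")
        if len(parts) != 5 or not parts[0]:
            continue  # skip malformed rows
        stream, src, _dst, rst, fin = parts
        info = streams[stream]
        info["packets"] += 1
        if rst in TRUE_VALUES:
            info["rst_from"].add(src)
        if fin in TRUE_VALUES:
            info["fin_from"].add(src)
    return dict(streams)


def how_closed(info):
    """Describe how one stream ended, based on the flags seen in the capture."""
    if info["rst_from"]:
        return "reset by " + ", ".join(sorted(info["rst_from"]))
    if len(info["fin_from"]) >= 2:
        return "closed with FIN from both sides"
    if info["fin_from"]:
        return "FIN from one side only"
    return "no FIN or RST seen in capture"


# Made-up rows using documentation addresses, standing in for tshark output.
ROWS = [
    "0\t192.0.2.10\t192.0.2.20\t0\t0",
    "0\t192.0.2.20\t192.0.2.10\t0\t0",
    "0\t192.0.2.10\t192.0.2.20\t1\t0",
    "1\t192.0.2.10\t192.0.2.20\t0\t1",
    "1\t192.0.2.20\t192.0.2.10\t0\t1",
    "2\t192.0.2.10\t192.0.2.30\tFalse\tFalse",
]

if __name__ == "__main__":
    for stream, info in sorted(summarize_streams(ROWS).items()):
        print(stream, info["packets"], how_closed(info))
```

Streams that show "no FIN or RST seen in capture" would be the ones to re-check with a longer capture, since that is what a capture cut too early looks like.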

I cross-referenced the bug search tool and could not find anything that matches my symptoms exactly. I did find CSCvo92530:

"This bug is to ensure that the 1st TCP session CUCM established gets CLOSED after it receives all of the records from LDAP, and does not remain open throughout the duration of the 2nd TCP session's sync."

So I checked our DirSync settings and found:

Clusterwide Parameters (our current value, with the suggested value in parentheses)
Maximum Number Of Agreements: 20 (suggested 20)
Maximum Number Of Hosts: 3 (suggested 3)
Retry Delay On Host Failure (secs): 5 (suggested 5)
Retry Delay On HostList Failure (mins): 10 (suggested 10)
LDAP Connection Timeout (secs): 30 (suggested 5)
Delayed Sync Start time (mins): 5 (suggested 5)
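
As a quick check on which settings actually deviate, the two lists can be compared side by side. This is just an illustrative Python sketch: it pairs the suggested values with the parameters in the order they are listed above, and the dictionary and function names are made up.

```python
# Current clusterwide DirSync values as listed in the post.
CURRENT = {
    "Maximum Number Of Agreements": 20,
    "Maximum Number Of Hosts": 3,
    "Retry Delay On Host Failure (secs)": 5,
    "Retry Delay On HostList Failure (mins)": 10,
    "LDAP Connection Timeout (secs)": 30,
    "Delayed Sync Start time (mins)": 5,
}

# Suggested values, assumed to be in the same order as the parameters.
SUGGESTED = dict(zip(CURRENT, [20, 3, 5, 10, 5, 5]))


def differing(current, suggested):
    """Return {name: (current, suggested)} for settings that do not match."""
    return {
        name: (value, suggested.get(name))
        for name, value in current.items()
        if value != suggested.get(name)
    }


if __name__ == "__main__":
    for name, (cur, sug) in differing(CURRENT, SUGGESTED).items():
        print(f"{name}: current {cur}, suggested {sug}")
```

With the values above, the only entry returned is the LDAP Connection Timeout.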

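The only setting that differs from the suggested values is LDAP Connection Timeout (30 secs, where 5 is suggested). I'm wondering whether I should increase it significantly (to 120 to 180 secs) or put it back to the default.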

Has anyone else experienced this problem? Any advice? Thanks!