CUCM - DRS backup fails on one of the subscribers

Vladimir Stankov · ‎09-14-2012

I have a CUCM cluster running 8.6.20000-2 with 4 nodes (a publisher and a subscriber at one site and two subscribers at a remote site, clustering over WAN deployment model). Currently the DRS backup fails on only one of the subscribers at the remote site:

Log File: 2012-08-31-18-48-31_b_ccm-s3_ucm_platform.log

=====================================================

Server : CCM-S3

Feature : UCM

Component : PLATFORM

Time Completed: 2012-08-31-18-56-26

Result Code : 1-Unknown PLATFORM Error

Result String : ERROR

=====================================================

null

----> BEGIN Standard Output

----> END Standard Output

In the drf log I can see the following messages:

2012-08-31 18:43:55,760 DEBUG [NetServerWorker] - drfNetServer.run: Received Client Socket request from /172.27.235.14:38496

2012-08-31 18:43:55,760 DEBUG [NetServerWorker] - Validating if client request is from a Node within the Cluster

2012-08-31 18:43:55,760 DEBUG [NetServerWorker] - Validated Client. IP = 172.27.235.14 Hostname = ccm-s3. Request is from a Node within the Cluster

2012-08-31 18:43:55,760 DEBUG [NetServerWorker] - drfNetServerWorker.drfNetServerWorker: Socket Object InpuputStream to be created

2012-08-31 18:43:56,059 DEBUG [NetServerWorker] - drfNetServerWorker.drfNetServerWorker: Socket Object InputStream connected

2012-08-31 18:43:56,060 INFO [NetMessageDispatch] - drfMessageReceiver::HandleMessage: Message ID100 has been validated successfully

2012-08-31 18:43:56,060 INFO [NetMessageDispatch] - drfMessageHandler:HandleDRFMessage: Client connected from host: 172.27.235.14

2012-08-31 18:44:05,310 WARN [NetServerWorker-172.27.235.14] - drfNetServerWorker.run: caught IOException, disconnecting client[172.27.235.14] - Connection reset

2012-08-31 18:44:05,310 INFO [NetMessageDispatch] - drfMessageReceiver::HandleMessage: Message ID200 has been validated successfully

2012-08-31 18:44:05,310 INFO [NetMessageDispatch] - drfMessageHandler:HandleDRFMessage: Client disconnect from host: 172.27.235.14. This may be due to Master or Local Agent being down.
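For anyone comparing several backup attempts, a small script can pull the connect and reset events out of a DRF log and show how long each client session lasted before the "Connection reset". This is only a sketch: the line layout is inferred from the excerpt above, and the function name `find_reset_sessions` is made up for illustration, not part of any Cisco tool.

```python
import re
from datetime import datetime

# Line layout assumed from the excerpt above:
# "<date> <time>,<ms> <LEVEL> [<thread>] - <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<thread>[^\]]+)\] - (?P<msg>.*)$"
)
CONNECT_RE = re.compile(r"Client connected from host: (\S+)")
RESET_RE = re.compile(r"disconnecting client\[([^\]]+)\]")


def parse_ts(text):
    """Parse a timestamp such as '2012-08-31 18:43:56,060'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def find_reset_sessions(lines):
    """Return (host, seconds_connected) for each connect followed by a reset."""
    connected = {}  # host -> time of the last "Client connected" message
    sessions = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything in another format
        ts = parse_ts(match.group("ts"))
        msg = match.group("msg")
        connect = CONNECT_RE.search(msg)
        if connect:
            connected[connect.group(1)] = ts
            continue
        reset = RESET_RE.search(msg)
        if reset and reset.group(1) in connected:
            host = reset.group(1)
            duration = (ts - connected.pop(host)).total_seconds()
            sessions.append((host, duration))
    return sessions
```

Passing the lines of the log file (for example `find_reset_sessions(open(path))`, where `path` is wherever you saved the collected log) gives one tuple per dropped session, which makes it easy to see whether the reset always comes after roughly the same interval.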

The other 3 nodes are OK. Restarting the DRF Master and Local services doesn't help. Has anyone experienced a similar issue?

P.S. There is a similar problem in this thread: https://supportforums.cisco.com/message/3278977#3278977, where the issue was with the subscriber certificates, but in my case all certificates are OK.

Best Regards,

Vladimir

Joseph Martini · ‎09-14-2012

Have you tried restarting the DRF Local and DRF Master agent services on the problem subscriber and trying a backup once more?

Vladimir Stankov · ‎09-16-2012

Hi Joe,

Yes, I've tried that, but with no success.

Regards,

Vladimir

Joseph Martini · ‎09-17-2012

What is this IP address 172.27.235.14? Is it your SFTP server or a CUCM server?

Vladimir Stankov · ‎09-17-2012

172.27.235.14 is the subscriber that won't get backed up. I'll probably open a TAC case about this, since I still have no idea why this is happening.

Regards,

Vladimir

Akhil Behl · ‎09-18-2012

Try restarting DRF Local/Master services. If that still doesn’t resolve the issue, go to the Pub/Sub's OS Admin page to re-generate the IPSec certificate, and then restart the DRS service in Pub/Sub.

This needs to be done for all servers.

Akhil Behl
Senior Network Consultant
akbehl@cisco.com

Author of “Securing Cisco IP Telephony Networks”
www.ciscopress.com/title/1587142953

Malik · ‎06-02-2017

Hi Vladimir,

Were you able to test this server afterwards? Was it successful, and what was the cause? What workaround did TAC suggest?

Vladimir Stankov · ‎06-08-2017

Hi Malik,

This was a long time ago :) If I remember correctly, the problem disappeared by itself; it was probably something at the network level causing the DRS backup to fail. I didn't open a TAC case in the end.
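If a network-level cause is suspected, one rough check is to probe the TCP path repeatedly from a machine at the remote site toward the node running the DRF Master. The sketch below is generic: the host and port are parameters you supply (the DRF Master port for your release is an assumption to verify in Cisco's port usage guide), and the function names are illustrative only. It only tests whether a connection can be opened, so it will not catch a reset that happens in the middle of a session.

```python
import socket
import time


def probe_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe_repeatedly(host, port, attempts=5, delay=1.0):
    """Probe several times and return the list of True/False results."""
    results = []
    for i in range(attempts):
        results.append(probe_tcp(host, port))
        if i < attempts - 1:
            time.sleep(delay)  # wait between attempts, not after the last one
    return results
```

A mix of `True` and `False` results over a longer run would point toward an intermittent path problem rather than something on the subscriber itself.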

Vladimir Stankov · ‎11-09-2012

Just to update the status on this thread.

I haven't had a chance to debug the problem further until today. Interestingly, without any changes, the backup completed successfully today on all of the servers. I'll be conducting additional tests in the following weeks to see whether the problem occurs again.

Regards,

Vladimir