04-12-2013 05:48 AM - edited 03-18-2019 12:55 AM
The following error message appears after some time in a cluster with only two VCS Controls:
Failed - This peer either has a different list of peers or it has a different software version installed
for the remote peer.
A restart of the slave solves the issue.
Has anyone come across this?
04-12-2013 05:53 AM
That's a bit odd. What version of software are you running? Also what is the round trip delay between each peer, and every other peer?
Thanks,
Guy
04-12-2013 05:53 AM
Are they geographically apart? The maximum round-trip delay is 30 ms; above that it would fail.
Do you have the same NTP server configured on both of them?
Are they running the same software?
Are they on the same VLAN? Are they on the same subnet? If not, have you turned off SPI on your router/firewall, if you have any in between?
Basically you need to provide more info; there are a lot of questions that need to be answered.
04-12-2013 06:06 AM
Hello Ahmad,
Yes they are geographically apart - the RTT is around 30 ms
15 packets transmitted, 15 received, 0% packet loss, time 14016ms
rtt min/avg/max/mdev = 33.326/33.755/35.132/0.476 ms
The NTP server is not the same, but I will change that, if required
Yes, they are running the same sw
The firewalls are Cisco ASA and inspect is disabled
Another thing is the following:
As an addition to all the ports listed in App. 3 of the clustering guide for the VCS, I have come across an issue with communication between the cluster peers on high ports > 40000.
They seem to be opening them dynamically between each other.
Port pairs I can distinguish on the ASA are for example:
45600 / 43173
42981 / 46620
48399 / 48424
41386 / 43234
44599 / 45478
46122 / 47370
49243 / 40433
After two restarts, I had to open port 46854, and the replication finally occurred.
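To get a feel for how widely the ports listed above are spread, here is a minimal Python sketch that collects them and reports the lowest and highest values seen. The port numbers are copied from this post; the variable names are just illustrative, and the result only describes what was observed on this ASA, not a documented VCS port range.

```python
# Port pairs seen on the ASA between the two cluster peers (from this post).
observed_pairs = [
    (45600, 43173),
    (42981, 46620),
    (48399, 48424),
    (41386, 43234),
    (44599, 45478),
    (46122, 47370),
    (49243, 40433),
]
# The extra port that had to be opened after two restarts.
extra_port = 46854

# Flatten the pairs into one sorted list of unique ports.
ports = sorted({port for pair in observed_pairs for port in pair} | {extra_port})
lowest, highest = ports[0], ports[-1]

print(f"{len(ports)} distinct ports observed, from {lowest} to {highest}")
```

With this sample the ports fall anywhere between 40433 and 49243, which fits the impression that they are picked dynamically rather than from a short fixed list.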
The major setbacks are:
1. What will happen when the master or some of the peers get restarted or rebooted?
2. How come there is no configuration documentation on these ports?
Cheers
04-12-2013 06:14 AM
Did you use ping ipaddressOfOtherPeer? If yes, then use this instead: ping -l 4000 ipaddressOfOtherPeer. This tests with large 4000-byte packets, which is more in line with the replication packet size between VCS peers. If the round-trip delay is above 30 ms, then you need to contact your network admin to fix that; otherwise we continue with the investigation.
regards, Ahmad
04-12-2013 06:29 AM
Hello Ahmad,
I tested with 4000 (+28 the size of the ping packet header) and here is the result:
19 packets transmitted, 19 received, 0% packet loss, time 18023ms
rtt min/avg/max/mdev = 34.262/34.372/34.699/0.265 ms
It is true that it is slightly above 30 ms, but it is constant at 34 ms.
Is the VCS that sensitive to the round-trip time?
And why does it not fix the replication after it returns the error, if it is capable of doing so after a restart?
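For anyone wanting to check their own ping results against the limit, here is a rough Python sketch that parses a Linux-style ping summary line like the one above and reports how far the average is over 30 ms. The 30 ms figure is simply the limit quoted in this thread, and the function names are made up for illustration.

```python
import re

# Assumption: 30 ms is the clustering limit quoted in this thread;
# it is hard-coded here, not read from the VCS.
RTT_LIMIT_MS = 30.0

def parse_rtt(summary):
    """Return (min, avg, max, mdev) in ms from a Linux-style ping summary line."""
    match = re.search(
        r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms",
        summary,
    )
    if match is None:
        raise ValueError("no rtt summary found")
    return tuple(float(value) for value in match.groups())

def over_limit_percent(rtt_ms, limit_ms=RTT_LIMIT_MS):
    """Return how far rtt_ms exceeds limit_ms, as a percentage of the limit."""
    return (rtt_ms / limit_ms - 1.0) * 100.0

line = "rtt min/avg/max/mdev = 34.262/34.372/34.699/0.265 ms"
rtt_min, rtt_avg, rtt_max, rtt_mdev = parse_rtt(line)
print(f"avg {rtt_avg} ms is {over_limit_percent(rtt_avg):.1f}% over the limit")
print("even the minimum is over the limit:", rtt_min > RTT_LIMIT_MS)
```

For the 4000-byte test above, the average of 34.372 ms comes out at roughly 14.6% over the 30 ms figure, and even the minimum sample is above it.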
04-12-2013 06:39 AM
30 ms is an absolute maximum; 34 ms is over 110% of that.
The cluster needs to be sorted out so that even at peak times it's within 30 ms.
It can recover, but it takes time, and if it keeps drifting outside 30 ms it won't have a chance to.
Thanks,
Guy
04-12-2013 06:43 AM
For clustering, the round-trip time must be below 30 ms (a hard limit).
The VCS (H.323) synchronizes via NTP and is very sensitive to timing.
04-12-2013 07:01 AM
But then, how come the other cluster (between the VCS Expressways), which also has an average RTT of about 34 ms, works fine?
30 packets transmitted, 30 received, 0% packet loss, time 29032ms
rtt min/avg/max/mdev = 34.237/34.582/37.126/0.690 ms
04-12-2013 07:54 AM
I would be doubtful that they take the same route. Did you also run a traceroute between the VCSe's? Are they taking the same route as the VCSC?
Are the VCSe peers also geographically apart? Is there no backend connection for replication?
04-16-2013 04:35 AM
It appears that the clustering communication between the peers of a cluster uses far more ports than the ones documented in the clustering setup guide (H.323, IPsec, IKE). The issue was finally resolved after tracking the drops on the ASA, finding a bunch of random high-numbered ports, and finally opening ip any any for the clustering peers.
Cheers,
M.
04-16-2013 04:39 AM
This is from the deployment document, about turning off ALG/SPI/any other packet inspection:
"It is highly recommended to disable SIP and H.323 ALGs on routers/firewalls carrying network traffic to or from a VCS Expressway, as, when enabled this is frequently found to negatively affect the built-in firewall/NAT traversal functionality of the Expressway itself."
04-22-2013 03:54 AM
Hello,
I have the same issue, but only with ESP (SPI). Which ports need to be excluded from ESP (SPI) or protocol inspection? I mean:
- SIP: 5060
- H.323: 1719 and 1720
- SSH: 22
- ISAKMP: 500
Any more? The high ports from 40000 to 49000 too? And the SIP media ports from 20000 to 29999? And the H.323 media ports too?
Thanks in advance.
Best regards.
04-22-2013 11:22 PM
Hi,
Due to the dynamism of it all, I resorted to opening gt 1024 (all ports above 1024) between the VCS cluster member IPs.
Best regards
04-23-2013 12:33 AM
Hi,
Every port from 1024 to 65535 between both VCSe's, for both TCP and UDP?
Thanks in advance.
Best regards.