Re: Internal DB replication Problem

dnettsw · ‎11-18-2010

Hi There,

I have one ACS SE 1113 and one SE 1112. Both devices are running 4.2.0.124. I set up both devices and had replication working properly locally just yesterday. The 1113 was on one subnet and the 1112 on another, and it worked perfectly. I then moved the 1112 to our remote colo site for DR purposes. Since then I have been getting the following error on the primary:

11/18/2010	11:24:59	atpacs	INFO	Outbound replication cycle completed
11/18/2010	11:24:59	atpacs	WARNING	Cannot replicate to 'eqxacs' - server not responding
11/18/2010	11:19:57	atpacs	INFO	Outbound replication cycle starting...

Our environment is set up with a VPN tunnel between the sites. I am able to log in to the device, but the DB can't replicate.

The firewalls are Juniper ISG 1000s. Like I said, it's just a simple tunnel to and from. I see the traffic from the primary SE headed to the secondary, but it never comes back. Any ideas?

Thanks!

dnettsw · ‎11-18-2010

Also, in the AUTH.log file I am getting the following errors:

AUTH 11/18/2010 11:01:31 E 1102 1884 0x0 Comms lib:Failed to get SERVICE_NEGOTIATED message during connect phase, rc = 10060
AUTH 11/18/2010 11:01:31 E 0371 1884 0x0 DBReplicate(OUT) cannot sync with ACS eqxacs - server not responding
AUTH 11/18/2010 11:01:31 E 0469 1884 0x0 Database synchronization with host eqxacs failed - refer to CSAuth log file
AUTH 11/18/2010 11:01:31 A 2039 1884 0x0 DBReplicate(OUT) cycle completed
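
A note on the `rc = 10060` in the first line: assuming `rc` is a Winsock error code (plausible, since ACS 4.x runs on Windows, but not confirmed anywhere in this thread), 10060 is `WSAETIMEDOUT`, meaning the connection attempt timed out instead of being refused. The sketch below pulls the code out of a log line and looks it up in a small table of standard Winsock values. The helper name `explain_rc` is hypothetical.

```python
import re

# Standard Winsock error codes. Treating "rc" as a Winsock code is an assumption.
WINSOCK_ERRORS = {
    10054: "WSAECONNRESET (connection reset by peer)",
    10060: "WSAETIMEDOUT (connection attempt timed out)",
    10061: "WSAECONNREFUSED (connection actively refused)",
}

RC_PATTERN = re.compile(r"rc\s*=\s*(\d+)")


def explain_rc(log_line):
    """Return a description of the rc code in a log line, or None if there is none."""
    match = RC_PATTERN.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return WINSOCK_ERRORS.get(code, f"unrecognized code {code}")


line = ("AUTH 11/18/2010 11:01:31 E 1102 1884 0x0 Comms lib:Failed to get "
        "SERVICE_NEGOTIATED message during connect phase, rc = 10060")
print(explain_rc(line))
```

A timeout (as opposed to a refusal) is consistent with packets being silently dropped somewhere along the path.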

Nicolas Darchis · ‎11-18-2010

Hi,

Did you open the replication port (2000) on your firewall? Can you see the replication traffic arriving on the second ACS?
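
One quick way to test the port from a host on the primary's subnet is a plain TCP connect. This is only a rough sketch: the function name `can_connect` is made up for illustration, and a successful connect only shows that the TCP handshake completes, not that replication itself will work.

```python
import socket


def can_connect(host, port=2000, timeout=5.0):
    """Return True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refusals, unreachable networks, and DNS failures.
        return False


# Example (the hostname "eqxacs" is taken from the log; it must resolve where this runs):
#   can_connect("eqxacs")
```

If this returns False after the full timeout, that points to a silent drop on the path, whereas an immediate False suggests a refusal or a name-resolution problem.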

Nicolas

===

Don't forget to rate answers that you find useful

dnettsw · ‎11-18-2010

It's just a straight VPN tunnel permitting ANY to and from.

Nicolas Darchis · ‎11-18-2010

Ok, but it would still be interesting to take sniffer traces to see if the traffic is arriving on the other side and reaching the other ACS or not.

Nicolas

dnettsw · ‎11-19-2010

I was able to run traces on the ISG 1000 using "debug flow drop", filtering on the ACS IP as the source. It looks like one of our Junipers here (locally) is dropping packets.

**** jump to packet:172.16.17.14->172.16.111.14
flow_decap_vector IPv4 process
flow packet already have session.
flow session id 512345
flow_main_body_vector in ifp ethernet1/4 out ifp ethernet1/3
flow vector index 0x3bb, vector addr 0x1871f53c, orig vector 0x1871f53c
vsd 0 is active
av/uf/voip checking.
post addr xlation: 172.16.17.14->172.16.111.14.
update policy out counter info.
packet send out to 00d00410c000 through ethernet1/3
**** pak processing end.
packet dropped, ASP tcp state error
POLL_DROP_PAK: vlist 0x1871f53c, 0x1871f550

Right now I am not sure of the cause.
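
For longer captures, a small script can pull out just the drops. The sketch below assumes that each "packet dropped" line refers to the most recent "jump to packet" line before it, which this output format does not guarantee. The function name `find_drops` is hypothetical.

```python
import re

PACKET_RE = re.compile(r"jump to packet:(\S+)->(\S+)")
DROP_RE = re.compile(r"packet dropped,\s*(.+)")


def find_drops(trace_text):
    """Return (src, dst, reason) for each 'packet dropped' line in the trace."""
    drops = []
    current = None  # (src, dst) from the most recent "jump to packet" line
    for line in trace_text.splitlines():
        packet = PACKET_RE.search(line)
        if packet:
            current = (packet.group(1), packet.group(2))
            continue
        drop = DROP_RE.search(line)
        if drop and current:
            drops.append((current[0], current[1], drop.group(1).strip()))
    return drops


trace = """\
**** jump to packet:172.16.17.14->172.16.111.14
packet send out to 00d00410c000 through ethernet1/3
**** pak processing end.
packet dropped, ASP tcp state error
"""
print(find_drops(trace))
```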

dnettsw · ‎11-19-2010

Sorry, that's on the return trip: 17.14 -> 111.14.

Nicolas Darchis · ‎11-19-2010

We told you it was the pesky Juniper firewall 😛

I'm afraid we can't help you on those logs though.

Nicolas

Tiago Antunes · ‎11-19-2010

Hi, please double-check whether you have skinny inspection enabled.

By default, the Skinny protocol (SCCP) used for voice services also uses port 2000, the same port the ACS uses for replication.

If you have skinny inspection enabled, it will block any traffic on port 2000 that is not voice traffic, which means it will block the replication traffic between the ACSs.

I would follow Nico's hint and check with sniffer traces whether any traffic from the other ACS is arriving at the ACS.

HTH,

Tiago

--

If this helps you and/or answers your question please mark the question as "answered" and/or rate it, so other users can easily find it.

dnettsw · ‎11-19-2010

Actually, I finally got it resolved. I went back over to the DR site and put a sniffer on that side. I was getting the same kind of traffic.

I just hard reset the box and powered it back up. Now it replicates just fine again. Go figure! Thanks for the posts.

jneatherway · ‎11-25-2010

I'm having a similar issue. Can you confirm: did you reboot the ACS or the Juniper?

Thanks

dnettsw · ‎11-29-2010

Hi there,

I actually had to do a hard reset on the ACS. Full disclosure, though: I had replication working in our computer center before I moved it. The problems started once I moved it to a different site and re-IPed it. I simply powered the box down and powered it back up. It has worked since.

Tarik Admani · ‎11-30-2010

This is a known issue at times. Next time you have issues, or when you move the box, check the IP address of the "Self" entry under the network configuration. If it shows a loopback address and not the NIC's IP, that is your problem. The best way to get the physical IP address back is to issue "set ip" from the CLI, have it pull a DHCP address, perform the network test, and then let it restart the services. Once the services have restarted, you can restore the original static IP by issuing the "set ip" command again. After the services restart a second time, the "Self" entry should be fixed and you should be up and going again.
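
The check itself comes down to comparing two addresses. A minimal sketch, assuming you have read the "Self" entry and the NIC's configured IP off the box by hand (the function name and the sample addresses are illustrative only, not taken from any ACS tool):

```python
import ipaddress


def self_entry_looks_wrong(self_ip, nic_ip):
    """Return True if the "Self" entry is a loopback address or differs from the NIC's IP."""
    self_addr = ipaddress.ip_address(self_ip)
    nic_addr = ipaddress.ip_address(nic_ip)
    return self_addr.is_loopback or self_addr != nic_addr


# 192.0.2.10 is a documentation-range address used purely as a placeholder.
print(self_entry_looks_wrong("127.0.0.1", "192.0.2.10"))   # True
print(self_entry_looks_wrong("192.0.2.10", "192.0.2.10"))  # False
```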

Thanks,

Tarik Admani