Solved: Cisco ACE HA without preemption

tobin_jim · ‎07-21-2011

Hi There,

I'm in the middle of deploying a pair of ACE 4710s as an HA pair. Each of the 4710s connects to the same stack of 2 x 3750X switches. ACE-01 has two connections to each of the 3750s in the stack, and ACE-02 is cabled the same way. The four connections on each ACE are port-channeled, with separate port channels at the switch end. All ports in the port channels are in spanning-tree portfast.

What I'm trying to achieve is that when ACE-01 loses sight of the network, the other one takes over. I have this working. When the failover happens I lose one ping packet to my VIPs. When ACE-01 comes back online, however, failover happens again, and this time I lose about 20 seconds' worth of traffic. As the ports on the switch are in portfast, I don't suspect the usual culprit of the ports going through their spanning-tree states. I guess this is just the way the ACE failover behaves.

What I'd like to do is disable preemption so that when ACE-01 comes back online there is no failover back to it.

I have read the following guide.

http://www.cisco.com/en/US/prod/collateral/contnetw/ps5719/ps7027/ps8361/guide_c07-572616_ps7027_Products_White_Paper.html

There is one FT group on the ACEs for the Admin context. Each ACE has a priority of 100. The ACE with the highest IP address on the FT VLAN is the Active Master. The other is Hot Standby. I have disabled preemption in the FT group.

There is no need for fault-tolerant host or vlan interface tracking. However, I have done as the guide suggested and configured fault-tolerant tracking to lower the priority by 30. When the port channel to the ACE-01 comes back up it has a priority of 70 but STILL it preempts and takes over the master role.
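A minimal sketch of what interface tracking with a priority decrement of 30 can look like on the ACE. The tracking name TRACK_VLAN10 and VLAN 10 are placeholders for illustration, not values taken from this deployment:

```
ft track interface TRACK_VLAN10
  track-interface vlan 10
  peer track-interface vlan 10
  priority 30
  peer priority 30
```

Here "priority" is the amount subtracted from the local unit's FT group priority while the tracked VLAN is down, and "peer priority" is the same for the peer, which is how a unit configured with priority 100 ends up at 70.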

Anyone in Ciscoland have any ideas how to achieve what I need?

Nicolas Fournier · ‎07-22-2011

Hi Tobin,

You can disable preemption from the ft group config:

ft group
  no preempt

Once it is configured, mastership won't be switched when ACE-01 comes back online.

Regards,

Nicolas

tobin_jim · ‎07-28-2011

Hi Nicolas,

Thanks for the response. I already have the "no preempt" command issued on both ACEs.

ACE-01

ft peer 1
  heartbeat interval 100
  heartbeat count 10
  ft-interface vlan 100
ft group 1
  peer 1
  no preempt
  associate-context Admin
  inservice

ACE-02

ft peer 1
  heartbeat interval 100
  heartbeat count 10
  ft-interface vlan 100
ft group 1
  peer 1
  no preempt
  associate-context Admin
  inservice

ACE-01 is still taking over as master when it comes back online.

Is there anything I'm missing? I can supply more of the config if necessary.

Nicolas Fournier · ‎07-28-2011

Hi Tobin,

Can you try again but setting a different priority to each device this time?

ft group X
  priority Y
  peer priority Y+1

If it still behaves like this afterwards, can you please verify that the standby is really in STANDBY_HOT state when both devices are online?
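As a quick sketch, the redundancy state can be checked from the Admin context on each ACE with the standard FT show commands (output omitted here, since it depends on the setup):

```
show ft group detail
show ft peer detail
```

On the standby unit, the group's own state should report standby hot while the peer is shown as active.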

Regards,

Nicolas

tobin_jim · ‎07-30-2011

Hi Nicolas,

I put the following config on the primary ACE.

ACE-01

ft group 1
  peer 1
  no preempt
  priority 110
  peer priority 120
  associate-context Admin
  inservice

The standby ACE has the following config:

ft group 1
  peer 1
  no preempt
  priority 120
  peer priority 110
  associate-context Admin
  inservice

I still get the same issue of the primary preempting when it comes back online. I can confirm that the backup is in STANDBY_HOT state.

I have attached the output from the "debug ha_mgr all" command on the standby ACE.

Nicolas Fournier · ‎07-30-2011

Hi,

If you look at the debugs, you'll see that this is the reason your ACE-01 takes over even with no-preempt:

011 Jul 30 08:48:06.855231 ha_mgr: (ctx:0)fsm_ft_process_peer_ft_state_msg:1424 Two Actives are present in the Network for ft group 1 My ipaddr 10.87.24.250, Peer ipaddr 10.87.24.251 My time 1312015539 Peer time 1311347807

When ACE-01 comes back up, for some reason it cannot reach ACE-02 and thus assumes it should be the active one.

When connectivity between the two hosts is restored, they find out that they are both active, and mastership then goes to the one with the highest priority.

How do you proceed with your resiliency tests?

Do you reboot the whole catalyst?

If you reboot the ACE only, we would need to investigate why ACE-01 cannot reach ACE-02 for some time before connectivity is restored between the ACEs.

Maybe you could try to configure a query vlan on your ACE? That way we might be able to avoid the Active-Active situation?
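A minimal sketch of a query VLAN, assuming VLAN 200 is a VLAN over which both ACEs can reach each other apart from the FT VLAN (the VLAN number is a placeholder, not something from your config). It goes under the existing FT peer:

```
ft peer 1
  query-interface vlan 200
```

The idea is that when heartbeats on the FT VLAN stop, the standby checks the active over the query interface before deciding to take over, at the cost of a somewhat longer failover time.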

Regards,

Nicolas

tobin_jim · ‎08-09-2011

Hi Nicolas,

You were correct: when ACE-01 came back up, it couldn't see ACE-02. This was due to the config on the port-channel trunk on the switch. The trunk was set to "spanning-tree portfast" and not "spanning-tree portfast trunk". This caused the trunk ports to transition through their RSTP states.
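A minimal sketch of the switch-side change, assuming the ACE-facing trunk is Port-channel1 (the interface number is a placeholder):

```
! Port-channel1 is a placeholder for the ACE-facing trunk.
! On a trunk port, plain "spanning-tree portfast" has no effect;
! the "trunk" keyword is what enables it.
interface Port-channel1
 spanning-tree portfast trunk
```

The same change applies to the port channel facing each ACE.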

Thanks for the help.