We currently have a pair of C6500s connected via a 2x 10G port-channel (802.1q) and have HSRP configured to provide FHRP as an HA solution to a downstream router. The router has a default route to the HSRP VIP (VIP: 10.1.200.1/24; SVI: 10.1.200.2/24 configured on SW_01 (Primary); SVI: 10.1.200.3/24 configured on SW_02 (Secondary)). Last week, we had a soft failure on one of the switches which caused the router to lose connectivity to the network. The soft failure affected the VIP 10.1.200.1, which caused the connection between the router and the switches to fail.
My question is, from a design standpoint, what are the alternatives to HSRP that would make this design more resilient than having a single VIP? Meaning, what options can I use besides having the router point to a single HSRP VIP?
Thanks in advance,
You could use an IGP between the cores and the rtr, or conditional static routing.
Thanks for the input. I should probably clarify more. See attached for illustration.
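To illustrate the conditional static routing idea, here is a minimal Python sketch of the selection logic, not device configuration. It assumes the downstream router would hold one static default route per core switch, each pointing at that switch's own SVI address rather than the shared VIP, and each usable only while a reachability probe succeeds. The names `TrackedRoute`, `select_next_hop`, and `ROUTES`, and the distance values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class TrackedRoute:
    """A static default route that is usable only while its probe succeeds."""
    next_hop: str
    distance: int  # lower is preferred, like an administrative distance


def select_next_hop(routes: Iterable[TrackedRoute],
                    is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the preferred next hop whose probe succeeds, or None."""
    usable = [r for r in routes if is_reachable(r.next_hop)]
    if not usable:
        return None
    return min(usable, key=lambda r: r.distance).next_hop


# Hypothetical routes on the downstream router (assumption: they point at
# each core switch's SVI address instead of the HSRP VIP).
ROUTES = [
    TrackedRoute("10.1.200.2", distance=1),   # SW_01, preferred
    TrackedRoute("10.1.200.3", distance=10),  # SW_02, floating backup
]

if __name__ == "__main__":
    # Simulated probe result: SW_01 has stopped answering.
    down = {"10.1.200.2"}
    print(select_next_hop(ROUTES, lambda ip: ip not in down))
```

The point of the sketch is that the backup route is only chosen when the probe to the preferred next hop fails, so the router no longer depends on a single virtual address staying healthy.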
- We have EIGRP configured between the two core switches
- The two core switches are connected via a Port-Channel Trunk (802.1q)
- The downstream RTR belongs to a tenant in the building but we provide them with access to the Internet and a few internal resources (access to internal servers and apps)
- On the core switches, we have a default route to the inside interface of the HA ASAs
Now, it appears to me that this design presents a single point of failure irrespective of the redundant physical solution (two core switches connected via a port-channel (a bundle of 2x10G physical interfaces) and HSRP). I see it this way because having a single HSRP VIP and a single default route to the active ASA poses a risk.
Going back to my original question: even if I have dynamic routing configured between the two core switches and have HSRP configured to provide HA, what happens when some soft failure occurs (in our case, another Cisco device was connected to one of the core switches, which caused the VIP to fail)?
I was considering the following solutions:
- And, probably, configure the ASAs in active/active rather than active/passive
I do appreciate your feedback/input.
You have made a couple of vague references to a soft failure that made the VIP fail. I would like to understand better what this soft failure was and how it made the VIP fail. I would also like to understand how this downstream router is connected to the switches running HSRP. If the router has a physical connection to only one of the switches, then yes, there is a significant single point of failure. If the router connects through some intermediate switch, then the risk is somewhat lessened.
It would also be helpful if we understood more about connectivity between your core switches and your ASAs. I am not sure how changing the ASAs to active/active would be an effective solution.
The soft failure that occurred had to do with another eng plugging another device into an access port on one of the core switches. This event caused the HSRP VIP 10.1.200.1 to conflict with another HSRP VIP. See the log entry below.
The downstream router is physically connected to both core switches via L2 and has a static route pointing to VIP: 10.1.200.1. So, when that soft failure (the VIP conflict) occurred, the downstream router couldn't forward packets to the VIP (10.1.200.1). We can conclude that this is a single point of failure.
Regarding the connectivity between the two core switches and the pair of HA ASAs, it is pretty straightforward:
Core SW_01 has an L2 link to an L3 interface (10.1.200.1). This is the Primary/Active ASA.
Core SW_02 has an L2 link to an L3 interface (10.1.200.1). This is the Secondary/Standby ASA, so the config is identical to the Primary ASA.
We have a static default route to the inside interface of the ASA (10.1.200.1). Now, since we have a single static route pointing to a single active ASA (10.1.200.1), when the HSRP VIP 10.1.200.1 failed, the core switch stopped forwarding traffic to this IP address. Had we had two active/active ASAs, both core switches would have two default routes, one to each ASA. Would this be a viable solution?
Thanks in advance, ~zK
Here's the log off of the core switch which caused the HSRP VIP 10.1.200.1 to fail:
"DT: %HSRP-4-DIFFVIP1: Vlan200 Grp 1 active routers virtual IP address 10.234.80.1 is different to the locally configured address 10.1.200.4"
I am not clear how an engineer connecting another router into vlan 200 would disrupt the configured HSRP. But it did. This was an operational error. And operational errors can cause problems. The fact that an operational error impacted your network does not indicate that the design is bad.
I am puzzled about your addressing. You have identified 10.1.200.1 as the virtual address for HSRP. And you have also identified that as the address on the ASA interfaces. Can you provide clarification about your addressing?
Running ASAs in active/active requires that the ASAs be configured in multiple context mode. I am not sure that you really want to do that. And even if you did, I do not believe that active/active would have prevented this problem.
I didn't say that the other device that was connected by another eng. was a router, but, nonetheless, it was a device that caused the HSRP VIP to create a conflict with another HSRP group (having the same Group ID). I found this post related to the error I pulled from the log https://community.cisco.com/t5/switching/hsrp-4-diffvip1/td-p/2119442
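For anyone who wants to watch for this conflict in syslog, below is a minimal Python sketch that pulls the interface, group number, and both addresses out of a %HSRP-4-DIFFVIP1 message. The regular expression is derived only from the single log line quoted earlier in this thread, so other software releases may word the message differently; `parse_diffvip` and `VipConflict` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern built from the one log line quoted in this thread (assumption:
# the wording is the same on the devices you are monitoring).
DIFFVIP_RE = re.compile(
    r"%HSRP-4-DIFFVIP1: (?P<interface>\S+) Grp (?P<group>\d+) "
    r"active routers virtual IP address (?P<active_vip>\d+(?:\.\d+){3}) "
    r"is different to the locally configured address "
    r"(?P<local_vip>\d+(?:\.\d+){3})"
)


class VipConflict(NamedTuple):
    interface: str   # e.g. "Vlan200"
    group: int       # HSRP group number
    active_vip: str  # VIP advertised by the current active router
    local_vip: str   # VIP configured on the switch that logged the message


def parse_diffvip(line: str) -> Optional[VipConflict]:
    """Return the conflict details if the line is a DIFFVIP1 message."""
    m = DIFFVIP_RE.search(line)
    if m is None:
        return None
    return VipConflict(m["interface"], int(m["group"]),
                       m["active_vip"], m["local_vip"])


if __name__ == "__main__":
    sample = ("DT: %HSRP-4-DIFFVIP1: Vlan200 Grp 1 active routers virtual "
              "IP address 10.234.80.1 is different to the locally "
              "configured address 10.1.200.4")
    print(parse_diffvip(sample))
```

A match means some other device is speaking HSRP with the same group number on that VLAN but with a different virtual address, which is the situation described above.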
I am not saying that the design we have is a bad one, but some designs have weaknesses and in this case it was obvious that there was a weakness.
I presume that you didn't see the diagram I attached in my previous post. The topology shows that the default gateway is 10.1.200.254, which is the IP address assigned to the ASA inside interface; HSRP Group 1: VIP 10.1.200.1; HSRP IP address assigned to core switch 01: 10.1.200.2; and HSRP IP address assigned to core switch 02: 10.1.200.23.
My team and I entertained the idea of having the ASAs in an ACTIVE/ACTIVE configuration so that we can have redundant gateways (two MHSRPs = two active ASAs). If one VIP were to fail, then the redundant VIP would still be active.
Your input is much appreciated.
Not sure if this applies to your configuration, but it would probably be best just to have direct L3 links from the router to each 6500 and enable a routing protocol between the devices.
If that is not possible in your situation, perhaps you could provide more details with regard to the topology and connectivity so that a more suitable solution could be determined.
I agree with Richard Burts in that an operational error should not be considered a faulty design. One can take any configuration and whether intentional or unintentional, anyone can throw a wrench into the works and cause problems. The design is primarily to provide hardware/link redundancy and other aspects such as dynamic routing should provide an additional layer for most software issues.
I do find it interesting that the downstream router is on VLAN 16, yet you say its default route points to the VLAN 200 VIP. I've seen that work, but I prefer to make a device's next hop to be upstream to an attached interface.
Regardless of whether an IGP was running between the downstream router and the 6500s, or whatever redundant gateway protocol was in use, screwing up the VIP between the 6500s and ASAs would cause issues. Even with multiple gateways or load balancing, some traffic would have been affected.
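A rough way to picture that last point is the small Python sketch below. It assumes flows are pinned to one of two gateways by a simple modulo rule and that the broken gateway is not detected as down, so nothing fails over. The gateway labels and the function name `affected_flows` are hypothetical, and real load balancing would use a different hash.

```python
from typing import List, Sequence


def affected_flows(flow_ids: Sequence[int],
                   gateways: Sequence[str],
                   failed: str) -> List[int]:
    """Return the flows whose statically chosen gateway is the failed one."""
    return [f for f in flow_ids if gateways[f % len(gateways)] == failed]


if __name__ == "__main__":
    # Ten hypothetical flows split across two hypothetical gateways.
    gateways = ["VIP-A", "VIP-B"]
    hit = affected_flows(range(10), gateways, failed="VIP-A")
    print(f"{len(hit)} of 10 flows still point at the broken gateway")
```

Under those assumptions a second gateway reduces the blast radius but does not remove it: every flow that was mapped to the broken address is still black-holed.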
Anyway, just my thoughts.