EIGRP or CEF load-sharing question

Ryan YC · 05-18-2022

Hi,

My company is having a routing or load-sharing problem so I hope you can help me. The following diagram is the current situation:

I don't know when the direct Layer 2 connection between SW1 and SW2 went down. The link is a 20 km fiber connection between two buildings, so we can't fix it quickly, and we don't have a backup connection between the switches, so HSRP has become Active-Active.

We are using EIGRP equal-cost load sharing in the environment, but load sharing seems to be what is causing this problem. When I try to connect to the file server from Core-1, I cannot reach the target, because Core-2 sends my traffic to the wrong segment, the one that is not connected to the file server. If I shut down the connection between Core-2 and SW2, I can connect to the server again.

We currently use several static routes to specify the working routes, but we would like a global solution, both for now and for the future. We would rather not adjust the EIGRP metrics or filter routes at this point, because there are actually hundreds of servers connected behind SW3 or SW4. The above diagram shows just one small segment of our network.

I thought there might be some technology that would let Core-2 automatically determine which route can actually reach the target devices. I have been reading articles on CEF and EIGRP to try to help myself.

Thank you!

Kasun Bandara · 05-18-2022

Please share your sanitized configurations for the L2 switches and Core-2.

Please rate this and mark as solution/answer, if this resolved your issue
Good luck
KB

Richard Burts · 05-18-2022

There are some things about the description of the issue that I do not understand. I especially do not understand how EIGRP load sharing is the issue. The drawing suggests 3 vlans running HSRP on sw1 and sw2. And if the layer 2 connection between those switches is not functioning then each of them will assume that they are the active router. But how does that relate to EIGRP?

The drawing is good about identifying the physical connections from sw1 and sw2 to core2, but it provides no information about whether the connections to core2 are trunk ports (carrying vlan1, vlan2, and vlan3), where layer 3 routing is not an issue, or belong to some other vlan, where layer 3 routing is an issue.

Since the original post seems to assert that layer 3 routing (with EIGRP) is the issue, perhaps we should assume that the connections from core2 to sw1 and sw2 are routed links. In that case the issue is not about HSRP (which switch has the mac address for the virtual interface) but about the fact that sw2 advertises that it can reach ALL of the addresses in those 3 subnets when in fact it can reach only SOME of those addresses.

I believe that the real issue here is that if the direct link between sw1 and sw2 is down, then we have a situation where the same IP subnet exists in two separate and distinct places in the network. That is the fundamental problem.

HTH

Rick

Ryan YC · 05-19-2022

Hi Richard,

Thank you for helping!

You are right. The disconnection between SW1 and SW2 is the fundamental problem (sorry, I didn't explain it well). We would prefer not to change the physical connections because the engineering work will take a very long time, so for now we are just looking for something we can adjust in EIGRP or CEF. We have also been told it is better not to use static routes or filtering as the workaround, because there are actually many devices that act like "Core-2", and more "SW1" and "SW2" pairs, at different locations.

SW1 and SW2 are Layer 3 switches. The connections between Core-2 and SW1, and between Core-2 and SW2, are Layer 3 links.

Jon Marshall · 05-19-2022

It's difficult to suggest a solution as this is not the full topology but I am not sure you can really fix this with EIGRP etc.

The cleanest way is to make SW1 and SW2 L2 switches and move the SVIs back to Core-2, so that it is now ARP that works out the correct path, i.e. via SW1 or SW2.

Core-2 would be a single point of failure, but it already is, or at least it is in your diagram.

But as said by others there really is no clean way to do this.

Jon

Joseph W. Doherty · 05-19-2022

BTW, what @Jon Marshall proposes should also work.

I had also thought of suggesting the same, but I think (?) what I did propose stays closer to your current topology, and if you restore the link between switches 1 and 2, Core-2 doesn't become a single point of failure (as it would if it alone hosted the gateway IPs). Although, as I also mentioned in my first post, you may want to reconsider/review your existing topology configuration.

Joseph W. Doherty · 05-19-2022

Ah, just as I suspected.

Again, the quick fix defined in my first post.

If what I described is not clear: define the VLANs from switches 1 and 2 on Core-2. Add two "new" VLANs. Define one of the new VLANs on both switch 1 and Core-2 (NB: I recall that just defining an SVI will create its VLAN, by default), and define the other new VLAN on switch 2 and Core-2. Change the links between switches 1 and 2 and Core-2 to trunks. Move the (I assume) existing p2p interface IPs (from the reconfigured trunk interfaces) to SVIs for the two new VLANs.

e.g.

Current(?)

switch 1
interface g0/1
 ip address x.x.x.x x.x.x.x

switch 2
interface g0/1
 ip address y.y.y.y y.y.y.y

core-2
interface t1/0/1
 ip address x.x.x.xx x.x.x.x

!from diagram - dup interface???
interface t1/0/1
 ip address y.y.y.yy y.y.y.y

Proposed

switch 1
!former routed p2p port becomes a trunk
interface g0/1
 switchport
 switchport mode trunk

!"new" VLAN x SVI takes over the p2p address
interface Vlan x
 ip address x.x.x.x x.x.x.x

switch 2
!former routed p2p port becomes a trunk
interface g0/1
 switchport
 switchport mode trunk

!"new" VLAN y SVI takes over the p2p address
interface Vlan y
 ip address y.y.y.y y.y.y.y

core-2
interface t1/0/1
 switchport
 switchport mode trunk

interface Vlan x
 ip address x.x.x.xx x.x.x.x

!from diagram - dup interface???
interface t1/0/1
 switchport
 switchport mode trunk

interface Vlan y
 ip address y.y.y.yy y.y.y.y

Jon Marshall · 05-19-2022

Wasn't sure if we were describing the same thing.

I was proposing moving the L3 SVIs to Core-2, i.e. removing all routing between Core-2 and SW1/SW2, so I can't see the need for new VLANs as such.

Jon

Joseph W. Doherty · 05-19-2022

@Jon Marshall, nope, we're proposing two different solutions; either, though, should work.

The purpose of the new VLANs is strictly to allow existing L3 P2P links to be migrated across those links being converted to trunks.

The most likely problematic issue, for either approach, is if switch 1 and switch 2 VLANs are already defined on Core-2.

Richard Burts · 05-19-2022

It is interesting to talk about HSRP and HSRP is part of the environment. But HSRP is not the real problem. It is interesting to talk about EIGRP and EIGRP is part of the environment. But EIGRP is not the real problem. The real problem is discontiguous subnets. Some of the devices in 192.168.1.0 are connected to sw1 while other devices in 192.168.1.0 are connected to sw2. And both sw1 and sw2 are advertising the entire subnet, claiming that they can reach ALL of the devices in the subnet when actually they can reach only SOME of the devices.

The solution needs to address the discontiguous issue. The best solution is to repair the layer 2 link between the switches so that they can in fact reach all of the devices in the subnet. I like the suggestion of using PBR to direct traffic to the correct switch for certain servers and other important equipment. But this would be only a partial solution. Perhaps there is an alternative to set the IP addressing so that the first half of the subnet is connected on sw1 while the other half of the subnet is connected on sw2. Then each could advertise a /25. This could work but is really more a long term solution and we are looking for short term solutions. I like the suggestion of removing the layer 3 logic from sw1 and sw2 and extending the vlans to the core switch. That would solve the discontiguous issue for these 3 vlans/subnets. But that would likely impact other parts of this environment that we are not aware of.
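For illustration only, the PBR idea might look roughly like this on the core switch. It is a sketch under assumptions, not a tested config: the server address 192.168.1.10, the next hop 10.0.12.1 (taken here to be sw1's end of its routed link), the ACL and route-map names, and the ingress interface are all hypothetical, since none of them appear in this thread.

```
! core2 - hypothetical sketch; every name and address below is an assumption
ip access-list extended HOSTS-BEHIND-SW1
 ! one entry per device that is reachable only through sw1
 permit ip any host 192.168.1.10
!
route-map STEER-TO-SW1 permit 10
 match ip address HOSTS-BEHIND-SW1
 ! force the next hop to sw1 instead of letting ECMP choose sw1 or sw2
 set ip next-hop 10.0.12.1
!
! apply on the interface where traffic for these hosts enters core2
interface TenGigabitEthernet1/0/2
 ip policy route-map STEER-TO-SW1
```

Traffic that does not match the ACL is still routed normally, so every device that sits behind only one of the switches would need its own entry. That is why this covers only selected servers rather than the whole subnet.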

HTH

Rick

Jon Marshall · 05-19-2022

"But that would likely impact other parts of this environment that we are not aware of."

I strongly suspect it would, especially as the OP says there are a lot more switches etc. than in the diagram he posted, but using PBR or readdressing seems messier to me than allowing the core switch to "naturally" find the correct path.

But it is all a matter of opinion as always.

Jon

Richard Burts · 05-19-2022

Jon

I agree that, in part, it is a matter of opinion. But it is also very dependent on things that we do not know about this environment. In comparing the alternatives that have been suggested I would agree with you that moving the routing from sw1 and sw2 to the core switch is simpler and that PBR or readdressing are more complicated (and perhaps messy). If moving the routing logic from sw1 and sw2 is acceptable then I would happily endorse this as the best choice. But I suspect that this would impact other parts of the network and would not be an acceptable choice. In that case both PBR and readdressing can be implemented with little or no impact on other parts of the network.
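As a rough sketch of the readdressing alternative: assume the devices behind sw1 can be renumbered into 192.168.1.0/25 and the devices behind sw2 into 192.168.1.128/25. The gateway addresses below are assumptions, not values from this thread. Each switch would then own and advertise a distinct /25:

```
! sw1 - assumes its hosts are renumbered into 192.168.1.0/25
interface Vlan1
 ip address 192.168.1.126 255.255.255.128
!
! sw2 - assumes its hosts are renumbered into 192.168.1.128/25
interface Vlan1
 ip address 192.168.1.254 255.255.255.128
```

The hosts would also need the new mask and gateway. Depending on how the EIGRP network statements are written they may already cover both /25s, and automatic summarization would need to be off so that the two /25s are not summarized back into the same /24.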

HTH

Rick

Joseph W. Doherty · 05-19-2022

". . . I would agree with you that moving the routing from sw1 and sw2 to the core switch is simpler . . ."

Than PBR and readdressing - agreed.

Unsure it's "simpler" than what I proposed because you eliminate the existing p2p networks (possibly a good thing, as ECMP load balancing in the "before broken" network, can cause needless East/West traffic), you change edge SVIs addressing, you may need to make changes to EIGRP network statements (much depends how they are defined now).

Again, either approach should work, including if the switches' L2 trunk is reactivated (again assuming an STP variant is enabled).

So, I believe my approach might be "better" for a quick fix, again, only because it retains as much as possible of what exists now. (What matters most, is what OP believes better suits them, and possible impact to other parts of the network. Also, by not "touching" logical routing topology, my approach might be "better" in that aspect too.)

Further, if a truly better (long term) approach were to be taken, I believe it would be better to take more advantage of the L3 capability of switches 1 and 2 than to move them away from L3 toward more L2.

e.g.

Current(?)

switch 1
interface vlan1
 ip address 192.168.1.254 255.255.255.0
 !virtual IP: .1, or .252, or?
 standby 1 ip 192.168.1.1

interface g0/1
 ip address x.x.x.x x.x.x.x

switch 2
interface vlan1
 ip address 192.168.1.253 255.255.255.0
 !virtual IP: .1, or .252, or?
 standby 1 ip 192.168.1.1

interface g0/1
 ip address y.y.y.y y.y.y.y

core-2
interface t1/0/1
 ip address x.x.x.xx x.x.x.x

!from diagram - dup interface???
interface t1/0/1
 ip address y.y.y.yy y.y.y.y

Proposed

switch 1
!vlan 1 needs to be "known"

!former routed p2p port becomes a trunk
interface g0/1
 switchport
 switchport mode trunk

switch 2
!vlan 1 needs to be "known"

!former routed p2p port becomes a trunk
interface g0/1
 switchport
 switchport mode trunk

core-2
interface t1/0/1
 switchport
 switchport mode trunk

!from diagram - dup interface???
interface t1/0/1
 switchport
 switchport mode trunk

interface vlan1
 !gateway IP: .1, or .252, or?
 ip address 192.168.1.1 255.255.255.0

Vlans 1 .. N, as migrated from switches 1 and 2, and defined on core-2.

EIGRP might also need to be updated on switches 1, 2 and core-2, to reflect removal of p2p networks, and migration of SVIs from switches 1 and 2 to core-2. (Depends much on how EIGRP network statements are defined.)

Both Jon's approach and mine change the links from switches 1 and 2 to trunks. Both require that the VLANs on switches 1 and 2 be made known on core-2.

My approach keeps existing L3 network interfaces, at least logically (e.g. physical interface might be changed to SVI), where they currently exist.

Joseph W. Doherty · 05-19-2022

BTW, I know it's unusual to terminate a p2p routed link on an SVI with an access port, rather than on a "routed port" (although under the covers, a Cisco switch "routed" port is a "special" SVI access port), but it does work, and it can offer a quick recovery option for certain hardware failures.

Consider a "remote" two member L3 switch stack connected to a chassis L3 switch, but only having one connecting link.

Likely one switch member uplink port is defined as "routed" which connects to one chassis line card port, also defined as "routed".

What happens if either switch's "routed" port, or ASIC controlling that port, or line card or switch member hosting that port fails?

Well, of course, you have a broken connection. How quickly can you recover, assuming you have another switch port you can use on that two member stack or the chassis switch?

You need to be able to access the "problem" switch and often reconfigure another port to be the "new" routed port. (Hopefully, you have some form of remote configuration access not requiring the in-band connection that just failed.)

Or, you can define a dedicated SVI for the "routed" IP, and configure two access ports on the switch in that dedicated VLAN. If the switch port fails, for whatever reason, you might have someone at the site (including someone non-technical) try re-patching the connection from the failed port to the alternate port.
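A minimal sketch of that idea on the two-member stack, for illustration only. The VLAN number 999, the /30 address, and the port names are all assumptions; the point is just that the IP lives on the SVI, so whichever access port is patched carries it:

```
! stack - hypothetical sketch; VLAN ID, address and ports are assumptions
vlan 999
 name P2P-UPLINK
!
! the "routed" IP lives on the SVI, not on a physical port
interface Vlan999
 ip address 10.0.99.2 255.255.255.252
 no shutdown
!
! primary uplink port, on stack member 1
interface GigabitEthernet1/0/1
 switchport mode access
 switchport access vlan 999
!
! spare port on stack member 2, re-patch here if member 1 or its port fails
interface GigabitEthernet2/0/1
 switchport mode access
 switchport access vlan 999
```

The same could be done on the chassis side with ports on two different line cards. Only one of the two ports is cabled at a time.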

Also BTW, at my last employer, no one could see the benefit of this approach, as that's not how we set up "routed" connections on L3 switches. I was the first to admit it only covers a "rare" situation, but if you can, why not? Too much trouble to do, was the response.

Anyway, a few months later, we had a small site, with a dual L3 switch stack, with a single "WAN" connection, have its "WAN" connection stack member die. It only took close to 8 hours to get a tech on the site to reconfigure the remaining stack member to migrate the "routed" port.

Afterwards, the regional tech manager, asked, so if we had done what you proposed, our outage might have only been a few minutes rather than 8 hours? In this case, yup.

Of course, this approach was still not adopted, because the "long term" solution was to have multiple "WAN" connections and/or out-of-band console connections to all sup cards or switch members; sigh, because it's more difficult to argue for the need of those solutions unless you have 8 hour outages without them.

MHM Cisco World · 05-18-2022

The HSRP connection must be removed, and a connection made between sw3 and sw4.

Configure an L3 interconnect and run EIGRP between the two HSRP switches. With this, if an HSRP switch loses its connection to the server, its VLAN will go down and the switch will redirect traffic to the other HSRP switch via the L3 interconnect, since it learns the server subnet from that switch.

Here core2 can send to either one and the traffic is not dropped into a black hole.