Re: Routing gone mad

Report Inappropriate Content · ‎04-25-2013

I'm hoping that somebody may be able to help me.

I have a very bizarre issue that I can't get my head around and have completely run out of ideas.

I have two 6500 core switches (switch 1 - primary [10.207.0.9] / switch 2 - secondary [10.207.0.10]). Both of these switches are attached to upstream routers.

Each router is advertising a number of routes via EIGRP; one of these routes is 172.16.110.0/24.

Each switch knows that the best path to these routes is via the upstream router (primary router) attached to switch 1.

I have verified the routing tables and all looks correct.

I also have another network 172.16.0.0/16 directly connected to each switch.

The problem I am seeing is that some of the hosts on the remote network (172.16.110.0/24) are contactable, while others are not, and a traceroute to the unreachable ones suggests the packets loop between the two core switches.

Tracing the route to 172.16.110.40

1 172.16.0.24 0 msec 4 msec 0 msec

2 10.207.0.9 0 msec 0 msec 0 msec

3 172.16.0.24 0 msec 0 msec 0 msec

4 10.207.0.9 0 msec 0 msec 0 msec

5 172.16.0.24 0 msec 0 msec 0 msec

6 10.207.0.9 0 msec 0 msec 0 msec

172.16.0.24 is an interface on the secondary switch; however, I can't see its relevance here, and it's certainly not listed in the routing table as a next hop for any routes.
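For anyone wanting to check a longer trace for the same pattern, here is a minimal Python sketch that pulls the hop addresses out of traceroute output like the above and reports the first repeating cycle. The function name `find_loop` is made up for illustration, and it assumes each hop line starts with the hop number followed by the address, as in the Cisco output shown.

```python
def find_loop(trace_text):
    """Return the list of hops forming the first repeated cycle, or [] if none.

    Assumes each hop line looks like '<hop-number> <address> <timings...>'.
    """
    hops = []
    for line in trace_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            hops.append(parts[1])

    first_seen = {}
    for index, hop in enumerate(hops):
        if hop in first_seen:
            # Same address seen twice: everything in between is the cycle.
            return hops[first_seen[hop]:index]
        first_seen[hop] = index
    return []


trace = """\
1 172.16.0.24 0 msec 4 msec 0 msec
2 10.207.0.9 0 msec 0 msec 0 msec
3 172.16.0.24 0 msec 0 msec 0 msec
4 10.207.0.9 0 msec 0 msec 0 msec
"""

print(find_loop(trace))  # ['172.16.0.24', '10.207.0.9']
```

Here the cycle is the secondary switch's 172.16.0.24 interface followed by the primary switch at 10.207.0.9, which matches the loop between the two cores.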

I realise this is difficult to sum up, but if anybody can assist or would like further information it would be much appreciated.

mfurnival · ‎04-25-2013

Hi Neil,

It would be useful to see the output of the following for both devices:

#show ip route

#show ip interface brief

#show standby

Your traceroute suggests that your host sends the ping packet to the secondary switch SVI for its first hop - what gateway does your host have configured?

Report Inappropriate Content · ‎04-25-2013

Hi,

I've attached the output in 3 documents. Appreciate any feedback,

Neil

mfurnival · ‎04-25-2013

Neil,

Two of those documents are the same - can you send the output of each command on each switch please?

Report Inappropriate Content · ‎04-25-2013

Sorry about that - Can only upload 5 docs and I got distracted.

I have added switch 2 documents to this post and updated the switch 1 documents on the previous post.

Cheers,

glen.grant · ‎04-25-2013

I would say you have overlapping network ranges. 172.16.0.0/16 covers everything from 172.16.0.0 to 172.16.255.255, so having another network 172.16.110.0/24 is not correct. I would check to see if the ranges are overlapping, though usually the router will complain when you try to configure it. Are you sure it's 172.16.0.0/16? That's a huge subnet.
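The overlap itself is easy to confirm offline. A minimal sketch using Python's standard `ipaddress` module (the variable names `legacy` and `remote` are just labels for the two prefixes mentioned in this thread):

```python
import ipaddress

legacy = ipaddress.ip_network("172.16.0.0/16")    # directly connected VLAN
remote = ipaddress.ip_network("172.16.110.0/24")  # learned via EIGRP

# The /24 sits entirely inside the /16, so the two ranges overlap.
print(remote.subnet_of(legacy))   # True
print(legacy.overlaps(remote))    # True

# Size of the /16, for a sense of scale.
print(legacy.num_addresses)       # 65536
```

This only shows that the address ranges overlap; it says nothing about which route a router will prefer for a given destination.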

Report Inappropriate Content · ‎04-25-2013

Hi Glen,

I agree overlapping subnets is not ideal, especially one on a 16 bit netmask. That said the range 172.16.110.0 - 255 is not used by any hosts on the /16 vlan.

In this situation though does the prefix length and the administrative distance not take care of path selection?

Neil

Gregory Snipes · ‎04-25-2013

Totally agree with Glen on this one. Connected routes are always preferred by the router. Since your routers have an interface directly on 172.16.0.0/16 via VLAN 2, they are going to try to use that before they use the EIGRP route, hence the looping.

Report Inappropriate Content · ‎04-25-2013

Greg - If I trace from the 2nd switch I see this correctly route in accordance with EIGRP.

In fairness this switch was recently rebooted, although I am confident all config was written first.

Are you saying that even though the /24 network has the longer prefix length, the primary switch will opt to use the shorter prefix length /16 first?

If so, are you able to explain why my secondary switch works as I expected it to? I.e. sends traffic to the EIGRP route and not the connected route.

If your theory is correct, I wouldn't expect the traffic to loop, I would just expect it to not respond as there are no hosts on the network with the IP addresses I am tracing or pinging to test.

If I understand this article correctly, it suggests that prefix length isn't ignored by the RIB even when dealing with directly connected routes.

http://cciethebeginning.wordpress.com/2013/03/18/administrative-distance-prefix-length-metric-who-is-the-winner/

I should also add that this was working until recently and has only just stopped working.

In addition, if I trace to 172.16.110.40 the traceroute fails by looping. If I trace to 172.16.110.41 the traceroute is successfully routed. The same applies to .42 and .43 respectively.

Hope that makes sense.

Cheers,

Neil.

Gregory Snipes · ‎04-25-2013

The concept we are dealing with here is administrative distance. Please see the table on this page, and note that directly connected networks have a distance of 0. The length of the prefix should only come into play after administrative distance. Administrative distance is there to prevent routing loops; if a router is directly connected to a subnet, it should never send a packet off to some other router to reach it. Since 172.16.110.40 lies within the range of hosts that should be directly connected to the router, the router should send it directly out that interface and not somewhere else.

What is the reason you need to have that network be a /16? You should just be able to choke down the subnet mask on those interfaces and make the problem go away.

Report Inappropriate Content · ‎04-25-2013

OK - so administrative distance comes into play before prefix length? Apologies, I thought it was the other way around.

Are you able to explain a little further though as I'm confused as to why some addresses within the /24 network are reachable and others aren't?

I'm also confused as to why this was working prior to a reload?

Sadly, I can't change the size of the /16 network. It's a legacy network which should be decommissioned soon; in the meantime, before changing it, I need to understand the impact, and until now there hasn't been any.

Neil.

Gregory Snipes · ‎04-25-2013

It is very odd that any of them are working in this case. Could you post a diagram of how these are connected? That might shed some light on why this is behaving this way.

Also, could you tell me what 10.207.0.14 is and how it fits into this setup?

Report Inappropriate Content · ‎04-25-2013

I've heard from Cisco tech support, who are none the wiser on this. However, they agreed with me that prefix length is considered before administrative distance, so this is a viable routing scenario.

10.207.0.14 is the upstream router that is advertising the EIGRP routes.

I will add some diagrams, however this may be tomorrow I'm afraid.

Thanks for your help.

Gregory Snipes · ‎04-25-2013

I might have been off on the admin-distance-first issue. The actual documentation reads as follows:

"The longest prefix match always wins among the routes actually installed in the routing table, while the routing protocol with the lowest administrative distance always wins when installing routes into the routing table."

The conflict between the directly connected 172.16.0.0/16 and the EIGRP 172.16.110.0/24 is still, I'm about 99% sure, your issue though. I don't believe having a full /16 network directly connected and smaller subnets of that /16 coming in through a routing protocol is a scenario that these devices were designed for.

mfurnival · ‎04-26-2013

Longest prefix length is always considered first regardless of all other factors. AD only comes into play when you have routes of equal prefix length from different sources.
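To make the two stages concrete, here is a rough Python sketch, not how IOS is actually implemented. The helper names `install` and `lookup` are invented for illustration; the administrative distances (0 for connected, 90 for internal EIGRP) are the standard Cisco defaults, and 10.207.0.14 is the upstream router mentioned earlier in the thread.

```python
import ipaddress


def install(candidates):
    """Stage 1: for each distinct prefix, keep the source with the lowest AD.

    candidates: iterable of (prefix, admin_distance, source) tuples.
    Returns a dict mapping network -> (admin_distance, source).
    """
    rib = {}
    for prefix, ad, source in candidates:
        net = ipaddress.ip_network(prefix)
        if net not in rib or ad < rib[net][0]:
            rib[net] = (ad, source)
    return rib


def lookup(rib, dest):
    """Stage 2: among installed routes, the longest matching prefix wins."""
    addr = ipaddress.ip_address(dest)
    matches = [net for net in rib if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return best, rib[best][1]


rib = install([
    ("172.16.0.0/16", 0, "connected"),
    ("172.16.110.0/24", 90, "eigrp via 10.207.0.14"),
])

print(lookup(rib, "172.16.110.40"))  # matches the /24, so the EIGRP route
print(lookup(rib, "172.16.5.1"))     # only the /16 matches, so connected
```

Because the /16 and the /24 are different prefixes, they never compete on AD; both are installed, and the lookup for 172.16.110.40 picks the /24 purely on prefix length.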

I agree that what you are doing with your subnetting is not best practice, though it is not wrong per se. The issue you can face with this scenario is that if you are doing network summarization at some point, you can run into problems.

Can we see the routing setup on the upstream router(s)?

Also - can you confirm that all devices use 172.16.0.1 as their gateway and not the individual switch SVIs?