Solved: Multihomed spoke site will not form EIGRP neighborships on both lines

Paul Morgan · 07-14-2012

Hello all,

This problem has been pickling my cabbage for about a week now. I have many spoke sites connected back to the head office. We have two routers at the hub acting as primary and backup for all the sites, on a 13 Mb SDSL line and a 2 Mb SDSL line respectively. I use GRE tunnels exclusively, with IPsec on most of the tunnels. All the spoke sites except one connect using standard ADSL.

I have one site which has a very poor connection so to improve things for them, I am trying to use two ADSL connections and load-balance them.

To keep things simple, I am only trying to enable connections back to our primary router at this time, and I am not using IPsec yet either.

So here is the problem.

I only get a successful neighbour relationship forming on one tunnel at a time. If I shut that tunnel down, the other neighbour forms up and I can turn the first tunnel back on but it will not then form a new neighbourship.

The info that is telling me most about what is going wrong is this:

Sh ip eig nei

IP-EIGRP neighbors for process 6001
H   Address         Interface   Hold  Uptime    SRTT   RTO   Q    Seq
                                (sec)           (ms)         Cnt  Num
1   172.20.64.1     Tu0         13    00:00:01  1      2000  2    0
0   172.20.65.1     Tu2         11    18:41:07  55     390   0    752649

The Queue count for the tunnel where the neighbourship is trying to form and failing is always 2. No hellos are received at the far end. Hellos are received from the other end and, in fact, the neighbourship does form, but then it gets timed-out after no hellos are received. The RTO goes to 5000 and then after the retry timer runs down, the relationship is dropped, a new hello is received and the relationship is recalculated. This causes my EIGRP hell as you can imagine.

I've applied distribute-lists to the tunnels and tried using static routes. I also tried statically assigning the neighbours with the neighbor command. No dice.

Both the ADSL connections have the same IP next-hop at the ISP. Would this prevent neighbours forming?

I have uploaded the relevant parts of the config and also the routing table from the router (sanitised). For completeness I've included all the distribute-list commands that I've tried; I've used them individually, in combination and all together, as well as without them at all.

Your help will be greatly appreciated.

Your slowly-going-mad network admin,

Paul

Peter Paluch · 07-14-2012

Hello Paul,

I have a feeling that the problem may be caused by the fact that you are basically trying to run parallel GRE tunnels between two routers. These routers may be having troubles differentiating which received GRE packets should be processed by a particular Tunnel interface.

Try to configure a unique tunnel key for each of your tunnels, and match this tunnel key on the appropriate tunnel interface at the other end. Using different tunnel keys should help the router to correctly sort the GRE packets among Tunnel interfaces. So for example, I suggest adding the following lines to the spoke router:

interface Tunnel0
 tunnel key 100
interface Tunnel2
 tunnel key 102

On the primary router at the head office, use the same tunnel keys on the corresponding Tunnel interfaces.

Please let me know if this helped!

Best regards,

Peter

Paul Morgan · 07-15-2012

Hi Peter,

I added those Tunnel Key commands but this has not affected the problem unfortunately.

I also removed the distribute-list commands and tested it again but still no change. I have now also removed the eigrp stub setting.

When I run debug eigrp packets hello I can see that the hellos are sent. What command should I use to debug the queue?

Peter Paluch · 07-15-2012

Paul,

When you added those tunnel key commands, were you at least able to verify that the tunnels themselves are working, i.e. were you able to ping the opposite tunnel address?

I do not believe you can debug the EIGRP queue directly. What you can debug are individual EIGRP packets, i.e. Update, Ack, Query, Reply, and EIGRP retransmissions. The command for that would be: debug eigrp packets terse retry

If you are willing to do a more involved debug, I would suggest creating an extended ACL with permit entries matching the GRE traffic between your router and the head office router in either direction, as well as the EIGRP traffic over this tunnel, and then running debug ip packet N, where N is the number of this ACL. Please note that if these GRE tunnels currently carry any traffic beyond EIGRP, this debug is not recommended, as there would be LOTS of output.

I wonder... is it by any means possible that some of the tunnel endpoint addresses (i.e. tunnel source or tunnel destination) are advertised in EIGRP through these tunnels? That would cause recursive routing, which quite closely resembles the flapping you are currently experiencing. How is the reachability of the tunnel endpoints accomplished in your routing table - is it done via a default route? To be on the safe side, I suggest adding static /32 routes on both routers (spoke and headend), each pointing to the IP address of the opposite tunnel endpoint via the appropriate next hop.
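
To make the two suggestions above concrete, here is a minimal sketch for the spoke side. The addresses are placeholders from the documentation ranges, not values from Paul's config: 198.51.100.10 stands for the spoke's tunnel source address and 203.0.113.1 for the head office tunnel endpoint. The ACL number 190 and the choice of Dialer0 as the egress interface are likewise only assumptions for illustration.

```text
! Spoke router, global configuration (all addresses are hypothetical)
! Match the GRE transport packets in both directions
access-list 190 permit gre host 198.51.100.10 host 203.0.113.1
access-list 190 permit gre host 203.0.113.1 host 198.51.100.10
! Match the EIGRP packets (IP protocol 88) carried inside the tunnel
access-list 190 permit eigrp any any
!
! Host route to the opposite tunnel endpoint via the physical interface,
! so that the endpoint is never resolved through the tunnel itself
ip route 203.0.113.1 255.255.255.255 Dialer0
!
! Then, from privileged EXEC mode:
!   debug ip packet 190 detail
!   undebug all
```

The mirror image of this would go on the head office router, with the addresses swapped and its own egress interface in the static route.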

Curious issue indeed!

Best regards,

Peter

Paul Morgan · 07-15-2012

Ok - so after another cup of tea and a few more biscuits - "once more into the fray old friend".

I tried setting static routes at both ends to the respective tunnel endpoints, meaning the IP addresses on each end of the tunnel. Then I disabled EIGRP on both routers for these tunnels. I created access-list 190 permit gre host SPOKE host HUB for tunnel A, and access-list 191 likewise for tunnel B, and ran debug ip packet 190. This first debug might have the answer. The output shows GRE packets with the source of tunnel A being routed out of Dialer B. Even with ip route tunnelA-end-ip/32 tunnelA configured on both hub and spoke routers, I cannot ping the other end IP.

Since the public IP is not in the routing table and I assume the tunnel destination command does not affect the choice of route, should the Tunnel Source xxx command not dictate which Dialer the packet takes (especially bearing in mind that there are identical default routes ip route 0.0.0.0 0.0.0.0 dialer0 and ip route 0.0.0.0 0.0.0.0 dialer1)?

Ok so 5 mins of thinking and I figured, wait - I'll bully the routes.

access-list 190 permit gre host TunA host HUB
access-list 191 permit gre host TunB host HUB
!
route-map FIX-DIRECTION permit 10
 match ip address 190
 set interface Dialer0
route-map FIX-DIRECTION permit 20
 match ip address 191
 set interface Dialer1
!
ip local policy route-map FIX-DIRECTION

SMACKDOWN BABY!!

That's sorted it. You can see from the debug ip packet 190 now that the traffic is going out of Di0. Both tunnels ping successfully. So I've removed all the configs we added and set up EIGRP as it was originally and (get ya flags out) - it's working!!

Thank you so much for your help with that Peter. I am very grateful. This will be a fantastic scrapbook fix for the future.

Many thanks,

Paul

Peter Paluch · 07-15-2012

Hello Paul,

First of all, I am very glad to see you were able to find out the solution! Thank you for letting me know!

It seems, though, as if the problem was caused by your ISP instead of the configuration on the spoke router itself. I'll gradually explain.

"Since the public IP is not in the routing table and I assume the tunnel destination command does not affect the choice of route, should the Tunnel Source xxx command not dictate which Dialer the packet takes (especially bearing in mind that there are identical default routes ip route 0.0.0.0 0.0.0.0 dialer0 and ip route 0.0.0.0 0.0.0.0 dialer1)?"

There are several thoughts here that should be commented upon:

The tunnel destination command actually does affect the choice of route. The address specified in this command becomes the destination IP address of the tunneled packet, so the router looks this address up in its routing table and uses the longest prefix match, as usual, to forward the packet. However, in your case, the only routing table entry that matched the tunnel destination was the default route, which was configured as an equal cost multipath route via two Dialer interfaces. The router therefore started performing load balancing over the equal cost routes.

The tunnel source indeed does not dictate which Dialer interface the packet will be sent out from. This command merely refers to the name of an interface whose IP address will be used as the source IP address in the outer IP header of tunneled packets (just a convenient way of configuring things, but with no effect whatsoever on the packet's route). The choice of egress interface is given by the usual routing rules, and without any further configuration, the only parameter used when routing a packet is its destination IP address.
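
As a sketch of the situation described in these two points, the relevant spoke configuration presumably looked something like the following. The hub address 203.0.113.1 is a placeholder, and the pairing of Tunnel0 with Dialer0 and Tunnel2 with Dialer1 is an assumption; only the interface names and the two default routes come from the thread.

```text
! "tunnel source" only selects the source address of the outer IP header.
! "tunnel destination" sets the outer destination address, and that address
! is what the routing lookup uses.
interface Tunnel0
 tunnel source Dialer0
 tunnel destination 203.0.113.1
!
interface Tunnel2
 tunnel source Dialer1
 tunnel destination 203.0.113.1
!
! The only routes matching 203.0.113.1 are two equal cost defaults,
! so packets from either tunnel may leave via either Dialer
ip route 0.0.0.0 0.0.0.0 Dialer0
ip route 0.0.0.0 0.0.0.0 Dialer1
```

With a configuration like this, show ip route 0.0.0.0 should list both Dialer interfaces under the default route, which is the equal cost multipath entry the tunneled packets were being balanced across.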

This is why your GRE-tunneled packets started flowing via both Dialer interfaces. Now, there should not be any problem with this. However, your ISP seems to be using some kind of protection whereby through a particular PPPoA session, it accepts only IP packets sourced from the very IP address assigned to this particular PPPoA peer. Because your router started doing load balancing, packets sourced from Di0 could have been also sent via Di1 and vice versa (remember, only the destination IP address is used in ordinary IP routing to determine the packet's fate). The ISP probably dropped such "crossed" packets, resulting in the inability of one of these tunnels to actually carry packets. At least this would be my take on it.

As you configured PBR for locally originated packets, you basically locked down which egress interface particular tunneled packets should use, and thereby made the ISP security check on your packets happy.
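
For anyone wanting to confirm that the local policy is in effect, a short verification sketch follows. It assumes the route-map is named FIX-DIRECTION as in Paul's post. These are standard IOS show commands, but the exact output wording varies by release, so none is reproduced here.

```text
! Privileged EXEC mode on the spoke router
! Shows whether local policy routing is enabled and which route-map it uses
show ip local policy
! Shows each route-map entry with its match/set clauses and match counters
show route-map FIX-DIRECTION
! Both tunnels should now list a neighbour with a steadily growing uptime
show ip eigrp neighbors
```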

Anyway, I am honestly glad to know you solved this issue, and I also learned a lot here! Thank you!

Best regards,

Peter
