Unusual GRE connectivity issues

vanderwaalr · 12-21-2009

Hi,

I have a problem relating to GRE connectivity that I just can't pinpoint the cause of.

I have a customer with 3 sites that currently connect in a hub-and-spoke fashion. Site 'H' is the hub site, with sites 'A' and 'G' connecting to 'H' via 9.6kbps serial links. The network connections are only used for control systems and therefore high bandwidth is not essential. See diagram 'design1.jpg' below:

'Router-H' is a Cisco 2811 router, while 'Router-G' and 'Router-A' are both Cisco 1841 routers. 'Router-H' is running Advanced IP Services 12.4(22)YB4 and 'Router-G' and 'Router-A' are both running IP Base 12.4(22)YB4. This version of IOS is currently used because it is a requirement to support the HWIC-3G-GSM cards installed in all 3 routers, which will be used as part of a later project.

The network is being migrated away from the serial links to a managed WAN with copper Ethernet handoff. The managed WAN is run by a different business unit and does not route RFC1918 addressing through the WAN. As such, we are configuring a set of GRE tunnels in a hub-and-spoke fashion (basically providing the same connectivity the legacy serial links do) between the 3 sites. Again, 'Router-H' is the hub router, with 2 GRE tunnels configured: one to 'Router-G' and one to 'Router-A'. See diagram 'design2.jpg' below:

When the GRE tunnels were first configured, they all came up and passed IP traffic across both tunnels. The problem appeared a few hours later, when the tunnels dropped and went into a 'down' state for no known reason (we have keepalives configured across both GRE tunnels). The tunnels would not re-establish connectivity and return to an 'up' state on their own. In an attempt to restore connectivity, we issued a 'shut' and 'no shut' on the tunnel interfaces at both ends of the tunnels, but with no success.

After some fiddling around, we found the only way to restore tunnel connectivity was to erase the tunnel configuration at both ends ('no int tunnel 1', 'no int tunnel 2') and then re-apply it. After re-applying the configuration from scratch, the tunnels all re-established. However, after a few hours the tunnels dropped again for no apparent reason, and the only way to get them back was to repeat the process of erasing and re-applying the tunnel configurations. This keeps happening, so we have been forced to restore the legacy serial links until we can resolve the GRE connectivity issues.

The GRE configuration is pretty straightforward:

'Router-H' configuration:

interface Loopback0
 ip address 134.xxx.zzz.131 255.255.255.255
 no ip redirects
 no ip unreachables
 no ip proxy-arp
!
interface Tunnel1
 description Tunnel to Router-A
 ip address 10.10.10.1 255.255.255.240
 ip mtu 1500
 keepalive 10 3
 tunnel source Loopback0
 tunnel destination 134.xxx.zzz.130
!
interface Tunnel2
 description Tunnel to Router-G
 ip address 10.10.20.1 255.255.255.240
 ip mtu 1500
 keepalive 10 3
 tunnel source Loopback0
 tunnel destination 134.xxx.zzz.129
!
interface FastEthernet0/1
 ip address 134.xxx.yy.90 255.255.255.252
 duplex auto
 speed auto
!

'Router-G' configuration:

interface Loopback0
 ip address 134.xxx.zzz.129 255.255.255.255
 no ip redirects
 no ip unreachables
 no ip proxy-arp
!
interface Tunnel1
 description Tunnel to Hastings
 ip address 10.10.20.2 255.255.255.240
 ip mtu 1500
 keepalive 10 3
 tunnel source Loopback0
 tunnel destination 134.xxx.zzz.131
!
interface FastEthernet0/1
 ip address 134.xxx.yy.86 255.255.255.252
 duplex auto
 speed auto
!

'Router-A' configuration:

interface Loopback0
 ip address 134.xxx.zzz.130 255.255.255.255
 no ip redirects
 no ip unreachables
 no ip proxy-arp
!
interface Tunnel1
 description Tunnel to Hastings
 ip address 10.10.10.2 255.255.255.240
 ip mtu 1500
 keepalive 10 3
 tunnel source Loopback0
 tunnel destination 134.xxx.zzz.131
!
interface FastEthernet0/1
 ip address 134.xxx.yy.82 255.255.255.252
 duplex auto
 speed auto
!

At one point we thought it was a connectivity issue with the managed WAN but the fact that the GRE tunnels do establish when you first apply the configuration has led us away from this thinking at this stage. Another train of thought is that it is a bug of some sort, but we have tested running GRE tunnels over the serial links on the same hardware and this works fine, so we are not focusing our attention on this train of thought.

Unfortunately we do not have configuration access to the managed WAN but do have access to an Engineer who supports the managed WAN.

Has anyone out there experienced issues like this, or can anyone offer any suggestions/thoughts on where we should look? All reports from the managed WAN engineer are that there is no access-list filtering in the WAN blocking GRE connectivity. ICMP ping connectivity is also not blocked.

Thanks in advance.

Marwan ALshawi · 12-22-2009

Two things to check first.

Make sure the loopback addresses (used as the tunnel sources and destinations) are reachable via your IGP (they have to be in the routing table first).

This is to make sure no recursive routing is happening.

Next, decrease the MTU on the tunnel interface.

Last thing: why are you using keepalives here? Are you tracking this interface, or something else?

Also, why don't you use Fa0/1 as the tunnel source?

good luck

vanderwaalr · 12-22-2009

Thanks for your response.

I should have mentioned this in my first post, but the Loopback addresses are all reachable within the managed WAN. This was the first thing I tested. 'Router-H' can ping the loopback addresses of both 'Router-G' and 'Router-A', and vice-versa.
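A successful ping shows that the loopbacks are reachable, but not which path the router uses to reach them. A minimal sketch of how the recursive-routing check could be run on 'Router-H', assuming the masked addresses from the configurations above (the actual output depends on the managed WAN's routing, which is not shown in this thread):

```
! The route to each tunnel destination should point out FastEthernet0/1,
! never out Tunnel1 or Tunnel2 (that would be recursive routing).
show ip route 134.xxx.zzz.130
show ip route 134.xxx.zzz.129
!
! Test reachability from the address actually used as the tunnel source.
ping 134.xxx.zzz.130 source Loopback0
ping 134.xxx.zzz.129 source Loopback0
```

If recursive routing were the cause, IOS would normally also log a %TUN-5-RECURDOWN message at the time the tunnel is disabled.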

We have also tried reducing the MTU down to 1476, but this did not seem to fix the issue; the tunnels still dropped after an hour or so.

One comment worth making is that when I do a 'show interface tunnel 1' the MTU size reported in the output is 17916, even though we have manually configured 'ip mtu 1500'.
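The two figures are probably not in conflict: 'show interface tunnel 1' reports the interface MTU, while 'ip mtu' sets the IP MTU, which is displayed by a different command. A short sketch of how this could be checked, assuming plain GRE over IPv4 with 24 bytes of overhead (20-byte outer IP header plus 4-byte GRE header), so 1500 - 24 = 1476. The 'ip tcp adjust-mss' line is an optional addition that is not part of the original configuration, and its availability on this IOS image is an assumption:

```
! Exec mode: show the IP MTU actually applied to the tunnel
show ip interface Tunnel1 | include MTU
!
! Configuration mode: leave room for the 24-byte GRE/IP overhead.
! Optional MSS clamp: 1476 - 40 (IP + TCP headers) = 1436.
interface Tunnel1
 ip mtu 1476
 ip tcp adjust-mss 1436
```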

We are using keepalive to ensure the GRE tunnels are tracked on both ends of the Tunnel.

Lastly, we have tried using Fa0/1 as the tunnel source but again this did not fix the problem.

Marwan ALshawi · 12-22-2009

Can you try keeping just one tunnel, from the hub to one of the spokes, and see whether it drops or not?

Only one tunnel.

I am just thinking that maybe the hub is dropping this tunnel.

If this is the case, you may try mGRE.

Peter Paluch · 12-23-2009

Hello,

Just a single suggestion: is it acceptable for you to deactivate the keepalives for, say, a day, and have a continuous connectivity test such as a ping running on PCs at your locations across the WAN, to see if the tunnels stay up and active? I would like to see if the problem with the tunnel stability can be tracked down to the keepalive mechanism or if there is some more profound cause that prevents your tunnels from exchanging data properly.
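A minimal sketch of that test on 'Router-H', assuming the tunnel numbering from the configurations earlier in the thread; the same 'no keepalive' would be needed on Tunnel1 of both spokes. With keepalives removed, a GRE tunnel interface normally stays up as long as the router has a route to the tunnel destination, so only an end-to-end ping reveals a real forwarding failure. The router-side pings here are a stand-in for the PC-based test, and the repeat count is arbitrary:

```
interface Tunnel1
 no keepalive
!
interface Tunnel2
 no keepalive
!
end
! Long-running test across each tunnel, sourced from the tunnel address
ping 10.10.10.2 source Tunnel1 repeat 10000
ping 10.10.20.2 source Tunnel2 repeat 10000
```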

Also, is there any logging message, even a minor one, recorded in your logs around the time your tunnels go down? That may perhaps give us some glimpse of what is going on.
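A sketch of how such messages could be captured, assuming the routers currently log without timestamps or a local buffer (the thread does not show the logging configuration). 'debug tunnel keepalive' is verbose and is only meant for a short capture window:

```
! Configuration mode: timestamp messages and keep them in a local buffer
service timestamps log datetime msec
service timestamps debug datetime msec
logging buffered 16384 debugging
!
! Exec mode: watch keepalives, then review what was logged around a drop
debug tunnel keepalive
show logging | include Tunnel|LINEPROTO
undebug all
```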

Best regards,

Peter