
GRE Tunnel drops??

Tony Riccardi
Level 1

So here's my setup:

Internal Router (2821) >> Internal DMZ ASA Cluster >> DMZ Router (2821) >> External DMZ Checkpoint Cluster >[[Internet]]> Branch Office Router (877)

The Internal ASA Cluster has PAT configured for all internal production VLANs.

The DMZ Router has an inside interface configured on the Internal DMZ and an outside interface configured on the External DMZ. The DMZ Router has two loopback interfaces configured.

The External Checkpoint is configured with NAT for inbound and outbound traffic.

The Branch Office is a DSL Router with a Static IP.

The first requirement is to configure a GRE over IPSec Tunnel between the DMZ Router and the Branch Office Router.

The second requirement is to configure a GRE over IPSec Tunnel between the Internal Router and the DMZ Router.

The third requirement is to allow dynamic routing between the Internal Router and the Branch Office via the DMZ Router because this is ultimately the backup link between the Head Office and Branch Office.

I have successfully configured a GRE over IPSec Tunnel between the DMZ Router and the Branch Office Router.

I have also successfully configured a GRE Tunnel (without IPSec) between the Internal Router and the DMZ Router.

However, whenever the GRE Tunnel between the Internal and DMZ Routers comes up and an EIGRP neighbour forms over it, the EIGRP neighbourship between the DMZ Router and the Branch Office drops! See the following log from the DMZ Router:

Tunnel 1 = to Branch

Tunnel 100 = to Internal

002885: .Mar  3 22:32:57.013: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to up
002886: .Mar  3 22:33:06.029: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 172.17.205.61 (Tunnel1) is up: new adjacency
002889: .Mar  3 22:33:58.434: %LINK-3-UPDOWN: Interface Tunnel100, changed state to up
002890: .Mar  3 22:33:58.438: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel100, changed state to up
002891: .Mar  3 22:34:15.370: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 192.168.5.66 (Tunnel100) is up: new adjacency
002892: .Mar  3 22:34:30.551: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 172.17.205.61 (Tunnel1) is down: holding time expired
002893: .Mar  3 22:34:47.015: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to down
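
For reference, the gaps between these events can be worked out from the timestamps. The snippet below is only an illustrative Python sketch (the helper name `seconds_between` is made up for this example). It measures how long after the Tunnel100 adjacency came up the Tunnel1 neighbour was lost and the Tunnel1 line protocol went down.

```python
from datetime import datetime

# Three lines copied from the DMZ Router log above (sequence numbers dropped).
LOG_LINES = [
    ".Mar  3 22:34:15.370: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 192.168.5.66 (Tunnel100) is up: new adjacency",
    ".Mar  3 22:34:30.551: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 172.17.205.61 (Tunnel1) is down: holding time expired",
    ".Mar  3 22:34:47.015: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to down",
]


def clock(line: str) -> datetime:
    """Extract the time of day (third whitespace-separated field) from a log line."""
    field = line.split()[2].rstrip(":")
    return datetime.strptime(field, "%H:%M:%S.%f")


def seconds_between(earlier: str, later: str) -> float:
    """Elapsed seconds between two log lines from the same day."""
    return (clock(later) - clock(earlier)).total_seconds()


if __name__ == "__main__":
    adjacency_up, neighbour_down, line_down = LOG_LINES
    print(f"{seconds_between(adjacency_up, neighbour_down):.3f}")  # 15.181
    print(f"{seconds_between(adjacency_up, line_down):.3f}")       # 31.645
```

The two gaps are roughly 15 s and 32 s. That would be consistent with a 15-second EIGRP hold time and with three missed GRE keepalives at 10-second intervals (keepalive 10 3, as configured on the tunnels elsewhere in this thread), assuming default EIGRP timers, which are not shown here. In other words, traffic over Tunnel1 appears to stop at about the moment the Tunnel100 adjacency forms.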

The IPSec tunnel to the Branch Office remains in place throughout.

Can anyone help!???

Accepted Solutions

adrrodri
Level 1

The problem was that whenever the GRE Tunnel between the Internal and DMZ Routers came up and an EIGRP neighbor formed over it, the DMZ Router started learning the route to the next hop for the Branch Office tunnel destination from a different device.

This is how the DMZ Router (se-idz-vpn-01) was reaching the tunnel destination before that:

interface Tunnel1
 description Tunnel to Tandragee Sub Station VPN Router
 bandwidth 64
 ip address 172.17.205.62 255.255.255.252
 no ip route-cache cef
 delay 20000
 keepalive 10 3
 tunnel source Loopback1
 tunnel destination 172.17.255.23

se-idz-vpn-01#sh ip route 172.17.255.23
Routing entry for 172.17.255.23/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 172.17.252.129
      Route metric is 0, traffic share count is 1

se-idz-vpn-01#sh ip route 172.17.252.129
Routing entry for 172.17.252.128/25
  Known via "connected", distance 0, metric 0 (connected, via interface)
  Routing Descriptor Blocks:
  * directly connected, via GigabitEthernet0/1
      Route metric is 0, traffic share count is 1

se-idz-vpn-01#

This is how the next hop was being learned once the GRE Tunnel between the Internal and DMZ Routers was established:

se-idz-vpn-01#sh ip route 172.17.252.129
Routing entry for 172.17.252.128/27
  Known via "eigrp 1", distance 170, metric 40258816, type external
  Redistributing via eigrp 1
  Last update from 192.168.5.66 on Tunnel100, 00:07:25 ago
  Routing Descriptor Blocks:
  * 192.168.5.66, from 192.168.5.66, 00:07:25 ago, via Tunnel100
      Route metric is 40258816, traffic share count is 1
      Total delay is 10110 microseconds, minimum bandwidth is 64 Kbit
      Reliability 255/255, minimum MTU 1476 bytes
      Loading 1/255, Hops 2

We can see that the route to 172.17.252.129, the next hop for the tunnel destination 172.17.255.23, changed from "connected" via GigabitEthernet0/1 to "eigrp 1" via Tunnel100.

This is what caused Tunnel 1 to drop.

The reason for this behavior is that the EIGRP route learned via the tunnel interface (172.17.252.128/27) is more specific than the connected route (172.17.252.128/25). The longest match wins the lookup for 172.17.252.129, even though the connected route has the better administrative distance.
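
To make the longest-match point concrete, here is a small Python sketch built from the two routing entries shown above. It is only an illustration: the `lookup` function and the tuple layout are invented for this example and are not how IOS stores its routing table.

```python
import ipaddress

# (prefix, source, administrative distance, outgoing interface),
# taken from the two "sh ip route 172.17.252.129" outputs above.
ROUTES = [
    ("172.17.252.128/25", "connected", 0, "GigabitEthernet0/1"),
    ("172.17.252.128/27", "eigrp 1 (external)", 170, "Tunnel100"),
]


def lookup(address, routes):
    """Return the route with the longest prefix that covers the address.

    Administrative distance is deliberately ignored here: it only decides
    between sources offering the same prefix, not between different
    prefix lengths.
    """
    addr = ipaddress.ip_address(address)
    covering = [r for r in routes if addr in ipaddress.ip_network(r[0])]
    if not covering:
        return None
    return max(covering, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)


if __name__ == "__main__":
    # Before the Tunnel100 adjacency: only the connected /25 exists.
    print(lookup("172.17.252.129", ROUTES[:1])[3])
    # After: the /27 from EIGRP also covers the next hop and is more specific.
    print(lookup("172.17.252.129", ROUTES)[3])
```

With only the connected route present the lookup returns GigabitEthernet0/1. Once the /27 is added it returns Tunnel100, despite the distance of 170.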

Solution we applied:

Created a distribute list on the DMZ Router to drop this specific route when it is received in updates from Tunnel 100.

router eigrp 1
 distribute-list 1 in
 network 10.10.10.0 0.0.0.3
 network 172.17.203.56 0.0.0.3
 network 172.17.203.60 0.0.0.3
 network 172.17.205.60 0.0.0.3
 network 172.19.98.18 0.0.0.0
 network 192.168.5.64 0.0.0.3
 passive-interface Loopback1

se-idz-vpn-01#sh access-list 1
Standard IP access list 1
    10 deny   172.17.252.128, wildcard bits 0.0.0.127 (1 match)
    20 permit any (1230 matches)
se-idz-vpn-01#

Once this was applied, the GRE Tunnel between the Internal and DMZ Routers stayed up together with the GRE tunnel between the Branch Office and the DMZ Router.
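
For anyone unsure how access list 1 selects the route, the following Python sketch mimics the wildcard comparison. It is a simplified model under my own assumptions (the `acl_permits` helper is hypothetical and only the network address of the advertised route is compared, which is how a standard ACL behaves in a distribute list).

```python
import ipaddress

# Entries of "Standard IP access list 1" from the output above.
ACL_1 = [
    ("deny", "172.17.252.128", "0.0.0.127"),
    ("permit", "0.0.0.0", "255.255.255.255"),  # "permit any"
]


def acl_permits(network_address, acl):
    """Return True if the first matching entry is a permit.

    Wildcard bits set to 1 are "don't care" bits, so they are masked out
    before the route's network address is compared with the entry.
    """
    addr = int(ipaddress.ip_address(network_address))
    for action, base, wildcard in acl:
        care = ~int(ipaddress.ip_address(wildcard)) & 0xFFFFFFFF
        if addr & care == int(ipaddress.ip_address(base)) & care:
            return action == "permit"
    return False  # implicit deny at the end of every ACL


if __name__ == "__main__":
    print(acl_permits("172.17.252.128", ACL_1))  # the /27 learned over Tunnel100
    print(acl_permits("172.17.255.23", ACL_1))   # an unrelated prefix
```

Note that a standard ACL does not look at the prefix length, so this entry filters any incoming prefix whose network address falls within 172.17.252.128 to 172.17.252.255, not just the /27.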


10 Replies

Marcin Latosiewicz
Cisco Employee

Tony,

Did anyone look into that already?

Since no encryption is there I would check for

1) Tunnel keepalives configured

2) Routing - under normal circumstances the tunnel interface line protocol goes down if the tunnel destination is not reachable (in a recursive routing case an appropriate message is expected) or when there's a problem with the tunnel source.

Can you check also logs on internal router for that tunnel?

What I would do is first bring up the tunnels without EIGRP or any other dynamic routing protocol and check whether they are stable (run an SLA between tunnel interfaces?)

If we see the instability only with the routing protocol on - well, it most likely means you're redistributing/advertising too much into the tunnels.

Marcin

Hi Marcin,

Thank you for your response. No, you are the first to reply in regards to this issue.

Yes, tunnel keepalives are configured.

Secondly - I believe there is a recursive routing problem. If I disable dynamic routing on the LAN interface of the internal router the tunnels maintain and stay stable.

However, I must enable dynamic routing to maintain automatic failover via the internal and external GRE tunnels when the main route to the branch office fails.

I have attached three log files - results of a "debug tunnel" whenever I enable dynamic routing on the internal router. If you require any other outputs please let me know.

This is very strange too. If I disable dynamic routing on the Branch Router on all interfaces EXCEPT  the 172.17.205.60/30 subnet - the issue still occurs. Is there some kind of recursive routing issue within the tunnels themselves?

Internal Router

****************

!

interface Loopback1
ip address 172.17.255.4 255.255.255.255
!

interface Tunnel1
description Internal Tunnel to DMZ Router
bandwidth 64
ip address 192.168.5.66 255.255.255.252
delay 1000
keepalive 10 3
tunnel source Loopback1
tunnel destination 172.17.255.3
end

DMZ Router

*************

!

interface Loopback1
ip address 172.17.255.3 255.255.255.255
!

interface Tunnel1
description Tunnel to Branch Router

bandwidth 64
ip address 172.17.205.62 255.255.255.252
delay 20000
keepalive 10 3
tunnel source Loopback1
tunnel destination 172.17.255.23
!

interface Tunnel100
description Internal Tunnel to Internal Router
bandwidth 64
ip address 192.168.5.65 255.255.255.252
delay 1000
keepalive 10 3
tunnel source Loopback1
tunnel destination 172.17.255.4
!

Branch Router

*****************

interface Loopback1
ip address 172.17.255.23 255.255.255.255
!
interface Tunnel1
description Backup Tunnel to DMZ Router
bandwidth 64
ip address 172.17.205.61 255.255.255.252
delay 20000
keepalive 10 3
tunnel source Loopback1
tunnel destination 172.17.255.3

!

Tony,

I didn't have time to go over the outputs.

What I would concentrate on is comparing routing before and after dynamic routing is applied.

First of all, we should not learn about a tunnel endpoint over the tunnel itself (with a preferred metric).
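
One way to check this is to resolve each tunnel destination recursively and see which interface the traffic would actually leave by. Below is a rough Python sketch using the three routes quoted in the accepted solution. The function and field names are invented for the illustration, and this is not how IOS implements its lookup.

```python
import ipaddress

STATIC = {"prefix": "172.17.255.23/32", "next_hop": "172.17.252.129", "interface": None}
CONNECTED = {"prefix": "172.17.252.128/25", "next_hop": None, "interface": "GigabitEthernet0/1"}
EIGRP = {"prefix": "172.17.252.128/27", "next_hop": "192.168.5.66", "interface": "Tunnel100"}

BEFORE = [STATIC, CONNECTED]        # dynamic routing off on the Internal Router
AFTER = [STATIC, CONNECTED, EIGRP]  # Tunnel100 adjacency up


def longest_match(address, table):
    """Most specific route covering the address, or None."""
    addr = ipaddress.ip_address(address)
    hits = [r for r in table if addr in ipaddress.ip_network(r["prefix"])]
    return max(hits, key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen, default=None)


def egress_interface(address, table, max_depth=8):
    """Follow next hops until a route names an outgoing interface."""
    for _ in range(max_depth):
        route = longest_match(address, table)
        if route is None:
            return None
        if route["interface"]:
            return route["interface"]
        address = route["next_hop"]  # recurse on the next hop
    return None  # gave up: possible routing loop


def reached_over_tunnel(destination, table):
    """True if the tunnel destination would be reached through a tunnel."""
    egress = egress_interface(destination, table)
    return egress is not None and egress.startswith("Tunnel")


if __name__ == "__main__":
    print(egress_interface("172.17.255.23", BEFORE), reached_over_tunnel("172.17.255.23", BEFORE))
    print(egress_interface("172.17.255.23", AFTER), reached_over_tunnel("172.17.255.23", AFTER))
```

If the egress interface for a tunnel destination is itself a tunnel, the endpoint is being reached over a tunnel, which is the situation to avoid. Note that the static route to the endpoint can stay unchanged while the route to its next hop changes underneath it.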

Indeed from debugs it looks like tunnel is unidirectional at some point:

2011-03-07 17:00:46    Local7.Debug    172.17.255.23    188: 000234: *Mar  4 02:29:14.179: Tunnel1: GRE/IP to classify 172.17.255.3->172.17.255.23 (tbl=0,"Default" len=148 ttl=254 tos=0xC0)
2011-03-07 17:00:46    Local7.Debug    172.17.255.23    189: 000235: *Mar  4 02:29:14.179: Tunnel1: GRE/IP (PS) to decaps 172.17.255.3->172.17.255.23 (tbl=0,"default" len=148 ttl=254)
2011-03-07 17:00:46    Local7.Debug    172.17.255.23    190: 000236: *Mar  4 02:29:14.179: Tunnel1: GRE decapsulated IP packet (linktype=7, len=124)
2011-03-07 17:00:46    Local7.Debug    172.17.255.23    191: 000237: *Mar  4 02:29:14.183: Tunnel1: GRE/IP encapsulated 172.17.255.23->172.17.255.3 (linktype=7, len=64)
2011-03-07 17:00:46    Local7.Debug    172.17.255.23    192: 000238: *Mar  4 02:29:14.183: Tunnel1 count tx, adding 0 encap bytes
2011-03-07 17:00:47    Local7.Debug    172.17.255.23    193: 000239: *Mar  4 02:29:15.603: Tunnel1: GRE/IP encapsulated 172.17.255.23->172.17.255.3 (linktype=7, len=84)
2011-03-07 17:00:47    Local7.Debug    172.17.255.23    194: 000240: *Mar  4 02:29:15.603: Tunnel1 count tx, adding 0 encap bytes

Whether this is really a case of recursive routing is hard to say, but it looks like the keepalives are killing the tunnel (or the tunnel destination IP is removed from the routing table).

Can you check the routing for the tunnel endpoints (on all three routers, for both of the other routers) before and after the routing protocol is applied?

Marcin

Marcin,

Here is the output on each router before and after dynamic routing is enabled. I have ensured that static routes are in place for each tunnel endpoint to reduce the possibility of recursive routing.

BEFORE ROUTING IS ENABLED ON LAN INTERFACE OF INTERNAL ROUTER


internal#sh ip route 172.17.255.3
Routing entry for 172.17.255.3/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 172.17.0.45
      Route metric is 0, traffic share count is 1


!

ip route 172.17.255.3 255.255.255.255 172.17.0.45
!

AFTER

internal#sh ip route 172.17.255.3
Routing entry for 172.17.255.3/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 172.17.0.45
      Route metric is 0, traffic share count is 1


*************************************************

BEFORE


dmz#sh ip route 172.17.255.4
Routing entry for 172.17.255.4/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 172.17.251.129
      Route metric is 0, traffic share count is 1

dmz#sh ip route 172.17.255.23
Routing entry for 172.17.255.23/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 172.17.252.129
      Route metric is 0, traffic share count is 1

ip route 172.17.255.4 255.255.255.255 172.17.251.129
ip route 172.17.255.23 255.255.255.255 172.17.252.129

AFTER

dmz#sh ip route 172.17.255.4
Routing entry for 172.17.255.4/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 172.17.251.129
      Route metric is 0, traffic share count is 1

dmz#sh ip route 172.17.255.23
Routing entry for 172.17.255.23/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 172.17.252.129
      Route metric is 0, traffic share count is 1

*************************************************

BEFORE

branch#sh ip route 172.17.255.3
Routing entry for 172.17.255.3/32
  Known via "static", distance 1, metric 0 (connected)
  Routing Descriptor Blocks:
  * directly connected, via Dialer101
      Route metric is 0, traffic share count is 1

!
ip route 172.17.255.3 255.255.255.255 Dialer101

AFTER

branch#sh ip route 172.17.255.3
Routing entry for 172.17.255.3/32
  Known via "static", distance 1, metric 0 (connected)
  Routing Descriptor Blocks:
  * directly connected, via Dialer101
      Route metric is 0, traffic share count is 1

I am also noticing that the IPSec tunnel fails when dynamic routing is enabled:

dmz#sh crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst             src             state          conn-id status
172.17.252.138  <>  QM_IDLE           4067 ACTIVE

Even though the IPSec tunnel appears to remain up on the HQ DMZ Router (above), the branch end of the IPSec tunnel fails (below).

branch#

branch#sh crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst             src             state          conn-id status
    <>  MM_NO_STATE          0 ACTIVE (deleted)

Tony,

If you have isakmp keepalives and there's a routing problem the IPsec tunnels will fail.

I understand that DMZ 172.17.251.129, 172.17.252.129 and internal 172.17.0.45 belong to connected interfaces? (or is recursive routing involved) :-)

Did you (by any chance) try disabling CEF and checking? We can do it at interface level with "no ip route-cache cef".

There's only so much I can help with without seeing the behavior live. Can you maybe open a TAC case (you can do it from the forums now)?

Marcin

Hi Marcin,

Yes - 172.17.251.129 is the external interface of the Internal DMZ Cluster. 172.17.252.129 is the internal interface of the External DMZ Cluster. The DMZ Router sits between the two clusters.

I have tried disabling CEF - but to no avail.

I really appreciate your help though - it's good to know that what I've been checking up until now has been correct, and being able to bounce ideas off an expert is very beneficial.

I will raise a TAC now and keep you posted on my progress.

If you think of anything else please let me know.

Regards

Tony

Tony,

Once you have the SR number can you please provide it here?

I'll follow up internally with whoever picks it up.

Marcin

SR: 617075673
