Mysterious Tunnel Drop over MPLS WAN

kevin.hu
Level 3

This is a strange problem and I don't know where to begin to troubleshoot. Please help if you have seen this kind of problem before.

We have many sites, all connected to the Verizon MPLS network. Each site has a tunnel back to the data center CE router, and we use this tunnel to carry outbound default-route traffic. Ever since these tunnels were created, they have been dropping unexpectedly. Here is the config:

Data Center Head End Tunnel Config:

interface Tunnel1
 ip vrf forwarding user
 ip address x.x.x.x
 no ip redirects
 ip nhrp authentication xxxx
 ip nhrp map multicast dynamic
 ip nhrp network-id 100
 keepalive 1 3
 tunnel source x.x.x.x
 tunnel mode gre multipoint
 tunnel key 100
 tunnel vrf user

Remote End Tunnel Config:

interface Tunnel1
 ip vrf forwarding user
 ip address x.x.x.x
 ip flow ingress
 ip nhrp authentication xxxx
 ip nhrp map
 ip nhrp map multicast
 ip nhrp network-id 100
 ip nhrp nhs
 keepalive 1 3
 tunnel source x.x.x.x
 tunnel destination x.x.x.x
 tunnel key 100
 tunnel vrf user

I don't think it is a config issue, because some sites have had their tunnel up for weeks while others flap occasionally. In addition, the CE-PE BGP sessions on both ends can stay up for weeks while the tunnel still flaps. This leads me to think that the problem is in the MPLS core, but I don't have any evidence beyond my router syslog. I called Verizon and they told me there was no maintenance at the time of the tunnel flaps. SNMP monitoring can't catch it because it happens so quickly. How else can I narrow down this problem?
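Because SNMP polling is too coarse to catch the flaps, the router syslog is the main evidence available. The following is a minimal Python sketch of how syslog collected from several sites could be lined up to see whether tunnels drop together or independently. `%LINEPROTO-5-UPDOWN` is the standard IOS line-protocol message; the timestamp layout, the `site` labels, the `year` default, and the 5-second grouping window are assumptions made for illustration and do not come from the post.

```python
import re
from datetime import datetime

# Standard IOS line-protocol message for a tunnel interface.
# ASSUMPTION: timestamps look like "Mar  3 14:02:11.123" (no year);
# adjust the pattern to match your "service timestamps" setting.
UPDOWN = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})(?:\.\d+)?.*"
    r"%LINEPROTO-5-UPDOWN: Line protocol on Interface (?P<intf>Tunnel\d+), "
    r"changed state to (?P<state>up|down)"
)


def parse_flaps(site, lines, year=2024):
    """Return (timestamp, site, interface, state) for each tunnel up/down log line."""
    events = []
    for line in lines:
        m = UPDOWN.search(line)
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            events.append((ts, site, m["intf"], m["state"]))
    return events


def simultaneous_drops(events, window_s=5):
    """Group 'down' events that start within window_s seconds of each other
    and keep only the groups that involve more than one site."""
    downs = sorted(e for e in events if e[3] == "down")
    groups, current = [], []
    for ev in downs:
        if current and (ev[0] - current[0][0]).total_seconds() > window_s:
            groups.append(current)
            current = []
        current.append(ev)
    if current:
        groups.append(current)
    return [g for g in groups if len({e[1] for e in g}) > 1]
```

Drops that cluster across many sites would suggest the head end or a shared part of the path, while isolated drops would suggest a single site's access circuit. This only works if the router clocks are synchronized (for example via NTP).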

Thanks.

Accepted Solutions

Raju Sekharan
Cisco Employee

Hi Kevin,

You have keepalive configured on the Tunnel interfaces. So if there is any keepalive packet loss on the path, you will see the tunnel going down.

How often is the tunnel flapping? Your keepalive timer is aggressive (keepalive 1 3). You can try increasing it to something like "keepalive 5 4" on the sites that are flapping and check the status.
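To make the difference between the two settings concrete, here is a small Python sketch of the arithmetic. It assumes the usual reading of `keepalive <seconds> <retries>`: one keepalive every `seconds`, with the tunnel declared down after `retries` consecutive misses. It applies only to tunnels where GRE keepalives are actually in effect, and the function name is made up for illustration.

```python
def keepalive_down_after(seconds, retries):
    """Approximate seconds of sustained keepalive loss before the tunnel
    is declared down (interval multiplied by retry count)."""
    return seconds * retries


for label, (s, r) in {"keepalive 1 3": (1, 3), "keepalive 5 4": (5, 4)}.items():
    print(f"{label}: down after about {keepalive_down_after(s, r)} s without a reply")
# keepalive 1 3: down after about 3 s without a reply
# keepalive 5 4: down after about 20 s without a reply
```

With the aggressive timer, roughly three seconds of loss on the path is enough to drop the tunnel; the relaxed timer tolerates about twenty.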

Thanks

Raju

6 Replies

Thanks Raju. I will increase the timer to see if that makes the tunnel more resilient over the Verizon MPLS network.

To answer your question, the tunnel flaps quite frequently and at random. However, last week all tunnels dropped at the same time. I suspect Verizon was doing some maintenance work, but they said they weren't.

rajs2 wrote:

Hi Kevin,

You have keepalive configured on the Tunnel interfaces. So if there is any keepalive packet loss on the path, you will see the tunnel going down.

Incorrect. Multipoint GRE, which the OP is using, does not support keepalives.

Hi Paolo,

Thank you for pointing it out.

I have a question: do the keepalive configs on the spoke side not work either?

Thanks

Raju

I didn't find any reference saying that GRE keepalives are not supported on multipoint GRE. However, I did find that GRE keepalives are not supported on a tunnel with a VRF applied. In my case, I do have a VRF configured on the GRE tunnel interface.

"GRE tunnel keepalive is not supported in cases where virtual route forwarding (VRF) is applied to a GRE tunnel."

Also looking at this past post:

https://supportforums.cisco.com/thread/249607?referring_site=kapi

I will remove my keepalives to see if that fixes the issue. Thanks for pointing me in a direction I can test.

I should have clarified my post better.

One can configure keepalives on multipoint GRE, but no keepalives will be sent and the interface status will not change.

The reason is that, because it is a multipoint interface, there would be no logic in doing that.

However, whether or not keepalives are configured will not affect any other issue.
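The two caveats raised in this thread (keepalives on multipoint GRE, and keepalives on a GRE tunnel with a VRF applied) can be checked mechanically across many site configs. Below is a rough Python sketch of such a check. The function names and warning strings are hypothetical, the rules encode only what was stated in this thread, and the parser is deliberately naive: it treats every line after an `interface` line as belonging to that interface.

```python
def parse_interfaces(config_text):
    """Group sub-commands under each 'interface ...' line.
    Naive: lines are attributed to the most recent interface seen."""
    interfaces, current = {}, None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line == "!":
            continue
        if line.startswith("interface "):
            current = line.split(None, 1)[1]
            interfaces[current] = []
        elif current is not None:
            interfaces[current].append(line)
    return interfaces


def keepalive_warnings(config_text):
    """Flag interfaces where 'keepalive' is combined with multipoint GRE
    or with a VRF, the two cases called out in this thread."""
    warnings = []
    for name, cmds in parse_interfaces(config_text).items():
        if not any(c.startswith("keepalive") for c in cmds):
            continue
        if "tunnel mode gre multipoint" in cmds:
            warnings.append((name, "keepalive on multipoint GRE"))
        if any(c.startswith(("ip vrf forwarding", "tunnel vrf")) for c in cmds):
            warnings.append((name, "keepalive on a VRF-associated GRE tunnel"))
    return warnings
```

Run against the head-end config posted above, this would report both warnings for Tunnel1; against the remote-end config, only the VRF one.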
