Solved: Re: Flapping GRE Tunnel Interface

CarloCrz · ‎08-25-2020

Hi all,

I have a flapping GRE tunnel between two C2801 routers (C2801-ADVIPSERVICESK9-M), Version 12.3(14)T4.

Both routers have several other tunnels working fine and the ISP network doesn't seem to have problems (0.35% loss with ping size 1500). The tunnel goes down approximately every hour and turns up in approx 5 mins.

Following are the 2 configs:

R1

interface Tunnel4
 description Tunnel XXX
 bandwidth 8000
 ip address 192.168.104.1 255.255.255.0
 ip mtu 1400
 ip flow ingress
 ip flow egress
 ip route-cache flow
 ip tcp adjust-mss 1336
 ip ospf cost 15
 delay 2000
 keepalive 1 3
 tunnel source FastEthernet0/1
 tunnel destination 172.22.77.1

R2

interface Tunnel1
 description Tunnel XXX
 bandwidth 8000
 ip address 192.168.104.7 255.255.255.0
 ip mtu 1400
 ip route-cache flow
 ip tcp adjust-mss 1336
 delay 2000
 keepalive 1 3
 tunnel source FastEthernet0/1
 tunnel destination 172.22.8.7

I activated several debugs, but the only useful info that I get is related to the change of state:

008395: .Aug 25 2020 14:11:21.316 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to down

008396: .Aug 25 2020 14:11:26.316 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to up

While using a repeated ping I noticed that during the down time I am not able to reach the other device.

I wonder whether this issue is related to one of the two ISPs involved or whether something is wrong with the router config.

Thank you in advance to anybody wanting to help!

Giuseppe Larosa · ‎08-25-2020

Hello @CarloCrz ,

As a start, I would use less aggressive timers for the tunnel keepalive. You currently have:

>> keepalive 1 3

Hope this helps

Giuseppe
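
To illustrate the suggestion: on IOS the interface command takes the form keepalive <period> <retries>, so keepalive 1 3 sends a keepalive every second and brings the line protocol down after roughly three seconds without a reply. A less aggressive setting could look like the sketch below. The values 5 4 are the ones the original poster reports trying later in this thread, not a general recommendation, and should be weighed against how quickly a real failure needs to be detected.

```
! R1: one keepalive every 5 seconds, line protocol down after 4 missed replies (about 20 s)
interface Tunnel4
 keepalive 5 4
!
! R2: same timers on the other end of the tunnel
interface Tunnel1
 keepalive 5 4
```

The trade-off is detection time: short bursts of loss on the ISP path no longer flap the tunnel, but a genuine outage takes correspondingly longer to be noticed.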

balaji.bandi · ‎08-25-2020

A tunnel is an overlay, so it relies on the underlay infrastructure.

So I would start by checking the physical port Fa0/1 towards the ISP.

Also monitor a ping outside the tunnel, between the interface IPs, to see whether any packets are dropped.

What is the utilisation of this port, and what is the CPU level when the link goes down and up?

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help
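
The checks suggested above could be run roughly as follows on R1. This is only a sketch: it assumes the tunnel destination 172.22.77.1 is R2's FastEthernet0/1 address, the repeat count and packet size are arbitrary, and the inline ping keywords can differ between IOS releases (the interactive extended ping is the fallback).

```
! Physical port towards the ISP: look at load, input/output drops and errors
show interfaces FastEthernet0/1
!
! CPU level around the time of a flap
show processes cpu history
!
! Ping outside the tunnel, between the physical (underlay) addresses
ping 172.22.77.1 source FastEthernet0/1 repeat 1000 size 1400
```

Running the ping while a flap is logged helps separate underlay loss from a problem with the tunnel itself.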

Joseph W. Doherty · ‎08-25-2020

One possibility that comes to mind: what's the actual bandwidth supported/guaranteed for this tunnel? The reason I ask is that, if your tunnel's transit bandwidth is less than the physical port bandwidths, burst congestion along the tunnel's path might cause loss of the tunnel's keepalives, and then the tunnel would go down.

BTW, how did you come to choose the IP MTU and IP TCP adjust-mss values?

Also BTW, you might consider enabling IP MTU discovery.
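
As a sketch of this last point, and assuming "IP MTU discovery" refers to path MTU discovery on the tunnel itself, IOS offers the tunnel path-mtu-discovery interface command. The comment also notes the usual arithmetic behind the MSS value (IP MTU minus 40 bytes of IPv4 and TCP headers), which for an IP MTU of 1400 would give 1360 rather than the configured 1336. The existing values are left unchanged here, since the thread does not say whether 1336 was chosen deliberately.

```
! R1 (the equivalent would go under interface Tunnel1 on R2)
! MSS rule of thumb: ip mtu 1400 - 20 (IPv4 header) - 20 (TCP header) = 1360
interface Tunnel4
 ip mtu 1400
 ip tcp adjust-mss 1336
 tunnel path-mtu-discovery
```

Path MTU discovery depends on ICMP "fragmentation needed" messages from the ISP path actually reaching the router, so it is worth confirming that these are not filtered.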

Richard Burts · ‎08-26-2020

We have only minimal information here. You tell us that each router has several tunnels. Are all of the tunnels sourced from the same physical interface (Fast0/1)?

When you get a tunnel interface state change message on one router, do you also get a similar message on the tunnel peer router at about the same time? Or does one router sometimes report the tunnel as down while the other router continues to believe that the tunnel is up?

I agree with the suggestion about relaxing the tunnel keepalive timers.

It may not be significant, but I notice something in your post. You tell us that the tunnel "turns up in approx 5 mins."

It might be that it normally comes back up in 5 minutes, but the log messages you posted do not show that:

008395: .Aug 25 2020 14:11:21.316 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to down

008396: .Aug 25 2020 14:11:26.316 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to up

This shows recovery in 5 seconds.

HTH

Rick

CarloCrz · ‎08-27-2020

Hi Richard,

You are right, the down period is about 5 seconds, not minutes. After the first answers I decided to follow @Giuseppe Larosa's suggestion and increased the timers (now 5 4); since then I haven't had any flapping.

All of the tunnels are using the same physical interface, and I can see the tunnel going up and down from both devices.

To answer @Joseph W. Doherty: I checked the bandwidths and utilization and there is no problem with that. Regarding the IP MTU and IP TCP adjust-mss values, I inherited this infrastructure and unfortunately I don't know the design reasons.

At this point, I will probably adjust the timers (maybe 3 4) and carry on using this tunnel. I suppose that both ISPs have some delay and cause some loss, so since I use them together it happens to lose 2-3 ping in a row.

Thank you to everyone

Joseph W. Doherty · ‎08-27-2020

"I checked the bandwidths and utilization and there is no problem with that."

Checked how? I ask for two reasons. First, something like microbursts can be difficult to capture with the "usual" network monitoring tools. Second, you were (apparently) having tunnel drops due to keepalives lost over several seconds, and now, with longer keepalive timers, you see ". . . lose 2-3 ping in a row." That is, although the tunnel flaps are "fixed", you may still have an underlying problem (first revealed by the tunnel flaps).

"Regarding the IP MTU and IP TCP adjust-mss values, I inherited this infrastructure and unfortunately I don't know the design reasons."

Ah, in that case, at some point you may want to analyze them and perhaps revise them. A good place to start for how to set up tunnels, optimally, is this Cisco TechNote: https://www.cisco.com/c/en/us/support/docs/ip/generic-routing-encapsulation-gre/25885-pmtud-ipfrag.html (Although the TechNote discusses much about IPSec tunnels, much of it applies to GRE tunnels too.)

CarloCrz · ‎08-28-2020

The tunnel's actual usage is almost nil. We have a mesh topology, and this tunnel never reaches 1 Mbps from what I see in the show int Tu4 output. The physical WAN connection is 15/15 on both sides of the tunnel.

Unfortunately the link you sent me is 404 at the moment but yes, I will have to analyze those values and for sure the connection is not without issue. Do you think that by configuring IP MTU Discovery and maybe setting different IP MTU and IP TCP adjust-mss values we could obtain a more reliable connection?

Joseph W. Doherty · ‎08-28-2020

"Unfortunately the link you sent me is 404 . . ."

That might be fixed now. (Cisco revised their site [?]. I had pasted the link directly into the note and, when I just tried it, I also got a 404. I revised the post to use the "insert link" option, and now it works for me.)

"Do you think that by configuring IP MTU Discovery and maybe setting different IP MTU and IP TCP adjust-mss values we could obtain a more reliable connection?"

No, I don't think that would make the connection more reliable, although it should use the connection more efficiently.

If your WAN bandwidths are 15/15, you would be better off shaping, but multipoint connections are difficult to shape for unless you use something like DMVPN with the later dynamic shaping feature.
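
For illustration only, a plain aggregate shaper on the WAN interface might look like the sketch below. It assumes "15/15" means 15 Mbps in each direction and uses a hypothetical policy name, SHAPE-WAN. It only paces this router's own egress to the contracted rate and does not address the per-destination problem in a mesh that is described above.

```
! Hypothetical policy name; 15000000 bps assumes a 15 Mbps WAN rate
policy-map SHAPE-WAN
 class class-default
  shape average 15000000
!
interface FastEthernet0/1
 service-policy output SHAPE-WAN
```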

CarloCrz · ‎08-31-2020

It works now, I confirm. Thank you for all of your advice; I will apply it.

balaji.bandi · ‎08-27-2020

Thank you for the input. It looks like changing the timers fixed it. Good to know.

BB
