08-25-2020 07:53 AM
Hi all,
I have a flapping GRE tunnel between 2 C2801 (C2801-ADVIPSERVICESK9-M), Version 12.3(14)T4
Both routers have several other tunnels working fine and the ISP network doesn't seem to have problems (0.35% loss with ping size 1500). The tunnel goes down approximately every hour and turns up in approx 5 mins.
Following are the 2 configs:
R1
interface Tunnel4
 description Tunnel XXX
 bandwidth 8000
 ip address 192.168.104.1 255.255.255.0
 ip mtu 1400
 ip flow ingress
 ip flow egress
 ip route-cache flow
 ip tcp adjust-mss 1336
 ip ospf cost 15
 delay 2000
 keepalive 1 3
 tunnel source FastEthernet0/1
 tunnel destination 172.22.77.1
R2
interface Tunnel1
 description Tunnel XXX
 bandwidth 8000
 ip address 192.168.104.7 255.255.255.0
 ip mtu 1400
 ip route-cache flow
 ip tcp adjust-mss 1336
 delay 2000
 keepalive 1 3
 tunnel source FastEthernet0/1
 tunnel destination 172.22.8.7
I activated several debugs, but the only useful info that I get is related to the change of state:
008395: .Aug 25 2020 14:11:21.316 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to down
008396: .Aug 25 2020 14:11:26.316 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to up
While using a repeated ping I noticed that during the down time I am not able to reach the other device.
I wonder whether this issue is related to one of the 2 ISPs involved or whether something is wrong with the router config.
Thank you in advance to anybody wanting to help!
08-25-2020 08:24 AM
Hello @CarloCrz ,
as a start I would use less aggressive timers for the tunnel keepalive
>> keepalive 1 3
Hope to help
Giuseppe
08-25-2020 08:26 AM
A tunnel is an overlay, so it relies on the underlay infrastructure.
So I would start by checking the physical port FastEthernet0/1 towards the ISP.
Also monitor outside the tunnel: ping between the physical interface IPs and see whether there are any packet drops.
What is the utilisation of this port, and what is the CPU level when the link goes down and up?
08-25-2020 09:17 AM
08-26-2020 10:57 AM
We have only minimal information here. You tell us that each router has several tunnels. Are all of the tunnels sourced from the same physical interface (Fast0/1)?
When you get a tunnel interface state change message on one router, do you also get a similar message on the tunnel peer router at about the same time? Or does one router sometimes report that the tunnel is down while the other router continues to believe that the tunnel is up?
I agree with the suggestion about relaxing the tunnel keep alive timers.
It may not be significant, but I noticed something in your post. You tell us that the tunnel "turns up in approx 5 mins."
It might be that it normally comes back up in 5 minutes, but the log messages you posted do not show that:
008395: .Aug 25 2020 14:11:21.316 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to down
008396: .Aug 25 2020 14:11:26.316 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to up
This shows recovery in 5 seconds.
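The 5-second figure can be checked mechanically from the two log lines. The following is only a minimal sketch in Python: the function names (`parse_updown`, `outage_seconds`) and the regular expression are my own, and the pattern assumes the exact timestamp layout shown in the messages above (month, day, year, time with milliseconds, UTC).

```python
import re
from datetime import datetime

# Month lookup kept explicit so the parsing does not depend on the system locale.
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Assumed layout, taken from the two messages quoted in this thread.
PATTERN = re.compile(
    r"(?P<mon>[A-Z][a-z]{2}) (?P<day>\d{1,2}) (?P<year>\d{4}) "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\.(?P<ms>\d{3}) UTC: "
    r"%LINEPROTO-5-UPDOWN: Line protocol on Interface (?P<intf>\S+), "
    r"changed state to (?P<state>up|down)"
)


def parse_updown(line):
    """Return (timestamp, interface, state) for an UPDOWN message, else None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    g = match.groupdict()
    timestamp = datetime(int(g["year"]), MONTHS[g["mon"]], int(g["day"]),
                         int(g["h"]), int(g["m"]), int(g["s"]),
                         int(g["ms"]) * 1000)
    return timestamp, g["intf"], g["state"]


def outage_seconds(lines):
    """Return (interface, seconds) for every down -> up interval found."""
    down_since = {}
    durations = []
    for line in lines:
        parsed = parse_updown(line)
        if parsed is None:
            continue
        timestamp, intf, state = parsed
        if state == "down":
            down_since[intf] = timestamp
        elif intf in down_since:
            delta = timestamp - down_since.pop(intf)
            durations.append((intf, delta.total_seconds()))
    return durations


LOG = [
    "008395: .Aug 25 2020 14:11:21.316 UTC: %LINEPROTO-5-UPDOWN: "
    "Line protocol on Interface Tunnel1, changed state to down",
    "008396: .Aug 25 2020 14:11:26.316 UTC: %LINEPROTO-5-UPDOWN: "
    "Line protocol on Interface Tunnel1, changed state to up",
]

print(outage_seconds(LOG))  # [('Tunnel1', 5.0)]
```

Run over a longer log export, the same function would list the length of every flap, which makes it easy to see whether the outages are consistently a few seconds long.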
08-27-2020 12:51 AM
Hi Richard,
you are right, the down period is about 5 seconds, not minutes. After the first answers I decided to follow @Giuseppe Larosa's suggestion and increased the timers (now 5 4); since then I haven't had any flapping.
All of the tunnels use the same physical interface, and I can see the tunnel going up and down from both devices.
To answer @Joseph W. Doherty: I checked the bandwidths and utilization and there is no problem with that. Regarding the IP MTU and IP TCP adjust-mss values, I inherited this infrastructure and unfortunately I don't know the design reasons.
At this point, I will probably adjust the timers (maybe 3 4) and carry on using this tunnel. I suppose that both ISPs add some delay and cause some loss, so since I use them together it happens to lose 2-3 ping in a row.
Thank you to everyone
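To put the timer values from this thread side by side, here is a rough sketch of the arithmetic. It assumes the usual reading of `keepalive <period> <retries>`: the tunnel is declared down after roughly period × retries seconds without a keepalive making the round trip. The independent-loss estimate is purely a simplifying assumption of mine, and it reuses the 0.35% figure that was measured with 1500-byte pings rather than with keepalives.

```python
def detection_time(period_s, retries):
    """Approximate seconds of continuous keepalive loss before the tunnel drops."""
    return period_s * retries


def expected_flaps_per_hour(loss_rate, period_s, retries):
    """Rough flaps per hour IF each keepalive were lost independently (assumption)."""
    windows_per_hour = 3600 / period_s
    return windows_per_hour * loss_rate ** retries


# Timer pairs mentioned in the thread: original, candidate, and current.
TIMERS = {
    "original (1 3)": (1, 3),
    "candidate (3 4)": (3, 4),
    "current (5 4)": (5, 4),
}
LOSS_RATE = 0.0035  # 0.35% loss reported in the first post

for label, (period, retries) in TIMERS.items():
    print(f"{label}: down after ~{detection_time(period, retries)} s of loss, "
          f"~{expected_flaps_per_hour(LOSS_RATE, period, retries):.2e} "
          f"flaps/hour if losses were independent")
```

Under the independence assumption even the original 1 3 timers would flap far less than once per hour, so a flap roughly every hour hints that the losses arrive in bursts of several seconds rather than as isolated drops. Longer timers simply ride through those bursts.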
08-27-2020 08:29 AM - edited 08-28-2020 07:03 AM
"I checked the bandwidths and utilization and there is no problem with that."
Checked how? I ask for two reasons. First, something like microbursts can be difficult to capture or see with the "usual" network monitoring tools. Second, you were (apparently) having tunnel drops due to keepalives lost over several seconds, and now, with longer keepalive timers, you see ". . . lose 2-3 ping in a row." That is, although the tunnel flaps are "fixed", you may still have the underlying problem that the tunnel flaps first revealed.
"Regarding the IP MTU and IP TCP adjust-mss values, I inherited this infrastructure and unfortunately I don't know the design reasons."
Ah, in that case, at some point you may want to analyze them and perhaps revise them. A good place to start for how to set up tunnels, optimally, is this Cisco TechNote: https://www.cisco.com/c/en/us/support/docs/ip/generic-routing-encapsulation-gre/25885-pmtud-ipfrag.html (Although the TechNote discusses much about IPSec tunnels, much of it applies to GRE tunnels too.)
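As a starting point for that analysis, the header arithmetic behind the two configured values can be sketched as below. The constants assume IPv4 and TCP headers without options, a basic 4-byte GRE header (no key or sequence number), and a 1500-byte physical MTU; none of these are confirmed in the thread.

```python
IPV4_HEADER = 20  # bytes, no IP options (assumption)
TCP_HEADER = 20   # bytes, no TCP options (assumption)
GRE_HEADER = 4    # bytes, basic GRE header without key/sequence (assumption)


def max_gre_payload_mtu(physical_mtu):
    """Largest inner packet that fits once the outer IP and GRE headers are added."""
    return physical_mtu - IPV4_HEADER - GRE_HEADER


def tcp_mss_for(ip_mtu):
    """TCP MSS that fills a packet of ip_mtu bytes exactly."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


CONFIGURED_IP_MTU = 1400  # "ip mtu 1400" on both tunnel interfaces
CONFIGURED_MSS = 1336     # "ip tcp adjust-mss 1336" on both tunnel interfaces
PHYSICAL_MTU = 1500       # assumed; not stated in the thread

print(max_gre_payload_mtu(PHYSICAL_MTU))               # 1476
print(tcp_mss_for(CONFIGURED_IP_MTU))                  # 1360
print(tcp_mss_for(CONFIGURED_IP_MTU) - CONFIGURED_MSS) # 24
```

So, under these assumptions, the configured `ip mtu 1400` sits 76 bytes below what a plain GRE tunnel over a 1500-byte path could carry, and the configured MSS is another 24 bytes below what that IP MTU allows. Both settings are conservative rather than wrong; why they were chosen is not known from the thread.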
08-28-2020 01:35 AM
The tunnel's actual usage is close to zero: we have a mesh topology, and this tunnel never reaches 1 Mbps from what I see in the show int Tu4 output. The physical WAN connection is 15/15 on both sides of the tunnel.
Unfortunately the link you sent me is 404 at the moment but yes, I will have to analyze those values and for sure the connection is not without issue. Do you think that by configuring IP MTU Discovery and maybe setting different IP MTU and IP TCP adjust-mss values we could obtain a more reliable connection?
08-28-2020 07:08 AM - edited 08-28-2020 07:09 AM
"Unfortunately the link you sent me is 404 . . ."
That might be fixed now. (Did Cisco revise their site? I had pasted the link directly into the note, and when I tried it just now I also got a 404. I have revised the post to use the "insert link" option, and now it works for me.)
"Do you think that by configuring IP MTU Discovery and maybe setting different IP MTU and IP TCP adjust-mss values we could obtain a more reliable connection?"
No, I don't think that would make the connection more reliable, although it should let you use the connection more efficiently.
If your WAN bandwidths are 15/15, you would be better off shaping, but multipoint connections are a problem to shape for unless you use something like DMVPN with the later dynamic shaping feature.
08-31-2020 06:02 AM
I confirm it works now. Thank you for all of your advice, I will apply it.
08-27-2020 02:53 AM
Thank you for the input. It looks like changing the timers fixed it. Good to know.