
What does ICMP message (11,1) mean?

jgtheodor
Level 1

Hi,

I am working in a DMVPN environment with two HUB routers and 25 spoke routers. There are mGRE tunnels everywhere with the same basic configuration. Extended access lists are also attached to the WAN Serial & ADSL interfaces, permitting only ESP and ISAKMP (UDP 500) packets. Every day on the primary HUB router I see the following log messages:
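
The WAN list is roughly of this form (the entries shown here are an assumption based on the description above; only the list name comes from the logs below):

ip access-list extended WAN
 permit esp any any
 permit udp any any eq isakmp
 deny icmp any any log
 deny ip any any log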

 
Dec 03 08:52:57 172.16.250.2 2528762: Dec  3 08:52:44.143: %SEC-6-IPACCESSLOGDP: list WAN denied icmp 10.195.35.30 -> 192.168.192.1 (11/1), 13 packets 
Dec 03 08:52:57 172.16.250.2 2528763: Dec  3 08:52:44.143: %SEC-6-IPACCESSLOGDP: list WAN denied icmp 10.195.35.26 -> 192.168.192.1 (11/1), 8 packets 
Dec 03 08:52:57 172.16.250.2 2528764: Dec  3 08:52:44.143: %SEC-6-IPACCESSLOGDP: list WAN denied icmp 10.195.35.82 -> 192.168.192.1 (11/1), 1 packet 
Dec 03 08:53:57 172.16.250.2 2528765: Dec  3 08:53:44.148: %SEC-6-IPACCESSLOGDP: list WAN denied icmp 10.195.35.78 -> 192.168.192.1 (11/1), 8 packets

The source IP addresses are the WAN IP addresses of all spoke routers, and 192.168.192.1 is the loopback IP address of the primary HUB router. I see similar log messages on every spoke router, with the primary HUB WAN interface as the source IP address and the loopback IP addresses of all other spoke routers as destinations. As far as I know there is no fragmentation issue, and everything works fine. But the question remains:

Where do these ICMP packets come from?

Can anyone help me answer this question?

Thanks in advance!

45 Replies

Yes, that's my understanding too: the ACL log messages and the VPN hardware errors point to fragmentation. That's the only explanation that makes sense to me and puts all the pieces of the discussion together. Reassembly timeouts are caught by the ACL, and fragmentation is a real possibility given those VPN hardware errors. What one considers cosmetic depends. You might not see severe traffic disruption, but there may actually be lost packets in the tunnel when reassembly timeouts occur (maybe because the ACL denies them, maybe regardless). TCP traffic might start with some segments, lose a few of them, retransmit, and eventually adjust to the network. This might explain why you see the messages every now and then, especially at higher-traffic hours. Also, bear in mind that in the first link I posted, it says fragmentation in the tunnel is possible even if the original host sets the DF bit. As the last link I posted suggests, you can either reduce the MSS a little and/or manually set the DF bit for the tunnel.
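
On IOS, the tunnel-side version of that suggestion would be roughly the following (the interface name and the exact values are only illustrative, not a recommendation):

interface Tunnel0
 ip mtu 1400
 ! reduce the advertised TCP MSS a little below the usual value
 ip tcp adjust-mss 1340
 ! enable PMTUD on the tunnel so the GRE carrier packets carry the DF bit
 tunnel path-mtu-discovery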

Edit: I didn't say clearly that, if any issue exists here, it's not so much a few lost packets, since that's how TCP works anyway, but the potential for a small performance degradation on the routers every now and then, depending on the frequency of the messages. And even if this isn't a real issue for your network, we knew that from the beginning based on what you said. Maybe the most serious issue here is to make those messages go away for good so we can all feel better, since nobody likes a router that complains all the time, even if it complains about nothing! Those log messages make the log harder to read, and you might miss other issues in the future.

Hi John,

I think you also need to explore whether the EIGRP protocol timeouts/goodbyes found in the logs are associated with this issue as well. I forgot to ask in a previous post of mine, and maybe I thought those could be attributed to other, unrelated network issues. However, now that I think about it more, I disagree with myself! If the routing protocol timeouts cannot be explained in any other way, then your issue is more serious than a few cosmetic or annoying logs, even if the interruption is usually a couple of seconds (sometimes it's more, and the hold timer takes even longer to expire).

Kind Regards,

Maria

Message was edited by: marikakis

Yeap,

Maria, you are absolutely right. I believe these EIGRP timeouts are also associated with this issue. I have checked with our ISP network guys, and the routing protocol timeouts cannot be explained at all. They say there is no Layer 2 issue in the MPLS backbone. I have also opened a TAC case with Cisco for this specific issue (3 in total), and it remains unanswered. After many EIGRP & tunnel debugs, no result.

Anyway, while trying to troubleshoot the issue, and after your suggestion, I was wondering whether CEF per-packet load-sharing on my HUB router leads to this phenomenon. I am sending you the HUB router's configuration file to take a look. I think there is no configuration error.
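
For reference, per-packet load sharing is enabled on the two serial interfaces roughly like this (the interface numbers here are placeholders):

interface Serial0/0
 ip load-sharing per-packet
!
interface Serial0/1
 ip load-sharing per-packet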

If you have any suggestion for further troubleshooting, please proceed.

I am still trying to resolve this issue with or without Cisco's help!

Before looking at the config, may I ask if this is a Greek ISP and which one?

Yeap, this is a Greek ISP, OTE.

If it had been another provider, I could have asked some backbone guys directly about any backbone issues. Anyway, I will trust OTE on this one, since your router is already complaining about other things as well, and they all look closely related. I am not very familiar with DMVPN, as I have already said at the beginning. When I see some trendy VPN product, I tend to think it is yet another VPN flavor with MTU issues. What happened to the good old leased lines? I do not see any problem in your configuration, and as for the DMVPN details, I trust your setup wouldn't work at all if it were wrong. The only suggestion I have right now is to reduce the MSS a little (say 20 bytes down). Maybe your router sometimes forwards packets that are a little larger than expected (options in headers or anything), and/or maybe MSS adjustment doesn't work well on the tunnels. You can either reduce it on the tunnel for starters, or on the LAN interfaces. By the way, what is the MTU of the serial interfaces? I can't think of a specific problem with it right now, but with bugs you never know, so let's see how much it is.
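
A couple of quick checks along these lines would show both values (the interface names are just examples):

show interfaces Serial0/0 | include MTU
show running-config interface Tunnel0 | include mtu|adjust-mss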

Mary, sorry for the delay...

The serial interface MTU is 1500 bytes. Tomorrow I will try reducing the MSS by 40 bytes and I will update the conversation accordingly!

Thanks in advance for your interest, again and again!

I must look very patient, but the truth is that I have wanted this issue resolved for months now! I have to catch a flight tomorrow and will be away for almost 5 days. If Cisco gives you an answer in the meantime, it will be really unfair!

Hello again,

I can't believe I missed your comment about CEF per-packet load-sharing. That's how issues remain unresolved. Anyway, per-packet load-sharing can cause packet reordering issues, which might be unacceptable for certain types of traffic (such as VoIP). However, in this case we are trying to see why fragmentation occurs in the first place (and the clues up to now indicate that it does). Any behavior after fragmentation occurs is not expected to be very elegant. If we discover what causes the fragmentation and avoid it, then maybe you will find that the load-balancing method does not affect your traffic very much (hopefully, since you probably needed per-packet load-sharing when you decided to enable it). In any case, with bugs you never know, so I won't insist very much at this point.

Kind Regards,

Maria

Hi John,

I am back and curious about how things are going in this case of yours. Have you managed to resolve the issue?

Kind Regards,

Maria

Hi Mary,

Welcome back to the chaos... For the time being, the "problem" remains unresolved. Unfortunately I did not have the time to change the MSS value on the tunnel interfaces as you suggested in a previous post, but I have asked the ISP to escalate the relevant ticket, updating it with our conversation, and I am awaiting the official answer from Cisco to proceed accordingly (the damned configuration change policy).

Regarding per-packet load-sharing, as you mentioned above, I decided to enable it because I need it for QoS to work properly across the two serial interfaces in case one of them fails.

However, at some point in the next 5-6 days, during non-working hours, I will unofficially try changing the MSS value to see whether it finally makes the difference. In any case, I will update the conversation.

Good day!!!

Kalimera (good morning),

It's good that you decided to escalate the issue with your ISP. In such cases, tactics can help resolve issues better than knowledge and debugging. Your ISP has access to information that you do not. As for the case with Cisco, I believe you will get better results if all possibly related logs are viewed together and not in isolation. If an engineer just sees the routing protocol go down, for example, this on its own doesn't say much (it could have occurred for various reasons). In the end, changing the IOS, as you suggested in a previous post, might be your only chance of getting the issue resolved.

Kind Regards,

Maria

Hi John,

I am more optimistic that the MSS adjustment may resolve this after reading the following thread:

https://supportforums.cisco.com/thread/202379

In that thread someone mentions the magic numbers 1394 for the MTU and 1354 for the MSS (that '4' makes me recall the 1504 size I noticed in one of your logs with hardware errors that I copy-pasted into this thread). Another person says some MSS tuning might be needed in each specific situation. Also, people tend to tune the MSS on the LAN side, closer to the source of the packets. So, when you get the opportunity to make any changes, keep in mind that some tuning might be needed, and try to monitor things at the time you attempt the changes, to avoid revisiting the issue at a later time.
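
As a rough sketch of that kind of tuning (the interface names are placeholders, and the values are simply the ones mentioned in that thread, not a recommendation for your network):

interface Tunnel0
 ip mtu 1394
 ip tcp adjust-mss 1354
!
interface FastEthernet0/0
 description LAN side, closer to the source of the packets
 ip tcp adjust-mss 1354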

Good Luck,

Maria

Mary, thank you very much indeed...

I will read the relevant thread and I will inform you ASAP of any change.

Well,

Regarding the issue, I am pinging the remote branch DC server (172.16.22.51) from my PC located at HQ behind the HUB router, and after a bit of experimentation, here is what I found:


packet size <= 1372 bytes: no fragmentation required (ping success).
packet size >= 1373 bytes and <= 1472 bytes: packet is silently dropped (request timed out).
packet size >= 1473 bytes: packet needs to be fragmented but DF bit set.
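
For reference, I am doing this sweep with plain ping from the PC, with the DF bit set and varying payload sizes, roughly like this (on Windows, -f sets the DF bit and -l sets the payload size in bytes):

ping 172.16.22.51 -f -l 1372
ping 172.16.22.51 -f -l 1373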

The configured values on the tunnel interfaces are MTU 1400 and MSS 1360, on both the spokes and the HUB router, and GRE is being used in conjunction with IPsec in transport mode. What does this behavior tell me?
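
My own reading, for what it's worth (assuming a 20-byte IP header plus an 8-byte ICMP header on top of the ping payload):

1372 + 28 = 1400, which is exactly the tunnel IP MTU
1472 + 28 = 1500, which is exactly the physical interface MTU

So the range that silently disappears seems to be exactly the packets that still fit the physical link but no longer fit the tunnel.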

Mary, do you believe that configuring the MSS value on all LAN interfaces to, for example, 1320 could resolve the issue?