ESP errors...

Shawn Lebbon · 01-21-2005

We have a 1720 router running IOS 12.3(9b) that keeps logging errors similar to the following, anywhere from every few minutes to every few hours:

Jan 15 22:46:19.937: %C1700_EM-1-ERROR: packet-rx error: ESP authentication fail, id 110, pool offset 0

Jan 17 04:03:31.936: %C1700_EM-1-ERROR: packet-rx error: ESP sequence fail, id 5, pool offset 0

Jan 18 23:51:12.633: %C1700_EM-1-ERROR: packet-rx error: ESP sequence fail, id 63, pool offset 0

Jan 20 00:04:30.588: %C1700_EM-1-ERROR: packet-rx error: ESP sequence fail, id 58, pool offset 0

Jan 20 00:27:23.966: %C1700_EM-1-ERROR: packet-rx error: ESP sequence fail, id 127, pool offset 0

Jan 20 00:41:44.131: %C1700_EM-1-ERROR: packet-rx error: ESP sequence fail, id 81, pool offset 0

Jan 20 00:43:53.075: %C1700_EM-1-ERROR: packet-rx error: ESP sequence fail, id 107, pool offset 0

Jan 20 01:13:40.573: %C1700_EM-1-ERROR: packet-rx error: ESP sequence fail, id 116, pool offset 0

Jan 20 01:25:25.566: %C1700_EM-1-ERROR: packet-rx error: ESP sequence fail, id 3, pool offset 0
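For anyone wanting to compare how often these messages show up per day or per error type (for example against the other spokes), here is a minimal Python sketch that tallies them from a saved log. The function name `tally_esp_errors` and the `sample` list are illustrative only, and the regular expression assumes the exact message layout shown above.

```python
import re
from collections import Counter

# Matches lines such as:
#   Jan 20 00:04:30.588: %C1700_EM-1-ERROR: packet-rx error: ESP sequence fail, id 58, pool offset 0
ESP_ERROR = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}): "
    r"%C1700_EM-1-ERROR: packet-rx error: ESP (?P<kind>[\w ]+?) fail, id (?P<id>\d+)"
)

def tally_esp_errors(lines):
    """Count ESP errors per (day, kind), e.g. ('Jan 20', 'sequence')."""
    counts = Counter()
    for line in lines:
        m = ESP_ERROR.match(line.strip())
        if m:
            counts[(f"{m['month']} {m['day']}", m["kind"])] += 1
    return counts

# A few of the lines from the post above, used as sample input.
sample = [
    "Jan 15 22:46:19.937: %C1700_EM-1-ERROR: packet-rx error: ESP authentication fail, id 110, pool offset 0",
    "Jan 17 04:03:31.936: %C1700_EM-1-ERROR: packet-rx error: ESP sequence fail, id 5, pool offset 0",
    "Jan 20 00:04:30.588: %C1700_EM-1-ERROR: packet-rx error: ESP sequence fail, id 58, pool offset 0",
    "Jan 20 00:27:23.966: %C1700_EM-1-ERROR: packet-rx error: ESP sequence fail, id 127, pool offset 0",
]

for (day, kind), n in sorted(tally_esp_errors(sample).items()):
    print(day, kind, n)
```

Lines that do not match the pattern are skipped silently, so the same function can be pointed at a full syslog capture.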

The router is a spoke in a DMVPN and has a nearly identical config to the other spokes, which don't seem to have this problem.

The DMVPN links for this router go down every so often for some reason, although the drops don't really correspond to this error. Sometimes they stay down long enough to notice; other times I think they come back up so quickly that we don't see it, and those correspond to the other errors. However, the tunnel doesn't ALWAYS go down with the ESP errors above, as I've been SSH'd into the router over the tunnel when the errors occurred. The hub router also shows the following with high frequency (every couple of hours or more):

Jan 20 13:14:21.220: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.2.21 (Tunnel10) is down: Interface Goodbye received

Jan 20 13:14:26.248: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.2.21 (Tunnel10) is up: new adjacency

Since this is a spoke-to-hub tunnel it should never go down, as the EIGRP running over it keeps the link 'active'. There is also other traffic pretty much constantly, what with the domain controllers talking, etc.
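To put a number on how long the adjacency is actually down during these flaps, the hub's %DUAL-5-NBRCHANGE lines can be paired up (down, then up) per neighbor. The following is only a sketch: `flap_durations` is a made-up helper name, the year is an assumption because the timestamps carry none, and a log that spans New Year would need extra handling.

```python
import re
from datetime import datetime

# Matches the hub messages quoted above, e.g.
#   Jan 20 13:14:21.220: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.2.21 (Tunnel10) is down: ...
NBR_CHANGE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}): %DUAL-5-NBRCHANGE: .*?"
    r"Neighbor (?P<nbr>\S+) \((?P<intf>\S+)\) is (?P<state>up|down)"
)

def flap_durations(lines, year=2005):
    """Return [(neighbor, seconds_down), ...] for each down -> up pair."""
    went_down = {}
    flaps = []
    for line in lines:
        m = NBR_CHANGE.match(line.strip())
        if not m:
            continue
        # The log timestamps have no year, so one is supplied (assumption).
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S.%f")
        key = (m["nbr"], m["intf"])
        if m["state"] == "down":
            went_down[key] = when
        elif key in went_down:
            outage = when - went_down.pop(key)
            flaps.append((m["nbr"], outage.total_seconds()))
    return flaps

hub_log = [
    "Jan 20 13:14:21.220: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.2.21 (Tunnel10) is down: Interface Goodbye received",
    "Jan 20 13:14:26.248: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.2.21 (Tunnel10) is up: new adjacency",
]

for neighbor, seconds in flap_durations(hub_log):
    print(f"{neighbor} was down for {seconds:.3f} s")
```

For the two hub lines quoted above, the adjacency is down for about 5 seconds, which fits the impression that some drops are too short to notice.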

Everything else in the DMVPN seems to be working well, including the dynamic tunnels from this 'bad' spoke to all the other spokes.

Once in a while we see the following on the hub, and the tunnel remains down until the spoke is 'kicked over' (shutdown/no shut on eth 0, or a reboot):

Jan 20 18:10:50.584: %CRYPTO-4-RECVD_PKT_INV_SPI: decaps: rec'd IPSEC packet has invalid spi for destaddr=[HUBIPADDRESS], prot=50, spi=0xC34hj3B4(-1046436460), srcaddr=[SPOKEIPADDRESS]

Jan 20 18:11:03.956: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.2.21 (Tunnel10) is down: holding time expired

Any ideas what the issue might be?

jsivulka · 01-28-2005

You might be running into CSCdt95392 (Getting packet-rx error: ESP sequence fail with IMIX traffic). If you are hitting this bug, the solution is to lower the data rate or to use packet sizes that do not require fragmentation.

thisisshanky · 01-29-2005

This definitely matches bug CSCdt95392. The list of affected versions indicates 12.2 train images. Since you are already running 12.3, I doubt that an upgrade to another version will help. Have you set an MTU on the tunnel interface to this particular spoke?

Sankar Nair
UC Solutions Architect
Pacific Northwest | CDW
CCIE Collaboration #17135 Emeritus

Shawn Lebbon · 01-31-2005

The MTU set on the DMVPN tunnel interface is 1436. I just took this number from the Cisco example configs.

Shawn Lebbon · 01-31-2005

What would be the best way to achieve the stated workaround of "Lowering the data rate or using packet sizes, which do not require fragmentation"?

Is there a 'best' way to set the MTU? Should I look at setting up the "ip tcp adjust-mss" command? I've tried looking around for 'best' values for this thing, but I keep finding differing results. I've seen Cisco example configs for DMVPNs use all sorts of different values from 1400 to 1500...
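One way to reason about these values is plain arithmetic: the TCP MSS is the tunnel IP MTU minus 40 bytes (20 for the IPv4 header and 20 for the TCP header, both without options), and a full-size tunnel packet only avoids fragmentation if it still fits the physical MTU after the GRE/IPsec overhead is added. The sketch below is in Python; the function names are illustrative, and the overhead figures are placeholders, not measured values, because the real overhead depends on the transform set, the IPsec mode and whether a tunnel key is configured.

```python
IP_HEADER = 20   # IPv4 header without options
TCP_HEADER = 20  # TCP header without options

def mss_for_tunnel_mtu(tunnel_ip_mtu):
    """Largest TCP payload that fits in one packet of tunnel_ip_mtu bytes."""
    return tunnel_ip_mtu - IP_HEADER - TCP_HEADER

def fits_without_fragmentation(tunnel_ip_mtu, encap_overhead, physical_mtu=1500):
    """True if a full-size tunnel packet plus encapsulation fits the physical MTU."""
    return tunnel_ip_mtu + encap_overhead <= physical_mtu

# Placeholder overheads in bytes (assumptions, to be replaced with real figures).
ASSUMED_OVERHEADS = (64, 80, 100)

for mtu in (1436, 1400):
    print(f"ip mtu {mtu}: matching MSS would be {mss_for_tunnel_mtu(mtu)}")
    for overhead in ASSUMED_OVERHEADS:
        ok = fits_without_fragmentation(mtu, overhead)
        print(f"  overhead {overhead}: fits in 1500 = {ok}")
```

By this arithmetic a tunnel MTU of 1436 pairs with an MSS of 1396, and the frequently seen combination of 1400 pairs with 1360. Note that the MSS adjustment only affects TCP; large UDP or other non-TCP packets can still exceed the tunnel MTU.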

The WAN connections here are both on T1s, if that matters. (The T1s are terminated on different routers, though.)

This DMVPN has already gone into production, so it's hard to experiment with new values at this point.

Also, it should be noted that I have only noticed this problem on this spoke router; the other spokes (2 others at this time) don't show it. Would this be because the router in question is a 1720 as opposed to the 1721s at the other sites? (The bug report just lists c1700 under 'affected' devices.) We may end up swapping this 1720 for another 1721 anyway, but I don't know if it would help. Right now it seems to be the only variable, other than the ISP, that differs from the other spokes.