
IPSec tunnel w NAT-T hangs on QM-renegotiation and does not recover.

Svante Bolander
Level 1

I have struggled for some time with DMVPN tunnels hanging and not recovering. The tunnels are part of a DMVPN where some spokes are behind NAT/PAT and some are not. The problem only occurs for spokes behind NAT. There is never a problem initiating the tunnels: they come up cleanly and OSPF forms an adjacency over them between the hub and every spoke. Full connectivity can last for several hours, but at some point something happens in the transport network and the tunnel goes down and does not recover. One way to re-establish the tunnel is to change the IP address (behind NAT) on the affected spoke.

The transport network is a mobile network from Tele2 with a Huawei B593 4G router as CPE. The problem can also be seen on other mobile networks and with other brands of CPE (3G or 4G) when they use NAT, so it is not specific to this type of CPE. But some kind of NAT and a mobile network are always present when the hangs occur.

I have now isolated this and set up a separate DMVPN to debug what is going on: an 1841 as spoke (behind the B593) and a 2851 as hub, both running 15.1(4)M3.

I can see the tunnel establish correctly. I can see QM renegotiate when the lifetime expires, and MM renegotiation can also be seen. But at some point it seems that downstream packets are lost: the first clue is OSPF going DOWN, then DPD on the spoke detects loss of the peer, and then MM starts renegotiating. Even though traffic is flowing in both directions, the tunnel refuses to re-establish.

To help with the analysis I have collected debug crypto isakmp/ipsec output from both the hub and the spoke, both when the tunnel is OK and when it is broken. Please look at the attached files. Any suggestions are appreciated.

----- some debug from the spoke while the problem is present; more is in the attached files ------

Mar 27 14:40:51.080: ISAKMP (1197): received packet from 12.34.56.2 dport 500 sport 500 Global (I) MM_KEY_EXCH
Mar 27 14:40:51.080: ISAKMP:(1197): phase 1 packet is a duplicate of a previous packet.
Mar 27 14:40:51.080: ISAKMP:(1197): retransmitting due to retransmit phase 1
Mar 27 14:40:51.580: ISAKMP:(1197): retransmitting phase 1 MM_KEY_EXCH...
Mar 27 14:40:51.580: ISAKMP (1197): incrementing error counter on sa, attempt 4 of 5: retransmit phase 1
Mar 27 14:40:51.580: ISAKMP:(1197): retransmitting phase 1 MM_KEY_EXCH
Mar 27 14:40:51.580: ISAKMP:(1197): sending packet to 12.34.56.2 my_port 4500 peer_port 4500 (I) MM_KEY_EXCH
Mar 27 14:40:51.580: ISAKMP:(1197):Sending an IKE IPv4 Packet.
Mar 27 14:41:01.579: ISAKMP:(1197): retransmitting phase 1 MM_KEY_EXCH...
Mar 27 14:41:01.579: ISAKMP (1197): incrementing error counter on sa, attempt 5 of 5: retransmit phase 1
Mar 27 14:41:01.579: ISAKMP:(1197): retransmitting phase 1 MM_KEY_EXCH
Mar 27 14:41:01.579: ISAKMP:(1197): sending packet to 12.34.56.2 my_port 4500 peer_port 4500 (I) MM_KEY_EXCH
Mar 27 14:41:01.579: ISAKMP:(1197):Sending an IKE IPv4 Packet.
Mar 27 14:41:10.418: ISAKMP:(1196):purging SA., sa=68B3EE34, delme=68B3EE34
Mar 27 14:41:10.998: IPSEC(key_engine): request timer fired: count = 2,
  (identity) local= 192.168.1.81:0, remote= 12.34.56.2:0,
    local_proxy= 192.168.1.81/255.255.255.255/47/0 (type=1),
    remote_proxy= 12.34.56.2/255.255.255.255/47/0 (type=1)
Mar 27 14:41:11.462: IPSEC(sa_request): ,
  (key eng. msg.) OUTBOUND local= 192.168.1.81:500, remote= 12.34.56.2:500,
    local_proxy= 192.168.1.81/255.255.255.255/47/0 (type=1),
    remote_proxy= 12.34.56.2/255.255.255.255/47/0 (type=1),
    protocol= ESP, transform= esp-aes  (Transport),
    lifedur= 1000s and 4608000kb,
    spi= 0x0(0), conn_id= 0, keysize= 128, flags= 0x0
Mar 27 14:41:11.462: ISAKMP: set new node 0 to QM_IDLE
Mar 27 14:41:11.462: ISAKMP:(1197):SA is still budding. Attached new ipsec request to it. (local 192.168.1.81, remote 12.34.56.2)
Mar 27 14:41:11.462: ISAKMP: Error while processing SA request: Failed to initialize SA
Mar 27 14:41:11.462: ISAKMP: Error while processing KMI message 0, error 2.
Mar 27 14:41:11.578: ISAKMP:(1197): retransmitting phase 1 MM_KEY_EXCH...
Mar 27 14:41:11.578: ISAKMP:(1197):peer does not do paranoid keepalives.
Mar 27 14:41:11.578: ISAKMP:(1197):deleting SA reason "Death by retransmission P1" state (I) MM_KEY_EXCH (peer 12.34.56.2)
Mar 27 14:41:11.578: ISAKMP:(1197):deleting SA reason "Death by retransmission P1" state (I) MM_KEY_EXCH (peer 12.34.56.2)
Mar 27 14:41:11.578: ISAKMP: Unlocking peer struct 0x686F6218 for isadb_mark_sa_deleted(), count 0
Mar 27 14:41:11.578: ISAKMP: Deleting peer node by peer_reap for 12.34.56.2: 686F6218
Mar 27 14:41:11.578: %CRYPTO-5-IKMP_SETUP_FAILURE: IKE SETUP FAILED for local:192.168.1.81 local_id:192.168.1.81 remote:12.34.56.2 remote_id:12.34.56.2 IKE profile:None fvrf:None fail_reason:Peer lost fail_class_cnt:1
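
For anyone working through long debug captures like this one, a small script can pull out the port pairs and retransmit counters. The following is only a minimal sketch in Python: the regular expressions are written against the line formats visible in the excerpt above and may not match every IOS release, and the names `summarize` and `SAMPLE` are made up for illustration. On this excerpt it shows the spoke receiving on ports 500/500 while it sends on 4500/4500, followed by "Death by retransmission P1".

```python
import re

# Patterns derived from the debug lines quoted in this post (assumed format).
RECV = re.compile(r"received packet from (\S+) dport (\d+) sport (\d+)")
SEND = re.compile(r"sending packet to (\S+) my_port (\d+) peer_port (\d+)")
ATTEMPT = re.compile(r"attempt (\d+) of (\d+): retransmit phase 1")
DELETE = re.compile(r'deleting SA reason "([^"]+)"')


def summarize(log_text):
    """Collect port pairs, the highest retransmit attempt, and SA delete reasons."""
    summary = {
        "recv_ports": set(),      # (dport, sport) seen on received packets
        "send_ports": set(),      # (my_port, peer_port) used when sending
        "max_attempt": 0,         # highest phase 1 retransmit attempt number
        "delete_reasons": set(),  # reasons given when the SA is deleted
    }
    for line in log_text.splitlines():
        if m := RECV.search(line):
            summary["recv_ports"].add((int(m.group(2)), int(m.group(3))))
        elif m := SEND.search(line):
            summary["send_ports"].add((int(m.group(2)), int(m.group(3))))
        elif m := ATTEMPT.search(line):
            summary["max_attempt"] = max(summary["max_attempt"], int(m.group(1)))
        elif m := DELETE.search(line):
            summary["delete_reasons"].add(m.group(1))
    return summary


# A few lines copied from the spoke debug above.
SAMPLE = """\
Mar 27 14:40:51.080: ISAKMP (1197): received packet from 12.34.56.2 dport 500 sport 500 Global (I) MM_KEY_EXCH
Mar 27 14:40:51.580: ISAKMP (1197): incrementing error counter on sa, attempt 4 of 5: retransmit phase 1
Mar 27 14:40:51.580: ISAKMP:(1197): sending packet to 12.34.56.2 my_port 4500 peer_port 4500 (I) MM_KEY_EXCH
Mar 27 14:41:01.579: ISAKMP (1197): incrementing error counter on sa, attempt 5 of 5: retransmit phase 1
Mar 27 14:41:11.578: ISAKMP:(1197):deleting SA reason "Death by retransmission P1" state (I) MM_KEY_EXCH (peer 12.34.56.2)
"""

if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(key, value)
```

Running the same function over the full attached hub and spoke captures would be the obvious next step; a mismatch between the ports a router sends on and the ports it receives on is worth a closer look, though by itself it does not prove where the fault lies.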


edondurguti
Level 4

I have about 10 sites with 3G/4G that run as backup through DMVPN, and they all have the same problem.

Jeff Hansen
Level 1

Did you ever find a resolution for this error?

The problem was the Huawei B593 4G router, which has a bug in its NAT/PAT. This bug made NAT-T behave in such a way that the source ports were changed, and that did not stop until the NAT session terminated (= never). The Cisco implementation was not able to recover from this NAT error, which was on an intermediate NAT device.
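
One way to look for this kind of behaviour is to watch, on the hub side, whether the source port seen for a given peer keeps changing between packets. The sketch below is an illustration only: it assumes the hub's `debug crypto isakmp` output uses the same "received packet from ... dport ... sport ..." format as the spoke lines quoted earlier, the function name `source_port_changes` is made up, and the sample lines are hypothetical, not taken from the real capture.

```python
import re

# Same "received packet" format as in the spoke debug (assumed for the hub too).
RECV = re.compile(r"received packet from (\S+) dport (\d+) sport (\d+)")


def source_port_changes(log_lines):
    """Return (peer, old_sport, new_sport) each time a peer's source port changes."""
    last_sport = {}  # peer address -> most recent source port seen
    changes = []
    for line in log_lines:
        m = RECV.search(line)
        if not m:
            continue
        peer, sport = m.group(1), int(m.group(3))
        previous = last_sport.get(peer)
        if previous is not None and previous != sport:
            changes.append((peer, previous, sport))
        last_sport[peer] = sport
    return changes


# Hypothetical hub-side lines (documentation address, invented ports).
HYPOTHETICAL_HUB_LINES = [
    "ISAKMP (2001): received packet from 198.51.100.7 dport 4500 sport 4500 Global (R) QM_IDLE",
    "ISAKMP (2001): received packet from 198.51.100.7 dport 4500 sport 4500 Global (R) QM_IDLE",
    "ISAKMP (2001): received packet from 198.51.100.7 dport 4500 sport 61002 Global (R) QM_IDLE",
    "ISAKMP (2001): received packet from 198.51.100.7 dport 4500 sport 61017 Global (R) QM_IDLE",
]

if __name__ == "__main__":
    for peer, old, new in source_port_changes(HYPOTHETICAL_HUB_LINES):
        print(f"{peer}: source port changed {old} -> {new}")
```

A single change is not necessarily a fault, since a NAT device may legitimately translate the source port; it is repeated changes within one session that would point toward the behaviour described here.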