Hi,

administrator125 · ‎08-07-2014

We have a two-spoke L2L VPN setup in which an ASA sits at our main site and 1921 routers at two remote sites act as endpoints. All traffic is directed back to the ASA; there is no direct communication between the remote sites.

This all works as expected, but one of the sites periodically loses its tunnel and fails to reestablish it. Clearing crypto/isakmp does not reestablish the SA; we have to reload the router at the site to get the tunnel active again.

The last time it happened, I was able to grab some information before it was reloaded. Unfortunately, this happened during a heavy production period so I had limited time to grab what I could since getting it alive again was the immediate priority. I'm hoping for a failure during off hours where I can sit down and run some debugging, but that hasn't happened recently.

Site1-1921#show crypto session detail
Crypto session current status

Code: C - IKE Configuration mode, D - Dead Peer Detection     
K - Keepalives, N - NAT-traversal, T - cTCP encapsulation     
X - IKE Extended Authentication, F - IKE Fragmentation

Interface: Serial0/0/0
Session status: DOWN-NEGOTIATING
Peer: 216.x.x.x port 500 fvrf: (none) ivrf: (none)
      Desc: (none)
      Phase1_id: (none)
  IKEv1 SA: local 207.x.x.x/500 remote 216.x.x.x/500 Inactive
          Capabilities:(none) connid:0 lifetime:0
  IPSEC FLOW: permit ip 192.168.4.64/255.255.255.224 10.0.0.0/255.0.0.0
        Active SAs: 0, origin: crypto map
        Inbound:  #pkts dec'ed 192287 drop 0 life (KB/Sec) 0/0
        Outbound: #pkts enc'ed 219073 drop 169 life (KB/Sec) 0/0
  IPSEC FLOW: permit ip 192.168.4.64/255.255.255.224 192.168.4.0/255.255.2
        Active SAs: 0, origin: crypto map
        Inbound:  #pkts dec'ed 783827 drop 0 life (KB/Sec) 0/0
        Outbound: #pkts enc'ed 805580 drop 35 life (KB/Sec) 0/0
  IPSEC FLOW: permit ip 192.168.4.64/255.255.255.224 192.168.10.0/255.255.
        Active SAs: 0, origin: crypto map
        Inbound:  #pkts dec'ed 0 drop 0 life (KB/Sec) 0/0
        Outbound: #pkts enc'ed 661 drop 0 life (KB/Sec) 0/0
  IPSEC FLOW: permit ip 192.168.4.64/255.255.255.224 192.168.25.0/255.255.255.0
        Active SAs: 0, origin: crypto map
        Inbound:  #pkts dec'ed 0 drop 0 life (KB/Sec) 0/0
        Outbound: #pkts enc'ed 0 drop 0 life (KB/Sec) 0/0
  IPSEC FLOW: permit ip 192.168.4.64/255.255.255.224 172.16.4.0/255.255.255.0
        Active SAs: 0, origin: crypto map
        Inbound:  #pkts dec'ed 260258 drop 0 life (KB/Sec) 0/0
        Outbound: #pkts enc'ed 280760 drop 4 life (KB/Sec) 0/0


Site1-1921#show crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst             src             state          conn-id status
216.x.x.x   207.x.x.x   MM_NO_STATE          0 ACTIVE
216.x.x.x   207.x.x.x   MM_NO_STATE          0 ACTIVE (deleted)

IPv6 Crypto ISAKMP SA


ASA# show crypto isakmp sa

   Active SA: 2
    Rekey SA: 0 (A tunnel will report 1 Active and 1 Rekey SA during rekey)
Total IKE SA: 2

1   IKE Peer: 66.x.x.x
    Type    : L2L             Role    : initiator
    Rekey   : no              State   : MM_ACTIVE
2   IKE Peer: 207.x.x.x
    Type    : user            Role    : initiator
    Rekey   : no              State   : MM_WAIT_MSG2

207.x.x.x is the remote peer with problems. 66.x.x.x is the stable remote peer. 216.x.x.x is the ASA.

In the show crypto isakmp sa results, I've also seen the down peer stuck in MM_WAIT_MSG3 during these incidents; it's not always MSG2.

The router has access to the public internet. That's how I'm getting into it, and I'm also able to ping out. Also, like I said, reloading brings everything up with no problem. Sometimes the router sits like this for hours, so it's not as if the T1s just happen to come back up during the reload. As far as I can tell, the routes through the public internet between the peers are all good and nothing is blocking communication. There doesn't seem to be any particular pattern to the failures. Sometimes they're late at night on weekdays, sometimes during the workday, and sometimes on weekends. I can't even tell whether the SA is being torn down legitimately because of IPsec lifetime limits or whether something else, like an upstream outage, is causing the tunnel to be rebuilt.
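
To help tell a lifetime-driven rekey apart from an upstream outage at the next failure, one option is to log tunnel up/down events and to scope debugging to the ASA peer before clearing or reloading anything. The following is only a sketch using standard IOS commands; 216.x.x.x stands in for the redacted ASA address, and the order of steps is a suggestion rather than a required procedure.

```text
! On the 1921, ahead of time: log crypto session up/down events to syslog
configure terminal
 crypto logging session
 end
!
! During a failure, capture state before clearing or reloading
show crypto isakmp sa detail
show crypto ipsec sa peer 216.x.x.x
show crypto session remote 216.x.x.x detail
!
! Limit debug output to the ASA peer, then enable IKE and IPsec debugs
debug crypto condition peer ipv4 216.x.x.x
debug crypto isakmp
debug crypto ipsec
!
! Force a fresh negotiation while the debugs are running
clear crypto session remote 216.x.x.x
```

The syslog timestamps from the session logging can then be compared against the configured lifetimes (28800 seconds for IPsec, 43200 for ISAKMP) to see whether the drops line up with a rekey.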

Here's the crypto config on the ASA:

crypto ipsec transform-set CSM_TS_1 esp-3des esp-md5-hmac
crypto ipsec transform-set ESP-AES-256-SHA esp-aes-256 esp-sha-hmac
crypto ipsec security-association lifetime seconds 28800
crypto ipsec security-association lifetime kilobytes 4608000
crypto dynamic-map CSM_outside_map_dynamic 2 set transform-set CSM_TS_1
crypto dynamic-map CSM_outside_map_dynamic 2 set reverse-route
crypto map CSM_outside_map 10 match address SITE1
crypto map CSM_outside_map 10 set peer 207.x.x.x
crypto map CSM_outside_map 10 set transform-set ESP-AES-256-SHA
crypto map CSM_outside_map 15 match address SITE2
crypto map CSM_outside_map 15 set peer 66.x.x.x
crypto map CSM_outside_map 15 set transform-set ESP-AES-256-SHA
crypto map CSM_outside_map 30001 ipsec-isakmp dynamic CSM_outside_map_dynamic
crypto map CSM_outside_map interface outside
crypto isakmp enable outside
crypto isakmp policy 10
   authentication pre-share
   encryption aes-256
   hash sha
   group 2
   lifetime 43200
   telnet timeout 5
tunnel-group 207.x.x.x type ipsec-l2l
tunnel-group 207.x.x.x ipsec-attributes
   pre-shared-key *****
tunnel-group 66.x.x.x type ipsec-l2l
tunnel-group 66.x.x.x ipsec-attributes
   pre-shared-key *****

The affected site:

crypto isakmp policy 10
 encr aes 256
 authentication pre-share
 group 2
 lifetime 43200
crypto isakmp key ***** address 216.x.x.x   no-xauth
crypto isakmp keepalive 20 5
!
!
crypto ipsec transform-set ESP-AES-256-SHA esp-aes 256 esp-sha-hmac
 mode tunnel
!         
!
!
crypto map VPN 10 ipsec-isakmp
 description Tunnel to 216.x.x.x
 set peer 216.x.x.x
 set transform-set ESP-AES-256-SHA
 match address VPN_TUNNEL

Thanks for any help or suggestions you can provide.

nkarthikeyan · ‎08-08-2014

Hi,

I do not see any problem with your configuration, but can you try removing the DPD configuration at both ends?

On the router end:

no crypto isakmp keepalive 20 5

On the ASA end:

tunnel-group x.x.x.x type ipsec-l2l
tunnel-group x.x.x.x ipsec-attributes
 isakmp keepalive disable
!

Regards

Karthik

administrator125 · ‎08-08-2014

Disabled DPD at both ends of the troubled tunnel. I'll post back with an update the next time there's a failure.

administrator125 · ‎08-22-2014

So we finally got another failure this morning. Here's some debug output I was able to grab:

PAChamberBusiness417WalnutASA# debug crypto isakmp 5
PAChamberBusiness417WalnutASA# Aug 22 09:01:32 [IKEv1]: IP = 207.x.x.x, Duplicate Phase 1 packet detected.  Retransmitting last packet.
Aug 22 09:01:32 [IKEv1]: IP = 207.x.x.x, P1 Retransmit msg dispatched to MM FSM
Aug 22 09:01:42 [IKEv1]: IP = 207.x.x.x, Duplicate Phase 1 packet detected.  Retransmitting last packet.
Aug 22 09:01:42 [IKEv1]: IP = 207.x.x.x, P1 Retransmit msg dispatched to MM FSM
Aug 22 09:01:42 [IKEv1 DEBUG]: IP = 207.x.x.x, IKE MM Responder FSM error history (struct &0xca8fa388)  <state>, <event>:  MM_DONE, EV_ERROR-->MM_WAIT_MSG3, EV_RESEND_MSG-->MM_WAIT_MSG3, NullEvent-->MM_SND_MSG2, EV_SND_MSG-->MM_SND_MSG2, EV_START_TMR-->MM_SND_MSG2, EV_RESEND_MSG-->MM_WAIT_MSG3, EV_TIMEOUT-->MM_WAIT_MSG3, NullEvent
Aug 22 09:01:52 [IKEv1 DEBUG]: IP = 207.x.x.x, Oakley proposal is acceptable
Aug 22 09:01:52 [IKEv1 DEBUG]: IP = 207.x.x.x, IKE SA Proposal # 1, Transform # 1 acceptable  Matches global IKE entry # 1
Aug 22 09:02:02 [IKEv1]: IP = 207.x.x.x, Duplicate Phase 1 packet detected.  Retransmitting last packet.
Aug 22 09:02:02 [IKEv1]: IP = 207.x.x.x, P1 Retransmit msg dispatched to MM FSM

Sorry about the debug level. I meant to use 254, but the SSH session was lagging as I typed and I wound up butchering it.

The circuit the troubled site is on is a basic T1 that was converted back in June from MPLS to a simple DIA circuit (which is what started this since we had to then provide our own L2L with the loss of the MPLS mesh).

Here's another fun fact: there were some storms in the area last night, so outage of provider equipment is a possibility.

Is it possible that this sort of error might be caused by NAT or PAT by the ISP that I'm not aware of? I'm trying to get hold of an engineer at the ISP to see if that's happening, but I'm curious if it could be the issue, since it seems like the ASA and the router stop talking to each other when it comes to reopening a dead tunnel.
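
One way to check for translation in the path is a packet capture on the ASA's outside interface. If the ISP were doing NAT or PAT, IKE packets from the router would be expected to arrive with a source address or port other than 207.x.x.x/500. The sketch below assumes the ASA software supports the `match` keyword for captures; the capture names IKECAP and NATTCAP are arbitrary labels chosen for illustration.

```text
! On the ASA: capture IKE traffic in both directions for the troubled peer
capture IKECAP interface outside match udp host 207.x.x.x host 216.x.x.x eq 500
capture NATTCAP interface outside match udp host 207.x.x.x host 216.x.x.x eq 4500
!
! Inspect source/destination addresses and ports of the captured packets
show capture IKECAP
show capture NATTCAP
!
! Remove the captures when finished
no capture IKECAP
no capture NATTCAP
```

The capture also shows whether the ASA's replies actually leave the outside interface. The repeated "Duplicate Phase 1 packet detected" messages in the debug above are consistent with the router resending its first main mode message, which may mean the ASA's responses are not reaching it, though that is an inference from the log rather than something the output proves.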

nkarthikeyan · ‎08-22-2014

Hi,

This kind of error appears when the pre-shared keys do not match or the pre-shared key negotiation fails. Can you re-enter the pre-shared key at both ends, save the configuration, and see whether it happens again?

Also, if NAT/PAT failed somewhere in the path during the outage, that could have caused the problem too.

Regards

Karthik

administrator125 · ‎08-22-2014

I reentered the PSK at both endpoints, so now we play the waiting game again.

Just to clarify, are you agreeing that PAT being done by the ISP is a potential cause worth investigating? If that were happening, would there be anything I could even do on my equipment to work around it?

Thanks,

Chris
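
If a NAT or PAT device does turn out to be in the path, the usual accommodation on the endpoints is NAT traversal (IKE and ESP carried over UDP 4500). A minimal sketch follows, assuming an ASA release that still uses the `crypto isakmp` command family shown in the config above; the 20-second keepalive value is an assumption, not a recommendation from this thread.

```text
! ASA: enable NAT traversal with a 20-second keepalive
crypto isakmp nat-traversal 20
!
! 1921: NAT-T is enabled by default in IOS; these lines make it explicit
! and send keepalives so a translation entry upstream does not age out
crypto ipsec nat-transparency udp-encapsulation
crypto isakmp nat keepalive 20
```

This only helps if translation is actually occurring. It would not work around an upstream device that drops UDP 500 outright, since NAT-T is negotiated during the initial IKE exchange on that port.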

L2L VPN Fails to reestablish